Embrace Relationships with Neo4J, R & Java

Introduction

Graphs are everywhere, used by everyone, for everything. Neo4j is one of the most popular graph databases and can be used to make recommendations, get social, find paths, uncover fraud, manage networks, and so on. A graph database can store any kind of data using Nodes (graph data records), Relationships (which connect nodes), and Properties (named data values).

A graph database is well suited to connected data, which is hard to model and query in relational or other NoSQL databases because they lack first-class relationships and efficient multi-level traversals. Graph databases embrace relationships, which naturally form paths, and querying or traversing the graph means following those paths. Because the data model is fundamentally path-oriented, most path-based graph database operations are closely aligned with the way the data is laid out, which makes them very efficient.

Use Case

This use case is based on a modified version of the StackOverflow dataset. It models a network of programming languages, the questions that refer to them, and the users who asked and answered those questions, and it connects these nodes with relationships in a Neo4j graph database to uncover insights that would be hard to reach with a typical relational or other NoSQL database.

What we want to do:

Prerequisites
Download StackOverflow Dataset
Data Manipulation with R
Create Nodes & Relationships file with Java
Create GraphDB with BatchImporter
Visualize Graph with Neo4J

Solution

Prerequisites

Download and Install Neo4j: We will be using Neo4j 2.x, which is easy to install on Windows. Follow the instructions at the link below to download and install it.

Note: Neo4j 2.x requires JDK 1.7 and above.

http://www.neo4j.org/download/windows

Download and Install RStudio: We will use R to perform some data manipulation on the StackOverflow dataset, which is available in RData format; this includes filtering rows, adding and dropping columns, and so on. R is used here to show its strength in data manipulation, but the same steps can be done in other programming languages as well. Download the open source edition of RStudio from the link below.

http://www.rstudio.com/products/rstudio/#Desk

Download StackOverflow Dataset

Download Dataset: This use case is based on a modified version of the StackOverflow dataset, which is rather old and available in both CSV and RData formats. Use the links below to download it: the first link describes the various fields, and the second downloads the RData file.

http://www.ics.uci.edu/~duboisc/StackOverflow

http://www.ics.uci.edu/~duboisc/StackOverflow/answers.Rdata

Understanding the Dataset:

We will be mostly interested in the following fields which will be used to create nodes and relationships in Neo4j.

qid:	Unique question id
i:	User id of the questioner
qs:	Score of the question
tags:	Comma-separated list of the tags associated with the question (here, the programming languages it refers to)
qvc:	Number of views of this question
aid:	Unique answer id
j:	User id of the answerer
as:	Score of the answer

Data Manipulation with R

We will reshape the dataset to fit our needs and, along the way, appreciate the power of data manipulation with R. The full RData file contains roughly 263,000 rows; this use case applies the following manipulations to keep the data interesting but small.

Open RStudio and Set Working Directory: Open RStudio and set the working directory to where the RData file was downloaded.

Load and Perform Data Manipulation:

#Load answers.Rdata that was downloaded

load("answers.Rdata")

#The data is loaded into the "data" object; take a quick look with head()

head(data)


#Load the stringr library to perform some string manipulation

require(stringr)

#Create a new column Match that is TRUE when the tags column contains one of the languages below.
#str_detect() does a regex (substring) match, so tags such as javascript or cakephp also match.
#For this use case, we are interested only in a subset of programming languages.

data$Match <- str_detect(string = data$tags, pattern = "(java|mysql|linux|python|django|php|jquery)")

#Create a new column length that holds the number of comma-separated tags.
#str_split() returns a list of tag vectors; sapply() applies length() to each element

data$length <- sapply(str_split(data$tags, ","), length)

#The data object now contains 2 new columns: Match and length. Match is TRUE if the tags column contains
#one of the programming language patterns we are interested in; length is the number of comma-delimited tags

head(data)


#Find the number of rows in the data object
nrow(data) #returns 263540

#Subset the data object where Match is TRUE, length is 1, and both question and answer scores are greater than zero
#Store the result in a newdata object

newdata <- subset(data, Match & length == 1 & qs > 0 & as > 0)

#The row count drops significantly, to 1668
nrow(newdata)

#The first 6 rows show that the tags column now holds a single programming language
head(newdata)

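For readers who prefer to stay in Java, the filtering rule above can be sketched without R. This is not part of the original workflow: the class name RowFilter is hypothetical, and it assumes the tags column and the two scores have already been parsed from a row. Matcher.find() mirrors the substring behavior of str_detect(), and splitting on commas mirrors str_split() followed by length().

```java
import java.util.regex.Pattern;

// Hypothetical Java version of the R subset() rule above.
final class RowFilter {
    // Same alternation as the R pattern; find() matches anywhere in the string, like str_detect()
    private static final Pattern LANGS =
            Pattern.compile("(java|mysql|linux|python|django|php|jquery)");

    // Mirrors: Match & length == 1 & qs > 0 & as > 0
    static boolean keep(String tags, int questionScore, int answerScore) {
        boolean match = LANGS.matcher(tags).find();
        // Like str_split(tags, ",") followed by length(): count the comma-separated pieces
        int length = tags.split(",", -1).length;
        return match && length == 1 && questionScore > 0 && answerScore > 0;
    }
}
```

For example, keep("javascript", 1, 1) is true because "java" occurs inside the tag, while keep("java,php", 5, 5) is false because the question has two tags.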

#Create a list of columns to drop (qt, at, Match, and length) that are no longer needed

drops <- c("qt", "at", "Match", "length")

#The new data frame finaldata does not contain the dropped columns
finaldata <- newdata[, !(names(newdata) %in% drops)]
head(finaldata)


#Order the finaldata object by question id
finaldata <- finaldata[order(finaldata$qid),]

#Write the finaldata object to a CSV file that will be used to create nodes and relationships
write.csv(finaldata, "finaldata.csv", row.names=FALSE)


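Note: write.csv() always uses a comma as the separator and ignores any sep argument, so passing sep only produces a warning that can safely be ignored.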

Create Nodes and Relationship file with Java

We will write a Java program that takes the finaldata.csv file generated by the R script above and creates several node files and a single relationship file containing the relations between the nodes. Our node and relationship structure is as follows:

Nodes: question_nodes, answer_nodes, user_nodes, lang_nodes
Relationships:

//One question refers to one programming language
Question REFERS Language

//One question can have multiple answers
Question HAS_ANSWER Answer

//One question asked by one user
Question ASKED_BY User

//One answer answered by one user
Answer ANSWERED_BY User

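The four relationship types above also fix which kinds of nodes each type may connect. As a hypothetical sanity check (not part of the original program), the sketch below validates a relationship row against that schema, using the id prefixes the Java program writes (qid_, aid_, uid_) and the bare language ids from lang_nodes.csv. The class name RelSchema is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sanity check: does a relationship row connect the kinds of nodes its type implies?
final class RelSchema {
    // type -> {start-node id prefix, end-node id prefix}; a null end prefix means a bare language id
    private static final Map<String, String[]> SCHEMA = new HashMap<>();

    static {
        SCHEMA.put("REFERS", new String[] {"qid_", null});
        SCHEMA.put("HAS_ANSWER", new String[] {"qid_", "aid_"});
        SCHEMA.put("ASKED_BY", new String[] {"qid_", "uid_"});
        SCHEMA.put("ANSWERED_BY", new String[] {"aid_", "uid_"});
    }

    // Language ids in lang_nodes.csv (java, php, ...) carry none of the generated prefixes
    private static boolean isLanguageId(String id) {
        return !id.startsWith("qid_") && !id.startsWith("aid_") && !id.startsWith("uid_");
    }

    static boolean isValid(String start, String end, String type) {
        String[] prefixes = SCHEMA.get(type);
        if (prefixes == null) {
            return false; // unknown relationship type
        }
        boolean endOk = prefixes[1] == null ? isLanguageId(end) : end.startsWith(prefixes[1]);
        return start.startsWith(prefixes[0]) && endOk;
    }
}
```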

Details about the Java Program: The program is straightforward: it creates the node and relationship files in the tab-delimited CSV format expected by the Neo4j Batch Importer. A few things to keep in mind about the program:

The format of Nodes file is as follows:

//The header of the first column has the form name:datatype:index_name. Here, id is the node id, string is its datatype, and users is the name of the index to create in Neo4j. Any further columns hold node attributes. All columns are tab-delimited; this is the format the Neo4j Batch Importer expects

id:string:users      attribute1      attribute2
qid_123456           4 (views)       10 (score)

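A minimal sketch of producing lines in this layout, assuming tab-delimited columns as described above. NodeFileFormat and its methods are hypothetical helpers; the Java program below builds the same strings inline with PrintWriter.

```java
// Hypothetical helpers for the tab-delimited node file layout described above.
final class NodeFileFormat {
    // First line: "id:string:<index>" followed by one column per attribute
    static String header(String indexName, String... attributes) {
        StringBuilder sb = new StringBuilder("id:string:").append(indexName);
        for (String attribute : attributes) {
            sb.append('\t').append(attribute);
        }
        return sb.toString();
    }

    // Data line: the node id followed by its attribute values, in header order
    static String row(String id, String... values) {
        StringBuilder sb = new StringBuilder(id);
        for (String value : values) {
            sb.append('\t').append(value);
        }
        return sb.toString();
    }
}
```

For example, header("users", "views", "score") yields id:string:users, views, and score separated by tabs, and row("qid_123456", "4", "10") yields the matching data line.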

The format of Relationship file is as follows:

//Each row holds the ids of two nodes and the type of the relationship between them. For example, question qid_797771 is ASKED_BY user uid_94691

id:string:users    id:string:users      type
qid_797771         uid_94691             ASKED_BY
qid_887301         javascript            REFERS
qid_607386         aid_608425            HAS_ANSWER
qid_809735         uid_88631             ASKED_BY
qid_954376         uid_117795            ASKED_BY

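Before importing, it can help to summarize rels.csv by relationship type. The sketch below is a hypothetical helper (RelsSummary) that assumes the file has already been read into a list of lines, header first, in the tab-delimited layout shown above.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: count relationships per type in the lines of rels.csv.
final class RelsSummary {
    static Map<String, Integer> countByType(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by type name for readable output
        for (int i = 1; i < lines.size(); i++) {       // index 0 is the header row
            String[] cols = lines.get(i).split("\t");
            if (cols.length < 3) {
                continue;                              // skip blank or malformed lines
            }
            Integer current = counts.get(cols[2]);
            counts.put(cols[2], current == null ? 1 : current + 1);
        }
        return counts;
    }
}
```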

lang_nodes.csv is created manually because the language list is static. All other node files and the relationship file are generated programmatically.

//lang_nodes.csv

id:string:users name
java            Java
mysql           MySQL
linux           Linux
python          Python
django          Django
php             PHP
jquery          JQuery
javascript      Javascript
cakephp         CakePHP


- finaldata.csv is renamed to sodata.csv, the input file name (INPUT_FILE) that the program reads. Alternatively, change INPUT_FILE to finaldata.csv.
- The dataset doesn’t include the names of questioners and answerers, so we associated fictional names with the user ids. This makes the graph easier to read in the Neo4j graphical interface. A file of around 1500 fictional names was created with http://homepage.net/name_generator/ and stored as “random_names.txt”, one name per line. The program assigns one name per unique user id, so the file needs at least as many names as there are users.
  
  Sample of random_names.txt:
  
  
  Edward MacDonald
  
  Nicholas Arnold
  
  Faith Lambert
  
  Peter White
  
  Trevor Campbell
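Because users is a HashSet, the Java program below assigns names in HashSet iteration order, which is unspecified, so the id-to-name mapping is not guaranteed to be stable (for example, across JVM versions). If a stable mapping matters, one option is to sort the ids first, as in this sketch. NameAssigner and its fallback to the raw id when names run out are illustrative assumptions, not the original design.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical alternative to the name assignment in writeUsersNodesFile().
final class NameAssigner {
    static Map<String, String> assign(Set<String> userIds, List<String> names) {
        List<String> sorted = new ArrayList<>(userIds);
        Collections.sort(sorted); // fixed order instead of HashSet iteration order
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < sorted.size(); i++) {
            String id = sorted.get(i);
            // Assumed fallback: reuse the id when the name list is shorter than the user list
            result.put(id, i < names.size() ? names.get(i) : id);
        }
        return result;
    }
}
```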

Java Program to Create Nodes & Relationships:

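Note: The program below depends only on the OpenCSV library, which can be downloaded from http://sourceforge.net/projects/opencsv/. It uses the au.com.bytecode.opencsv package of the older 2.x releases; newer releases use the com.opencsv package instead.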

package com.treselle.soagrapher;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;

import au.com.bytecode.opencsv.CSVReader;

public class NodeRelationCreator {

    private static final String QUESTION_NODE_FILE = "question_nodes.csv";
    private static final String USER_NODE_FILE = "user_nodes.csv";
    private static final String ANSWER_NODE_FILE = "answer_nodes.csv";
    private static final String RELATIONS_FILE = "rels.csv";
    private static final String INPUT_FILE = "sodata.csv";
    private static final String RANDOM_NAME_FILE = "random_names.txt";

    //stores question id as the key and views, score as map values
    private static Map<String, Map<String, String>> questions = new HashMap<String, Map<String, String>>();
    //stores unique user ids of both questioners and answerers
    private static Set<String> users = new HashSet<String>();
    //stores random names from the file
    private static List<String> randomNames = new ArrayList<String>();
    //stores answer id as the key and score as the map value
    private static Map<String, Map<String, String>> answers = new HashMap<String, Map<String, String>>();
    //stores relations between nodes. The key is the two node ids delimited by :: and the value is the relation type
    private static Map<String, String> relsMap = new HashMap<String, String>();

    private void readFromCSV() throws Exception{
        //Read the comma-delimited CSV (quote character ") and skip the header row
        CSVReader csvReader = new CSVReader(new FileReader(INPUT_FILE), ',', '"', 1);
        String[] rows = null;

        String lang = null;
        String questionId = null;
        String question_user = null;
        String question_score = null;
        String question_views = null;
        String answerId = null;
        String answer_user = null;
        String answer_score = null;
        Map<String, String> questionAttrs = null;
        Map<String, String> answerAttrs = null;

        while((rows = csvReader.readNext()) != null) {

            questionAttrs = new HashMap<String, String>();
            answerAttrs = new HashMap<String, String>();

            //Column indexes follow the column order of sodata.csv; index 5 is not used
            questionId = rows[0];
            question_user = rows[1];
            question_score = rows[2];
            lang = rows[3];
            question_views = rows[4];
            answerId = rows[6];
            answer_user = rows[7];
            answer_score = rows[8];

            questionAttrs.put("views", question_views);
            questionAttrs.put("score", question_score);
            questions.put("qid_"+questionId, questionAttrs);

            answerAttrs.put("score", answer_score);
            answers.put("aid_"+answerId, answerAttrs);

            users.add("uid_"+question_user);
            users.add("uid_"+answer_user);

            relsMap.put("qid_"+questionId+"::"+"aid_"+answerId, "HAS_ANSWER");
            relsMap.put("qid_"+questionId+"::"+"uid_"+question_user, "ASKED_BY");
            relsMap.put("aid_"+answerId+"::"+"uid_"+answer_user, "ANSWERED_BY");
            relsMap.put("qid_"+questionId+"::"+lang, "REFERS");
        }
        csvReader.close();

        this.writeQuestionNodesFile();
        this.writeAnswersNodesFile();
        this.writeUsersNodesFile();
        this.writeRelationsFile();
    }

    private void writeQuestionNodesFile(){
        try{
            FileWriter fos = new FileWriter(QUESTION_NODE_FILE);
            PrintWriter dos = new PrintWriter(fos);
            dos.println("id:string:users\tname\tviews\tscore");

            for (Entry<String, Map<String, String>> entry : questions.entrySet()){
                dos.print(entry.getKey());
                Map<String, String> valueMap = entry.getValue();
                //the question id doubles as the display name
                dos.print("\t"+entry.getKey());
                dos.print("\t"+valueMap.get("views"));
                dos.print("\t"+valueMap.get("score"));
                dos.println();
            }

            dos.close();
            fos.close();

        }catch (IOException e) {
            System.err.println("Error writing "+QUESTION_NODE_FILE+": "+e.getMessage());
        }
    }

    private void writeAnswersNodesFile(){
        try{
            FileWriter fos = new FileWriter(ANSWER_NODE_FILE);
            PrintWriter dos = new PrintWriter(fos);
            dos.println("id:string:users\tname\tscore");

            for (Entry<String, Map<String, String>> entry : answers.entrySet()){
                dos.print(entry.getKey());
                Map<String, String> valueMap = entry.getValue();
                //the answer id doubles as the display name
                dos.print("\t"+entry.getKey());
                dos.print("\t"+valueMap.get("score"));
                dos.println();
            }

            dos.close();
            fos.close();

        }catch (IOException e) {
            System.err.println("Error writing "+ANSWER_NODE_FILE+": "+e.getMessage());
        }
    }

    private void writeUsersNodesFile(){
        try{
            FileWriter fos = new FileWriter(USER_NODE_FILE);
            PrintWriter dos = new PrintWriter(fos);
            dos.println("id:string:users\tname");
            int count = 0;

            for(String user : users){
                //Fall back to the user id if random_names.txt has fewer names than there are users
                String name = count < randomNames.size() ? randomNames.get(count) : user;
                dos.print(user);
                dos.print("\t"+name);
                dos.println();
                count++;
            }

            dos.close();
            fos.close();

        }catch (IOException e) {
            System.err.println("Error writing "+USER_NODE_FILE+": "+e.getMessage());
        }
    }

    private void writeRelationsFile(){
        try{
            FileWriter fos = new FileWriter(RELATIONS_FILE);
            PrintWriter dos = new PrintWriter(fos);

            dos.println("id:string:users\tid:string:users\ttype");

            for (Map.Entry<String, String> entry : relsMap.entrySet()){
                //split the "start::end" key back into the two node ids
                String[] splitKeys = entry.getKey().split("::");
                dos.print(splitKeys[0]+"\t");
                dos.print(splitKeys[1]+"\t");
                dos.println(entry.getValue());
            }

            dos.close();
            fos.close();

        }catch (IOException e) {
            System.err.println("Error writing "+RELATIONS_FILE+": "+e.getMessage());
        }
    }

    private void readRandomNames(){
        try{
            BufferedReader in = new BufferedReader(new FileReader(RANDOM_NAME_FILE));
            String line = "";

            while ((line = in.readLine()) != null) {
                randomNames.add(line);
            }

            in.close();
        }catch (IOException e) {
            System.err.println("Error reading "+RANDOM_NAME_FILE+": "+e.getMessage());
        }
    }

    public static void main(String[] args){
        try{
            long start = System.currentTimeMillis();
            NodeRelationCreator nodeRelationCreator = new NodeRelationCreator();
            nodeRelationCreator.readRandomNames();
            nodeRelationCreator.readFromCSV();
            long end = System.currentTimeMillis();

            System.out.println("Done Processing in "+(end - start)+" ms");
        }
        catch(Exception e){
            System.out.println("Exception in main is "+e.getMessage());
            e.printStackTrace();
        }
    }
}


Output of the Program:

Run the program above from the command line or within Eclipse to create question_nodes.csv, answer_nodes.csv, user_nodes.csv, and rels.csv. Click here to download the nodes and relationship zip file if you want to run it through the Batch Importer right away and create the graph database.

Create GraphDB with Batch Importer

Download and Set up Batch Importer: The Batch Importer is a separate program that creates the graph.db data store needed by Neo4j. Its input is configured in the batch.properties file, which specifies the files to use as nodes and relationships. More details about the Batch Importer can be found in the readme at https://github.com/jexp/batch-import/tree/20

Download Link: https://dl.dropboxusercontent.com/u/14493611/batch_importer_20.zip

Note: Unzip it into the directory where the Java program created the node and relationship files.

Create batch.properties: Create the batch.properties file as shown below. Each property is explained in more detail on the Batch Importer site. The most important properties are the batch_import.* entries, which define the node index and the node and relationship input files.

dump_configuration=false
cache_type=none
use_memory_mapped_buffers=true
neostore.propertystore.db.index.keys.mapped_memory=5M
neostore.propertystore.db.index.mapped_memory=5M
neostore.nodestore.db.mapped_memory=200M
neostore.relationshipstore.db.mapped_memory=500M
neostore.propertystore.db.mapped_memory=200M
neostore.propertystore.db.strings.mapped_memory=200M

batch_import.node_index.users=exact
batch_import.nodes_files=lang_nodes.csv,question_nodes.csv,answer_nodes.csv,user_nodes.csv
batch_import.rels_files=rels.csv

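The Batch Importer reads batch.properties itself. If you want to check the configuration before running it, for example to confirm that every listed node file exists, java.util.Properties can parse the same file. The sketch below takes the file contents as a string to keep it self-contained; BatchConfig and listFiles are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight helper: list the files configured under a batch_import.* key.
final class BatchConfig {
    static List<String> listFiles(String propertiesText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalArgumentException("Cannot parse properties", e);
        }
        List<String> files = new ArrayList<>();
        // Comma-separated list, e.g. batch_import.nodes_files=lang_nodes.csv,question_nodes.csv
        for (String name : props.getProperty(key, "").split(",")) {
            if (!name.trim().isEmpty()) {
                files.add(name.trim());
            }
        }
        return files;
    }
}
```

Each returned name could then be checked with new java.io.File(name).exists() before running import.bat.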

Execute Batch Importer: Run import.bat, located in the Batch Importer directory, passing batch.properties and the name of the graph database to create.

//This command creates the graph.db database in the same location as your node and relationship files
batch_importer_20\import.bat batch.properties graph.db


Using Existing Configuration File
Importing 9 Nodes took 0 seconds
Importing 676 Nodes took 0 seconds
Importing 1653 Nodes took 0 seconds
Importing 1491 Nodes took 0 seconds
Importing 4656 Relationships skipped (2) took 0 seconds

Total import time: 2 seconds

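The output reports 2 skipped relationships without saying why. One plausible cause, which we have not verified for this dataset, is a row in rels.csv whose start or end id is not defined in any node file, for example a tag that matched the R pattern but is missing from lang_nodes.csv. The hypothetical helper below lists such rows, assuming the node ids (the first column of every node file) and the relationship rows (without the header) have already been loaded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical check: relationship rows whose start or end id is not defined in any node file.
final class DanglingRels {
    static List<String> find(Set<String> nodeIds, List<String> relRows) {
        List<String> dangling = new ArrayList<>();
        for (String row : relRows) {
            String[] cols = row.split("\t");
            if (cols.length < 3) {
                continue; // skip blank or malformed rows
            }
            if (!nodeIds.contains(cols[0]) || !nodeIds.contains(cols[1])) {
                dangling.add(row);
            }
        }
        return dangling;
    }
}
```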

Visualize Graph with Neo4j

Copy graph.db: Create a new directory named “data” under the root of the Neo4j installation directory and copy graph.db into it. This step is optional, but keeping graph.db alongside Neo4j is recommended.

Start Neo4j: Run the “neo4j-community” executable in the bin directory of the Neo4j installation. You will be prompted to choose the location of the graph.db database.
Visualize Graphs:
- Launch Neo4j Web Console: http://localhost:7474/browser/

Navigate to Graphs: Click the bubble icon at the top left and choose “*”.
Customize Graph Attributes: Double-click the “Java” node and choose “name” as the caption.
Explore Graphs: The exploration below shows the following:

Tracing the orange line shows that the user Trevor, who answered a Java question (aid_853052), also asked a PHP question (qid_865476). Tracing the red line shows that the user Audrey answered two Java questions (aid_853030 and aid_892379). Working with a graph database is a lot of fun because the traversals are limitless. Note that the user names are fictional, not real users.
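In Neo4j this kind of question is normally answered with Cypher. To make the path-following idea concrete, the sketch below performs the same two-hop traversal by hand in plain Java over (start, end, type) triples shaped like the rows of rels.csv. It is not Neo4j's API; the class PathSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory traversal over (start, end, type) triples; not Neo4j's API.
final class PathSketch {
    // start id -> outgoing relationships as {end id, type}
    private final Map<String, List<String[]>> outgoing = new HashMap<>();

    void addRel(String start, String end, String type) {
        List<String[]> rels = outgoing.get(start);
        if (rels == null) {
            rels = new ArrayList<>();
            outgoing.put(start, rels);
        }
        rels.add(new String[] {end, type});
    }

    // Follow one relationship of the given type from a node
    Set<String> follow(String start, String type) {
        Set<String> ends = new TreeSet<>();
        List<String[]> rels = outgoing.get(start);
        if (rels != null) {
            for (String[] rel : rels) {
                if (rel[1].equals(type)) {
                    ends.add(rel[0]);
                }
            }
        }
        return ends;
    }

    // Users who answered a question about answeredLang and also asked a question about askedLang
    Set<String> answeredThenAsked(String answeredLang, String askedLang) {
        Set<String> answerers = new TreeSet<>();
        Set<String> askers = new TreeSet<>();
        for (String question : outgoing.keySet()) {
            Set<String> langs = follow(question, "REFERS"); // empty for answer nodes
            if (langs.contains(answeredLang)) {
                // Question -HAS_ANSWER-> Answer -ANSWERED_BY-> User
                for (String answer : follow(question, "HAS_ANSWER")) {
                    answerers.addAll(follow(answer, "ANSWERED_BY"));
                }
            }
            if (langs.contains(askedLang)) {
                // Question -ASKED_BY-> User
                askers.addAll(follow(question, "ASKED_BY"));
            }
        }
        answerers.retainAll(askers); // users found on both paths
        return answerers;
    }
}
```

Loading every row of rels.csv with addRel and calling answeredThenAsked("java", "php") would list users in the situation traced by the orange line above.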

Conclusion

Neo4j is one of the best graph databases around and comes with the powerful Cypher query language, which lets us traverse nodes via their relationships and filter on node properties. We will cover Cypher in our next blog post, based on this graph data.
R is very handy for data manipulation: it makes it quick to cleanse, transform, and reshape data to our needs.
Neo4j also provides a REST API for adding nodes and relationships dynamically to an existing graph database.

References

Neo4J: http://www.neo4j.org/
Neo4J Use Cases: http://www.neo4j.org/learn/use_cases
R: http://www.r-project.org/
Neo4J Batch Importer: https://github.com/jexp/batch-import/tree/20
Files: Click here to download nodes and relationship zip file

The post Embrace Relationships with Neo4J, R & Java appeared first on treselle.com.