
Neo4j Fundamentals

December 02, 2017

category: BigData

Neo4j is a graph database that treats the connections between nodes as first-class citizens. Unlike traditional relational databases such as Oracle, a graph database stores the connections themselves and follows them directly, rather than using keys to join tables together. In other words, a graph database starts from a node or a connection and explores its neighborhood, instead of stacking query on top of query to collect the connected information. To better understand graph databases, it helps to understand the Property Graph Model first.

1. Property Graph Model

There are four basic elements in the Property Graph Model: Nodes, which represent the objects in the graph; Relationships, which relate nodes by type and direction; Properties, which are name-value pairs attached to nodes and relationships; and Labels, which group nodes by role. A node can hold any number of properties and can be tagged with several labels to represent the different roles it plays in a domain. Relationships provide directed, named connections between nodes, although they can be traversed in either direction regardless of how they are stored. Like nodes, relationships can have any number of properties.
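
As a minimal sketch of the four elements, written in Cypher (the query language covered in the next section): the `Person`, `Employee` and `Company` labels, the `WORKS_FOR` type and all property names are made up for illustration and are not part of the examples used later in this post.

```cypher
// One node with two labels and two properties
CREATE (alice:Person:Employee {name: "Alice", age: 34})
// A second node with a single label
CREATE (acme:Company {name: "Acme"})
// A directed, typed relationship that carries its own property
CREATE (alice)-[:WORKS_FOR {since: 2015}]->(acme);

// Matching without an arrowhead ignores the stored direction
MATCH (c:Company {name: "Acme"})-[r:WORKS_FOR]-(p:Person)
RETURN p.name, r.since;
```

The relationship is stored as pointing from the person to the company, but the second statement reaches the person starting from the company, because the pattern leaves the direction out.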

2. Cypher Query Language

Cypher is a declarative, expressive, pattern-matching query language for graphs, created by Neo4j. It allows us to state what we want to select, insert, update or delete from our graph data without requiring us to describe exactly how to do it. Cypher uses ASCII art to represent patterns, which keeps queries readable. It is also an open language, which you can contribute to through GitHub.

Nodes are written with parentheses ( ). A node can be given a label after a colon : and properties inside curly braces { }, stored as key/value pairs. Property values can be strings, numbers, or booleans, or lists of strings, numbers, or booleans. For example:

CREATE (va:State {Name: "Virginia"}) // create a node with 'State' as its label and property in {}
CREATE (md:State {Name: "Maryland"}) // create another node


Relationships are written with hyphens - - and square brackets [ ], with an arrowhead < or > to specify the direction. Like nodes, relationships can have properties; instead of labels, each relationship has a single type.

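// 1. Find both nodes, then create the relationship between them
MATCH (va:State), (md:State)
WHERE va.Name = "Virginia" AND md.Name = "Maryland"
CREATE (va)-[:Bordered]->(md);

// 2. Same result with MERGE, which creates only what does not already exist
MERGE (va:State {Name: "Virginia"})
MERGE (md:State {Name: "Maryland"})
MERGE (va)-[:Bordered]->(md);

// 3. Similar, but CREATE always adds two new nodes and a new relationship
//    (here pointing from Maryland to Virginia)
CREATE (va:State {Name: "Virginia"})<-[b:Bordered]-(md:State {Name: "Maryland"});

// Show the resulting graph
MATCH (n) RETURN n;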


The first query finds the two existing nodes with MATCH and then sets up the relationship with CREATE. The third query also uses CREATE, but because it does not match anything first, it creates a second pair of nodes along with the relationship. If you don’t want to create duplicates, use MERGE, which is essentially a combination of find and create. However, MERGE can be expensive, since it has to search the existing data to decide whether the pattern needs to be created. Note that in Cypher, labels, relationship types, property names and variables are case sensitive; keywords such as MATCH and CREATE are not.
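
Because MERGE either finds or creates, it can also react differently in each case through its ON CREATE and ON MATCH sub-clauses. The following is a minimal sketch; the `created` and `lastSeen` property names are hypothetical and only serve to make the two branches visible.

```cypher
// Find the two states, creating either one only if it is missing
MERGE (va:State {Name: "Virginia"})
MERGE (md:State {Name: "Maryland"})
// Find or create the relationship between them
MERGE (va)-[b:Bordered]->(md)
  ON CREATE SET b.created = timestamp()   // runs only when the relationship is new
  ON MATCH SET b.lastSeen = timestamp()   // runs only when it already existed
RETURN va.Name, md.Name, b;
```

Merging the nodes first and the relationship afterwards is deliberate: MERGE works on the whole pattern it is given, so a single MERGE over nodes plus relationship would create all of it again if any part were missing.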

We can also update properties with SET. For example, we can add the capital city to each state node.

MATCH (va:State {Name: "Virginia"})
SET va.Capital = "Richmond";

MATCH (md:State {Name: "Maryland"})
SET md.Capital = "Annapolis";

There are also a few other handy statements to remember.

//Show all existing nodes/relationships
MATCH (n) RETURN n;

//Delete a node (DETACH also removes its relationships)
MATCH (va:State {Name: "Virginia", Capital: "Richmond"})
DETACH DELETE va;

//Remove properties
MATCH (md:State {Name: "Maryland"})
REMOVE md.Capital RETURN md;

//Delete relationships
MATCH (va:State {Name: "Virginia"})<-[b:Bordered]-(md:State {Name: "Maryland"})
DELETE b;

//Delete all
MATCH (n) DETACH DELETE n;

3. Reading Data from CSV

In theory, all nodes and relationships could be created with the statements above, saved in a script with a .cyp suffix. But you can imagine that this won’t scale once the data reaches even a moderate size. Instead, Neo4j provides LOAD CSV as an ETL tool for loading CSV files from an http(s) or file URL. The basic syntax is as follows:

[USING PERIODIC COMMIT] // optional transaction batching
LOAD CSV // load csv data
WITH HEADERS // optionally use first header row as keys in "row" map
FROM "url" // file:// URL relative to $NEO4J_HOME/import or http://
AS row // return each row of the CSV as list of strings or map
FIELDTERMINATOR ";" // alternative field delimiter 

... rest of the Cypher statement ...
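
As a concrete sketch of the template above, assume a hypothetical file `states.csv` placed in the import directory, with a header line `Name,Capital` and one state per row. Neither the file nor its columns come from a real dataset.

```cypher
// Assumed file: $NEO4J_HOME/import/states.csv with the header line
// Name,Capital
LOAD CSV WITH HEADERS FROM "file:///states.csv" AS row
// With headers, each row is a map keyed by column name; all values are strings
MERGE (s:State {Name: row.Name})
SET s.Capital = row.Capital;
```

Using MERGE rather than CREATE here means the load can be re-run without duplicating states that are already in the graph.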

4. From Relational to Graph Model

You may be familiar with relational databases, where SQL is used to set up relationships between different tables, while in Neo4j we only see nodes and relationships. It is natural to ask how to bridge the gap between the relational model and the graph model. Roughly speaking, you can think of the rows of entity tables as nodes, and of joins and foreign keys as relationships.
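
To illustrate the mapping, suppose the relational side has a join table exported as a hypothetical `borders.csv`, whose two columns `StateName` and `NeighborName` are both foreign keys into a states table. Under that assumption, each row of the join table becomes one relationship between two State nodes that already exist:

```cypher
// Assumed join table exported as borders.csv with the header line
// StateName,NeighborName
// where both columns are foreign keys into the states table.
LOAD CSV WITH HEADERS FROM "file:///borders.csv" AS row
MATCH (a:State {Name: row.StateName})
MATCH (b:State {Name: row.NeighborName})
MERGE (a)-[:Bordered]->(b);

// The SQL self-join over the borders table becomes a single pattern
MATCH (:State {Name: "Virginia"})-[:Bordered]-(neighbor:State)
RETURN neighbor.Name;
```

The foreign keys are only needed while loading. Once the relationships exist, queries follow them directly instead of joining on key columns.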

