Monday, February 20, 2012

Exporting Graphs from Neo4j

I've recently had a brief introduction to the Neo4j database - by way of the YOW 2011 conference in Brisbane.

It looks really interesting - so I set about performing a few experiments. One of them is taking a graph and exporting it for use in other tools.

For example, here's a really simple graph - shown via Neoclipse:
[Image: Sample Graph]
The code to produce it looks like this:

// Excerpt: graphDb is assumed to be an open embedded database, and
// transaction handling is omitted.
// The queries below also assume each node was added to an index named
// "concepts" under the key "name"; that step is not shown here.
Node nodeA = graphDb.createNode();
Node nodeB = graphDb.createNode();
Node nodeC = graphDb.createNode();
Node nodeD = graphDb.createNode();
Node nodeE = graphDb.createNode();

nodeA.setProperty("name", "A");
nodeB.setProperty("name", "B");
nodeC.setProperty("name", "C");
nodeD.setProperty("name", "D");
nodeE.setProperty("name", "E");

// B and C depend on A; D and E depend on C.
nodeB.createRelationshipTo(nodeA, RelationshipTypes.DEPENDS);
nodeC.createRelationshipTo(nodeA, RelationshipTypes.DEPENDS);
nodeD.createRelationshipTo(nodeC, RelationshipTypes.DEPENDS);
nodeE.createRelationshipTo(nodeC, RelationshipTypes.DEPENDS);

Nothing exciting to see there.

So, using the Cypher language for querying, I thought I'd investigate how to dump the graph structure so I could export it.

Assuming that A is the starting point of our graph, the nodes directly related to A can be found via this query:

start n=node:concepts(name="A") 
match (n)<-[r]-(x) return x.name, r

Results are:
+-------------------------+
| x.name | r              |
+-------------------------+
| "C"    | :DEPENDS[1] {} |
| "B"    | :DEPENDS[0] {} |
+-------------------------+
2 rows, 978 ms

But, to retrieve the whole graph, I need every node that is connected to A, whether directly or indirectly. So another attempt is this - using a variable-length relationship, here one to three hops deep:

start n=node:concepts(name="A") 
match (n)<-[r:DEPENDS*1..3]-(x) return x.name, r

Results are:
+-------------------------------------------------+
| x.name | r                                      |
+-------------------------------------------------+
| "C"    | List(Relationship[1])                  |
| "E"    | List(Relationship[1], Relationship[3]) |
| "D"    | List(Relationship[1], Relationship[2]) |
| "B"    | List(Relationship[0])                  |
+-------------------------------------------------+
4 rows, 100 ms

However, this doesn't help to recreate the graph. To do this, I need each source and destination node - and the relationship between them. The next attempt makes use of the fact that a variable-length relationship can have a minimum length of zero - which allows x to match the start node itself. Using this allows us to construct a query like this:

start n=node:concepts(name="A") 
match p1=(n)<-[rel:DEPENDS*0..2]-(x)<-[r:DEPENDS]-(y) 
return n, x, r, y 

Which returns results like this:
+-------------------------------------------------------------------------------+
| n                  | x                  | r              | y                  |
+-------------------------------------------------------------------------------+
| Node[1]{name->"A"} | Node[1]{name->"A"} | :DEPENDS[1] {} | Node[3]{name->"C"} |
| Node[1]{name->"A"} | Node[1]{name->"A"} | :DEPENDS[0] {} | Node[2]{name->"B"} |
| Node[1]{name->"A"} | Node[3]{name->"C"} | :DEPENDS[3] {} | Node[5]{name->"E"} |
| Node[1]{name->"A"} | Node[3]{name->"C"} | :DEPENDS[2] {} | Node[4]{name->"D"} |
+-------------------------------------------------------------------------------+
4 rows, 31 ms
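
To see why the lower bound of zero matters, here is a rough plain-Java sketch that mimics the pattern on an in-memory copy of the sample graph. It doesn't use Neo4j at all: the class DependsPattern and its methods are hypothetical names made up for this illustration, and it only tries to mirror the matching for a small tree-shaped graph like this one. The order of the returned edges is not meant to match Neo4j's.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the Neo4j API.
final class DependsPattern {

    // incoming.get(x) lists every y with y -[:DEPENDS]-> x in the sample graph.
    static Map<String, List<String>> sampleIncoming() {
        Map<String, List<String>> incoming = new LinkedHashMap<>();
        incoming.put("A", Arrays.asList("B", "C"));
        incoming.put("C", Arrays.asList("D", "E"));
        return incoming;
    }

    // Mimics (n)<-[:DEPENDS*min..max]-(x)<-[:DEPENDS]-(y), returning "y->x" strings.
    static List<String> edges(Map<String, List<String>> incoming,
                              String start, int min, int max) {
        List<String> result = new ArrayList<>();
        // Candidates for x that sit exactly `depth` hops away from the start node.
        List<String> frontier = Collections.singletonList(start);
        for (int depth = 0; depth <= max; depth++) {
            List<String> next = new ArrayList<>();
            for (String x : frontier) {
                for (String y : incoming.getOrDefault(x, Collections.<String>emptyList())) {
                    if (depth >= min) {
                        result.add(y + "->" + x); // x is in range, so report its edge
                    }
                    next.add(y);
                }
            }
            frontier = next;
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> g = sampleIncoming();
        System.out.println(edges(g, "A", 0, 2)); // x may be A itself
        System.out.println(edges(g, "A", 1, 2)); // x must be at least one hop away
    }
}
```

With a minimum of 0 the sketch reports all four edges; with a minimum of 1 it only reports D->C and E->C, because x can no longer be A and the two edges pointing at A drop out.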

From here, it's a small matter of programming to iterate through these results, and generate an XML representation (for example, GraphML style) - like this:


<graph start="1">
  <node id="1">
    <data key="d0">A</data>
  </node>
  <node id="3">
    <data key="d0">C</data>
  </node>
  <edge id="e1" source="3" target="1">
    <data key="d1">DEPENDS</data>
  </edge>
  <node id="2">
    <data key="d0">B</data>
  </node>
  <edge id="e0" source="2" target="1">
    <data key="d1">DEPENDS</data>
  </edge>
  <node id="5">
    <data key="d0">E</data>
  </node>
  <edge id="e3" source="5" target="3">
    <data key="d1">DEPENDS</data>
  </edge>
  <node id="4">
    <data key="d0">D</data>
  </node>
  <edge id="e2" source="4" target="3">
    <data key="d1">DEPENDS</data>
  </edge>
</graph>


The first column of the result - n - is simply used to infer the start node. In this case, it's "A" - as specified by the query.
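
The iteration over the results can be sketched roughly like this in plain Java. It isn't the original export code: Row, sampleRows() and toXml() are hypothetical names, and the rows are hard-coded copies of the x, r and y columns from the table above rather than being read from a live Cypher result. Each node is written the first time it appears, and each edge runs from y to x.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: turns (x, r, y) result rows into the XML shown above.
final class GraphXmlExport {

    // One result row: node x, relationship r, node y (where y -[r]-> x).
    static final class Row {
        final long xId, relId, yId;
        final String xName, relType, yName;

        Row(long xId, String xName, long relId, String relType, long yId, String yName) {
            this.xId = xId; this.xName = xName;
            this.relId = relId; this.relType = relType;
            this.yId = yId; this.yName = yName;
        }
    }

    // The four rows of the query result, copied by hand.
    static List<Row> sampleRows() {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row(1, "A", 1, "DEPENDS", 3, "C"));
        rows.add(new Row(1, "A", 0, "DEPENDS", 2, "B"));
        rows.add(new Row(3, "C", 3, "DEPENDS", 5, "E"));
        rows.add(new Row(3, "C", 2, "DEPENDS", 4, "D"));
        return rows;
    }

    static String toXml(long startId, List<Row> rows) {
        StringBuilder sb = new StringBuilder();
        Set<Long> written = new HashSet<>(); // ids of nodes already emitted
        sb.append("<graph start=\"").append(startId).append("\">\n");
        for (Row row : rows) {
            appendNode(sb, written, row.xId, row.xName);
            appendNode(sb, written, row.yId, row.yName);
            sb.append("  <edge id=\"e").append(row.relId)
              .append("\" source=\"").append(row.yId)
              .append("\" target=\"").append(row.xId).append("\">\n");
            sb.append("    <data key=\"d1\">").append(escape(row.relType)).append("</data>\n");
            sb.append("  </edge>\n");
        }
        sb.append("</graph>\n");
        return sb.toString();
    }

    // Writes a node element unless a node with this id was written before.
    private static void appendNode(StringBuilder sb, Set<Long> written, long id, String name) {
        if (!written.add(id)) {
            return;
        }
        sb.append("  <node id=\"").append(id).append("\">\n");
        sb.append("    <data key=\"d0\">").append(escape(name)).append("</data>\n");
        sb.append("  </node>\n");
    }

    // Minimal escaping for XML text content.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.print(toXml(1, sampleRows()));
    }
}
```

Because the nodes and edges are written in row order, feeding it the four rows above should give the same element order as the XML listing. A real GraphML file would also need the graphml root element and key declarations for d0 and d1, which this sketch leaves out.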