Integration Matters: February 2012

I've recently had a brief introduction to the Neo4j database - by way of the YOW 2011 conference in Brisbane.

It looks really interesting, so I set about performing a few experiments, one of which is taking a graph and exporting it for use by other tools.

For example, here's a really simple graph - shown via neoclipse:

Sample Graph

And the code to produce it is like this:

// Create the five nodes.
Node nodeA = graphDb.createNode();
Node nodeB = graphDb.createNode();
Node nodeC = graphDb.createNode();
Node nodeD = graphDb.createNode();
Node nodeE = graphDb.createNode();

// Give each node a "name" property.
// Note: the queries below find node A through an index named "concepts";
// adding the nodes to that index is not shown here.
nodeA.setProperty("name", "A");
nodeB.setProperty("name", "B");
nodeC.setProperty("name", "C");
nodeD.setProperty("name", "D");
nodeE.setProperty("name", "E");

// B and C depend on A; D and E depend on C.
nodeB.createRelationshipTo(nodeA, RelationshipTypes.DEPENDS);
nodeC.createRelationshipTo(nodeA, RelationshipTypes.DEPENDS);
nodeD.createRelationshipTo(nodeC, RelationshipTypes.DEPENDS);
nodeE.createRelationshipTo(nodeC, RelationshipTypes.DEPENDS);

Nothing exciting to see there.

So, using the Cypher language for querying, I thought I'd investigate how to dump the graph structure so I could export it.

Assuming that A is the starting point of our graph, the nodes directly related to A can be found via this query:

start n=node:concepts(name="A")
match (n)<-[r]-(x) return x.name, r

Results are:

+-------------------------+
| x.name | r              |
+-------------------------+
| "C"    | :DEPENDS[1] {} |
| "B"    | :DEPENDS[0] {} |
+-------------------------+
2 rows, 978 ms

But to retrieve the whole graph, I need every node that is related to A, directly or indirectly. So another attempt is this, allowing for relationships at multiple depths:

start n=node:concepts(name="A")
match (n)<-[r:DEPENDS*1..3]-(x) return x.name, r

Results are:

+-------------------------------------------------+
| x.name | r                                      |
+-------------------------------------------------+
| "C"    | List(Relationship[1])                  |
| "E"    | List(Relationship[1], Relationship[3]) |
| "D"    | List(Relationship[1], Relationship[2]) |
| "B"    | List(Relationship[0])                  |
+-------------------------------------------------+
4 rows, 100 ms

However, this doesn't help to recreate the graph. To do that, I need each source and destination node, plus the relationship between them. The next attempt makes use of the fact that a variable-length relationship can have a minimum length of zero, which lets x match the start node itself as well as the nodes below it. Using this allows us to construct a query like this:

start n=node:concepts(name="A")
match p1=(n)<-[rel:DEPENDS*0..2]-(x)<-[r:DEPENDS]-(y)
return n, x, r, y

Which returns results like this:

+-------------------------------------------------------------------------------+
| n                  | x                  | r              | y                  |
+-------------------------------------------------------------------------------+
| Node[1]{name->"A"} | Node[1]{name->"A"} | :DEPENDS[1] {} | Node[3]{name->"C"} |
| Node[1]{name->"A"} | Node[1]{name->"A"} | :DEPENDS[0] {} | Node[2]{name->"B"} |
| Node[1]{name->"A"} | Node[3]{name->"C"} | :DEPENDS[3] {} | Node[5]{name->"E"} |
| Node[1]{name->"A"} | Node[3]{name->"C"} | :DEPENDS[2] {} | Node[4]{name->"D"} |
+-------------------------------------------------------------------------------+
4 rows, 31 ms
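To see why the zero minimum matters, here is a rough plain-Java model of what the pattern in the query above matches. It is only a sketch and does not use Neo4j at all: the `DependsTraversal` class, the `Edge` type and the `edgesToExport` method are names I made up for illustration, and it assumes a tree-shaped graph like the sample (Cypher returns one row per matching path, so graphs with several paths to the same node, or with cycles, would behave differently).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DependsTraversal {

    /** A directed DEPENDS edge: source depends on target (hypothetical type). */
    static final class Edge {
        final int id;
        final String source;
        final String target;

        Edge(int id, String source, String target) {
            this.id = id;
            this.source = source;
            this.target = target;
        }
    }

    /**
     * Models (n)<-[:DEPENDS*minDepth..maxDepth]-(x)<-[r:DEPENDS]-(y):
     * x is any node minDepth..maxDepth incoming hops away from start,
     * and every edge pointing at such an x is returned as "y -[r]-> x".
     */
    static List<Edge> edgesToExport(List<Edge> edges, String start,
                                    int minDepth, int maxDepth) {
        List<Edge> rows = new ArrayList<>();
        Set<String> frontier = new LinkedHashSet<>();
        frontier.add(start); // depth 0: x is the start node itself
        for (int depth = 0; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            Set<String> next = new LinkedHashSet<>();
            for (String x : frontier) {
                for (Edge e : edges) {
                    if (e.target.equals(x)) {
                        if (depth >= minDepth) {
                            rows.add(e); // x is in range: keep the edge y -> x
                        }
                        next.add(e.source); // y becomes an x one hop deeper
                    }
                }
            }
            frontier = next;
        }
        return rows;
    }

    public static void main(String[] args) {
        // The sample graph: B and C depend on A; D and E depend on C.
        List<Edge> edges = Arrays.asList(
            new Edge(0, "B", "A"), new Edge(1, "C", "A"),
            new Edge(2, "D", "C"), new Edge(3, "E", "C"));
        for (Edge e : edgesToExport(edges, "A", 0, 2)) {
            System.out.println("e" + e.id + ": " + e.source + " -> " + e.target);
        }
    }
}
```

Calling `edgesToExport(edges, "A", 1, 2)` instead, with a minimum of one, drops the two edges that point directly at A, which is exactly the gap that the zero minimum closes.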

From here, it's a small matter of programming to iterate through these results, and generate an XML representation (for example, GraphML style) - like this:

<graph start="1">
  <node id="1">
    <data key="d0">A</data>
  </node>
  <node id="3">
    <data key="d0">C</data>
  </node>
  <edge id="e1" source="3" target="1">
    <data key="d1">DEPENDS</data>
  </edge>
  <node id="2">
    <data key="d0">B</data>
  </node>
  <edge id="e0" source="2" target="1">
    <data key="d1">DEPENDS</data>
  </edge>
  <node id="5">
    <data key="d0">E</data>
  </node>
  <edge id="e3" source="5" target="3">
    <data key="d1">DEPENDS</data>
  </edge>
  <node id="4">
    <data key="d0">D</data>
  </node>
  <edge id="e2" source="4" target="3">
    <data key="d1">DEPENDS</data>
  </edge>
</graph>

The first column of the result, n, is simply used to infer the start node. In this case it's "A", as specified by the query.
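For what it's worth, the iteration over the result rows could look something like the sketch below. It is not the code I actually ran: the `GraphXmlExport` class and its `Row` type are hypothetical stand-ins for the (n, x, r, y) rows, filled in by hand here rather than read from a Cypher result, so that the example runs without Neo4j. Each node is written the first time it is seen, followed by the edge for that row.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class GraphXmlExport {

    /** One result row (n, x, r, y), reduced to ids, names and relationship type. */
    static final class Row {
        final long startId;      // n: the start node, the same in every row
        final long targetId;     // x: the node being depended on
        final String targetName;
        final long relId;        // r: the relationship from y to x
        final String relType;
        final long sourceId;     // y: the node that depends on x
        final String sourceName;

        Row(long startId, long targetId, String targetName,
            long relId, String relType, long sourceId, String sourceName) {
            this.startId = startId;
            this.targetId = targetId;
            this.targetName = targetName;
            this.relId = relId;
            this.relType = relType;
            this.sourceId = sourceId;
            this.sourceName = sourceName;
        }
    }

    /** Escapes the characters that are special in XML text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Writes a node element unless a node with this id was already written. */
    static void appendNode(StringBuilder xml, Set<Long> seen, long id, String name) {
        if (seen.add(id)) {
            xml.append("  <node id=\"").append(id).append("\">\n")
               .append("    <data key=\"d0\">").append(escape(name)).append("</data>\n")
               .append("  </node>\n");
        }
    }

    /** Builds the GraphML-style document; the start id comes from the first row. */
    static String toXml(List<Row> rows) {
        if (rows.isEmpty()) {
            return "<graph>\n</graph>\n"; // nothing matched, so no start node either
        }
        StringBuilder xml = new StringBuilder();
        Set<Long> seen = new HashSet<>();
        xml.append("<graph start=\"").append(rows.get(0).startId).append("\">\n");
        for (Row row : rows) {
            appendNode(xml, seen, row.targetId, row.targetName);
            appendNode(xml, seen, row.sourceId, row.sourceName);
            xml.append("  <edge id=\"e").append(row.relId)
               .append("\" source=\"").append(row.sourceId)
               .append("\" target=\"").append(row.targetId).append("\">\n")
               .append("    <data key=\"d1\">").append(escape(row.relType)).append("</data>\n")
               .append("  </edge>\n");
        }
        return xml.append("</graph>\n").toString();
    }

    public static void main(String[] args) {
        // The four rows from the query result, typed in by hand.
        List<Row> rows = Arrays.asList(
            new Row(1, 1, "A", 1, "DEPENDS", 3, "C"),
            new Row(1, 1, "A", 0, "DEPENDS", 2, "B"),
            new Row(1, 3, "C", 3, "DEPENDS", 5, "E"),
            new Row(1, 3, "C", 2, "DEPENDS", 4, "D"));
        System.out.print(toXml(rows));
    }
}
```

Tracking the ids already written in a set keeps nodes such as A and C, which appear in several rows, from being emitted more than once.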


Monday, February 20, 2012

Exporting Graphs from neo4j
