Integration Matters: Exporting Graphs from neo4j

I've recently had a brief introduction to the Neo4j database - by way of the YOW 2011 conference in Brisbane.

It looks really interesting - so I set about performing a few experiments. One of them is taking a graph and exporting it for use in other tools.

For example, here's a really simple graph - shown via neoclipse:

[Figure: Sample Graph]

And the code to produce it is like this:

Node nodeA = graphDb.createNode();
Node nodeB = graphDb.createNode();
Node nodeC = graphDb.createNode();
Node nodeD = graphDb.createNode();
Node nodeE = graphDb.createNode();

nodeA.setProperty("name", "A");
nodeB.setProperty("name", "B");
nodeC.setProperty("name", "C");
nodeD.setProperty("name", "D");
nodeE.setProperty("name", "E");

// Each relationship points from the dependent node to the node it depends on.
nodeB.createRelationshipTo(nodeA, RelationshipTypes.DEPENDS);
nodeC.createRelationshipTo(nodeA, RelationshipTypes.DEPENDS);
nodeD.createRelationshipTo(nodeC, RelationshipTypes.DEPENDS);
nodeE.createRelationshipTo(nodeC, RelationshipTypes.DEPENDS);

Nothing exciting to see there.

So, using the Cypher language for querying, I thought I'd investigate how to dump the graph structure so I could export it.

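Assuming that A is the starting point of our graph, the nodes directly related to A can be found with the query below. The start clause looks A up in a node index named concepts under the key name - so it assumes the nodes were added to that index, which the code above does not show.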

start n=node:concepts(name="A")
match (n)<-[r]-(x) return x.name, r

Results are:

+-------------------------+
| x.name | r              |
+-------------------------+
| "C"    | :DEPENDS[1] {} |
| "B"    | :DEPENDS[0] {} |
+-------------------------+
2 rows, 978 ms

But to retrieve the whole graph, I need every node that depends on A - directly or indirectly. So another attempt is this, using a variable-length relationship of between one and three hops:

start n=node:concepts(name="A")
match (n)<-[r:DEPENDS*1..3]-(x) return x.name, r

Results are:

+-------------------------------------------------+
| x.name | r                                      |
+-------------------------------------------------+
| "C"    | List(Relationship[1])                  |
| "E"    | List(Relationship[1], Relationship[3]) |
| "D"    | List(Relationship[1], Relationship[2]) |
| "B"    | List(Relationship[0])                  |
+-------------------------------------------------+
4 rows, 100 ms

However, this doesn't help to recreate the graph. To do that, I need each relationship together with its source and destination nodes. The next attempt makes use of the fact that a variable-length relationship pattern can have a minimum length of zero - in which case x can be the start node itself. Following that pattern with one more single DEPENDS hop returns every edge whose target is at most two hops from A, so we can construct a query like this:

start n=node:concepts(name="A")
match p1=(n)<-[rel:DEPENDS*0..2]-(x)<-[r:DEPENDS]-(y)
return n, x, r, y

Which returns results like this:

+-------------------------------------------------------------------------------+
| n                  | x                  | r              | y                  |
+-------------------------------------------------------------------------------+
| Node[1]{name->"A"} | Node[1]{name->"A"} | :DEPENDS[1] {} | Node[3]{name->"C"} |
| Node[1]{name->"A"} | Node[1]{name->"A"} | :DEPENDS[0] {} | Node[2]{name->"B"} |
| Node[1]{name->"A"} | Node[3]{name->"C"} | :DEPENDS[3] {} | Node[5]{name->"E"} |
| Node[1]{name->"A"} | Node[3]{name->"C"} | :DEPENDS[2] {} | Node[4]{name->"D"} |
+-------------------------------------------------------------------------------+
4 rows, 31 ms
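
To see why the zero minimum matters, here is a minimal plain-Java sketch that mimics the pattern on an in-memory copy of the sample graph. It is only an illustration, not how Neo4j evaluates the query: EdgeRowsSketch, Edge, depends and edgeRows are hypothetical names, and it assumes an acyclic graph (it does not model Cypher's rule that a relationship is not repeated within one match).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; none of these names belong to the Neo4j API.
class EdgeRowsSketch {

    /** A DEPENDS relationship: source depends on target. */
    static final class Edge {
        final int id;
        final String source;
        final String target;
        Edge(int id, String source, String target) {
            this.id = id; this.source = source; this.target = target;
        }
    }

    /** Incoming edges per node name, i.e. who depends on that node. */
    private final Map<String, List<Edge>> incoming = new LinkedHashMap<>();

    void depends(int id, String source, String target) {
        incoming.computeIfAbsent(target, k -> new ArrayList<>())
                .add(new Edge(id, source, target));
    }

    /** Mimics (n)<-[*min..max]-(x)<-[r]-(y): one "x <-[r]- y" row per path and edge. */
    List<String> edgeRows(String start, int minHops, int maxHops) {
        List<String> rows = new ArrayList<>();
        walk(start, 0, minHops, maxHops, rows);
        return rows;
    }

    private void walk(String x, int depth, int minHops, int maxHops, List<String> rows) {
        List<Edge> in = incoming.getOrDefault(x, Collections.<Edge>emptyList());
        if (depth >= minHops) {
            // x is a valid end of the variable-length part: report its incoming edges.
            for (Edge r : in) {
                rows.add(x + " <-[" + r.id + "]- " + r.source);
            }
        }
        if (depth < maxHops) {
            // Extend the variable-length part by one more hop.
            for (Edge r : in) {
                walk(r.source, depth + 1, minHops, maxHops, rows);
            }
        }
    }

    /** The sample graph from this post, with the relationship ids shown in the results. */
    static EdgeRowsSketch sample() {
        EdgeRowsSketch g = new EdgeRowsSketch();
        g.depends(0, "B", "A");
        g.depends(1, "C", "A");
        g.depends(2, "D", "C");
        g.depends(3, "E", "C");
        return g;
    }

    public static void main(String[] args) {
        for (String row : sample().edgeRows("A", 0, 2)) {
            System.out.println(row);
        }
    }
}
```

With a minimum of zero the sketch reports all four edges of the sample graph. With a minimum of one, the two edges that point directly at A are missing, because A itself never gets to play the role of x.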

From here, it's a small matter of programming to iterate through these results, and generate an XML representation (for example, GraphML style) - like this:

<graph start="1">
  <node id="1">
    <data key="d0">A</data>
  </node>
  <node id="3">
    <data key="d0">C</data>
  </node>
  <edge id="e1" source="3" target="1">
    <data key="d1">DEPENDS</data>
  </edge>
  <node id="2">
    <data key="d0">B</data>
  </node>
  <edge id="e0" source="2" target="1">
    <data key="d1">DEPENDS</data>
  </edge>
  <node id="5">
    <data key="d0">E</data>
  </node>
  <edge id="e3" source="5" target="3">
    <data key="d1">DEPENDS</data>
  </edge>
  <node id="4">
    <data key="d0">D</data>
  </node>
  <edge id="e2" source="4" target="3">
    <data key="d1">DEPENDS</data>
  </edge>
</graph>

The first column of the result - n - is simply used to infer the start node. In this case, it's "A" - as specified by the query.
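
The iteration over the result rows could look something like the sketch below. It is not the code I actually used, and it deliberately leaves Neo4j out: Node, Row and GraphXmlSketch are hypothetical stand-ins for whatever the (n, x, r, y) rows get mapped to, and the d0/d1 keys are simply copied from the XML above. Each node is written the first time it appears, and each row contributes one edge from y to x.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java sketch; Node and Row are hypothetical stand-ins, not Neo4j classes.
class GraphXmlSketch {

    static final class Node {
        final long id;
        final String name;
        Node(long id, String name) { this.id = id; this.name = name; }
    }

    /** One result row (n, x, r, y): y -[r]-> x, with n the start node. */
    static final class Row {
        final Node n, x, y;
        final long relId;
        final String relType;
        Row(Node n, Node x, long relId, String relType, Node y) {
            this.n = n; this.x = x; this.relId = relId; this.relType = relType; this.y = y;
        }
    }

    static String toXml(List<Row> rows) {
        if (rows.isEmpty()) {
            return "<graph/>\n";
        }
        StringBuilder sb = new StringBuilder();
        Set<Long> written = new HashSet<>();
        // Column n is the same in every row, so the first row gives the start node.
        sb.append("<graph start=\"").append(rows.get(0).n.id).append("\">\n");
        for (Row row : rows) {
            appendNode(sb, written, row.x);
            appendNode(sb, written, row.y);
            sb.append("<edge id=\"e").append(row.relId)
              .append("\" source=\"").append(row.y.id)
              .append("\" target=\"").append(row.x.id).append("\">\n")
              .append("<data key=\"d1\">").append(escape(row.relType)).append("</data>\n")
              .append("</edge>\n");
        }
        return sb.append("</graph>\n").toString();
    }

    /** Writes a node element the first time its id is seen. */
    private static void appendNode(StringBuilder sb, Set<Long> written, Node node) {
        if (written.add(node.id)) {
            sb.append("<node id=\"").append(node.id).append("\">\n")
              .append("<data key=\"d0\">").append(escape(node.name)).append("</data>\n")
              .append("</node>\n");
        }
    }

    /** Escapes the characters that are special in XML text and attribute values. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** The four result rows shown earlier, in the same order. */
    static List<Row> sampleRows() {
        Node a = new Node(1, "A"), b = new Node(2, "B"), c = new Node(3, "C");
        Node d = new Node(4, "D"), e = new Node(5, "E");
        List<Row> rows = new ArrayList<>();
        rows.add(new Row(a, a, 1, "DEPENDS", c));
        rows.add(new Row(a, a, 0, "DEPENDS", b));
        rows.add(new Row(a, c, 3, "DEPENDS", e));
        rows.add(new Row(a, c, 2, "DEPENDS", d));
        return rows;
    }

    public static void main(String[] args) {
        System.out.print(toXml(sampleRows()));
    }
}
```

Fed the four rows from the last query, this writes the same elements in the same order as the XML above (whitespace aside). A proper GraphML file would also need the graphml root element and key declarations, which this sketch leaves out.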

3 comments:

Hendy said...: It's interesting. How is it different (or better?) than exporting the entire neo4j graph as GraphML using Blueprints?
( http://stackoverflow.com/questions/2204440/convert-neo4j-db-to-xml )

Perhaps to export only a part of the graph?; 3:54 AM
Evan said...: good point.

I'm fairly new to graphs from within Java - and haven't done anything with Blueprints.

It does indeed seem simple to export an entire graph that way.

I guess, I was coming at the problem from a few angles:

- being able to export a subgraph - in essence, the results of a query
- using the Cypher language for that query
- and having fine grained control of the output.

But, the Blueprints framework does look interesting from an abstraction point of view.

cheers,
Evan; 10:29 AM
Unknown said...: can you please give us the simpl code of programming to generate an xml file from a neo4j DB using java because i have a project in this and i need it thanks for helping !!; 9:54 PM

Monday, February 20, 2012