Monday, February 20, 2012

Exporting Graphs from Neo4j

I've recently had a brief introduction to the Neo4j database - by way of the YOW 2011 conference in Brisbane.

It looks really interesting - so I set about performing a few experiments, one of which is taking a graph and exporting it for use in other tools.

For example, here's a really simple graph - shown via Neoclipse:
[Figure: Sample Graph]
And the code to produce it is like this:

// graphDb is an open GraphDatabaseService; these writes need to run inside a transaction.
// Create the five nodes.
Node nodeA = graphDb.createNode();
Node nodeB = graphDb.createNode();
Node nodeC = graphDb.createNode();
Node nodeD = graphDb.createNode();
Node nodeE = graphDb.createNode();

// Give each node a "name" property.
nodeA.setProperty("name", "A");
nodeB.setProperty("name", "B");
nodeC.setProperty("name", "C");
nodeD.setProperty("name", "D");
nodeE.setProperty("name", "E");

// B and C depend on A; D and E depend on C.
// RelationshipTypes is presumably the application's own enum implementing RelationshipType (not shown).
nodeB.createRelationshipTo(nodeA, RelationshipTypes.DEPENDS);
nodeC.createRelationshipTo(nodeA, RelationshipTypes.DEPENDS);
nodeD.createRelationshipTo(nodeC, RelationshipTypes.DEPENDS);
nodeE.createRelationshipTo(nodeC, RelationshipTypes.DEPENDS);

Nothing exciting to see there.

So, using the Cypher language for querying, I thought I'd investigate how to dump the graph structure so I could export it.

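Assuming that A is the starting point of our graph, the nodes directly related to A can be found via the query below. Note that the node:concepts(name="A") lookup assumes the nodes were also added to a node index called "concepts", keyed on name, which the snippet above does not show.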

start n=node:concepts(name="A") 
match (n)<-[r]-(x) return x.name, r

Results are:
+-------------------------+
| x.name | r              |
+-------------------------+
| "C"    | :DEPENDS[1] {} |
| "B"    | :DEPENDS[0] {} |
+-------------------------+
2 rows, 978 ms

But to retrieve the whole graph, I need every node connected to A, whether directly or through other nodes. So another attempt is this - allowing for relationships up to three levels deep:

start n=node:concepts(name="A") 
match (n)<-[r:DEPENDS*1..3]-(x) return x.name, r

Results are:
+-------------------------------------------------+
| x.name | r                                      |
+-------------------------------------------------+
| "C"    | List(Relationship[1])                  |
| "E"    | List(Relationship[1], Relationship[3]) |
| "D"    | List(Relationship[1], Relationship[2]) |
| "B"    | List(Relationship[0])                  |
+-------------------------------------------------+
4 rows, 100 ms
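
To make the *1..3 behaviour a bit more concrete, here is a rough plain-Java sketch (no Neo4j involved) that walks the same five-node graph in memory and collects one row per path of incoming edges, up to a maximum depth. The class and method names (VariableLengthSketch, Edge, pathsInto) are made up for illustration, and the sketch only approximates what Cypher does: it does not, for instance, enforce relationship uniqueness within a path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of the sample graph; no Neo4j classes are involved.
class VariableLengthSketch {

    // One DEPENDS edge: relationship id, source node name, target node name.
    static final class Edge {
        final int id;
        final String from;
        final String to;

        Edge(int id, String from, String to) {
            this.id = id;
            this.from = from;
            this.to = to;
        }
    }

    // Same edges and relationship ids as in the result tables above.
    static final List<Edge> EDGES = Arrays.asList(
            new Edge(0, "B", "A"),
            new Edge(1, "C", "A"),
            new Edge(2, "D", "C"),
            new Edge(3, "E", "C"));

    // Returns one "name [relationship ids]" row per path of 1..maxDepth incoming edges.
    static List<String> pathsInto(String start, int maxDepth) {
        List<String> rows = new ArrayList<>();
        walk(start, new ArrayList<Integer>(), maxDepth, rows);
        return rows;
    }

    // Depth-first walk that follows edges backwards, from target to source.
    private static void walk(String node, List<Integer> path, int remaining, List<String> rows) {
        if (remaining == 0) {
            return;
        }
        for (Edge e : EDGES) {
            if (!e.to.equals(node)) {
                continue;
            }
            path.add(e.id);
            rows.add(e.from + " " + path);
            walk(e.from, path, remaining - 1, rows);
            path.remove(path.size() - 1); // backtrack (removes by index)
        }
    }

    public static void main(String[] args) {
        for (String row : pathsInto("A", 3)) {
            System.out.println(row);
        }
    }
}
```

Each row pairs an end node with the list of relationship ids on the path leading back to A, which is the same shape as the Cypher result above (the row order may differ).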

However, that query result doesn't help to recreate the graph. To do this, I need each relationship together with its source and destination node. The next attempt makes use of the fact that a variable-length relationship can have a minimum length of zero - which allows x to match the start node itself as well as the nodes further out. The final hop from y to x then covers every relationship between one and three levels away from A. Using this allows us to construct a query like this:

start n=node:concepts(name="A") 
match p1=(n)<-[rel:DEPENDS*0..2]-(x)<-[r:DEPENDS]-(y) 
return n, x, r, y 

Which returns results like this:
+-------------------------------------------------------------------------------+
| n                  | x                  | r              | y                  |
+-------------------------------------------------------------------------------+
| Node[1]{name->"A"} | Node[1]{name->"A"} | :DEPENDS[1] {} | Node[3]{name->"C"} |
| Node[1]{name->"A"} | Node[1]{name->"A"} | :DEPENDS[0] {} | Node[2]{name->"B"} |
| Node[1]{name->"A"} | Node[3]{name->"C"} | :DEPENDS[3] {} | Node[5]{name->"E"} |
| Node[1]{name->"A"} | Node[3]{name->"C"} | :DEPENDS[2] {} | Node[4]{name->"D"} |
+-------------------------------------------------------------------------------+
4 rows, 31 ms
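
To show why the zero matters, here is another small plain-Java sketch (again, no Neo4j involved) that mimics the shape of the query: find every x that is between minHops and maxHops incoming edges away from the start node, then list each edge pointing at x. The names (EdgeRowsSketch, Edge, edgeRows) are hypothetical, and it is a simplification: nodes are de-duplicated per level, whereas Cypher returns one row per matching path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model of the (n)<-[*min..max]-(x)<-[r]-(y) pattern.
class EdgeRowsSketch {

    // One DEPENDS edge: relationship id, source node name, target node name.
    static final class Edge {
        final int id;
        final String from;
        final String to;

        Edge(int id, String from, String to) {
            this.id = id;
            this.from = from;
            this.to = to;
        }
    }

    // Same edges and relationship ids as in the result tables above.
    static final List<Edge> EDGES = Arrays.asList(
            new Edge(0, "B", "A"),
            new Edge(1, "C", "A"),
            new Edge(2, "D", "C"),
            new Edge(3, "E", "C"));

    // Returns "x <-[id]- y" rows for every x that is minHops..maxHops edges from start.
    static List<String> edgeRows(String start, int minHops, int maxHops) {
        List<String> rows = new ArrayList<>();
        Set<String> level = new LinkedHashSet<>();
        level.add(start); // hop 0 is the start node itself
        for (int hops = 0; hops <= maxHops && !level.isEmpty(); hops++) {
            Set<String> next = new LinkedHashSet<>();
            for (Edge e : EDGES) {
                if (!level.contains(e.to)) {
                    continue;
                }
                if (hops >= minHops) {
                    rows.add(e.to + " <-[" + e.id + "]- " + e.from);
                }
                next.add(e.from);
            }
            level = next;
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(edgeRows("A", 0, 2)); // includes the edges pointing at A
        System.out.println(edgeRows("A", 1, 2)); // misses the edges pointing at A
    }
}
```

With a minimum of zero, x can be A itself, so the B and C relationships show up; with a minimum of one, only the relationships pointing at C are listed.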

From here, it's a small matter of programming to iterate through the query results, and generate an XML representation (for example, GraphML style) - like this:


<graph start="1">
  <node id="1">
    <data key="d0">A</data>
  </node>
  <node id="3">
    <data key="d0">C</data>
  </node>
  <edge id="e1" source="3" target="1">
    <data key="d1">DEPENDS</data>
  </edge>
  <node id="2">
    <data key="d0">B</data>
  </node>
  <edge id="e0" source="2" target="1">
    <data key="d1">DEPENDS</data>
  </edge>
  <node id="5">
    <data key="d0">E</data>
  </node>
  <edge id="e3" source="5" target="3">
    <data key="d1">DEPENDS</data>
  </edge>
  <node id="4">
    <data key="d0">D</data>
  </node>
  <edge id="e2" source="4" target="3">
    <data key="d1">DEPENDS</data>
  </edge>
</graph>


The first column of the result - n - is simply used to infer the start node. In this case, it's "A" - as specified by the query.
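
For what it's worth, the "small matter of programming" might look roughly like the sketch below. It is plain Java with no Neo4j dependency: Row is a hypothetical holder for one (n, x, r, y) result row, with values copied by hand from the result table above. In real code the fields would presumably be filled from the Node and Relationship objects in each row. GraphXmlSketch, toXml and sampleRows are likewise made-up names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java sketch: turns (n, x, r, y) rows into the XML shown above.
class GraphXmlSketch {

    // Hypothetical holder for one result row; not a Neo4j type.
    static final class Row {
        final long startId;   // id of n
        final long xId;       // id of x
        final String xName;
        final long relId;     // id of r
        final String relType;
        final long yId;       // id of y
        final String yName;

        Row(long startId, long xId, String xName,
            long relId, String relType, long yId, String yName) {
            this.startId = startId;
            this.xId = xId;
            this.xName = xName;
            this.relId = relId;
            this.relType = relType;
            this.yId = yId;
            this.yName = yName;
        }
    }

    // The four rows from the result table, entered by hand.
    static List<Row> sampleRows() {
        return Arrays.asList(
                new Row(1, 1, "A", 1, "DEPENDS", 3, "C"),
                new Row(1, 1, "A", 0, "DEPENDS", 2, "B"),
                new Row(1, 3, "C", 3, "DEPENDS", 5, "E"),
                new Row(1, 3, "C", 2, "DEPENDS", 4, "D"));
    }

    static String toXml(List<Row> rows) {
        StringBuilder xml = new StringBuilder();
        Set<Long> written = new HashSet<>(); // node ids already emitted
        if (rows.isEmpty()) {
            xml.append("<graph>\n");
        } else {
            // The start node is taken from the n column of the first row.
            xml.append("<graph start=\"").append(rows.get(0).startId).append("\">\n");
        }
        for (Row row : rows) {
            appendNode(xml, written, row.xId, row.xName);
            appendNode(xml, written, row.yId, row.yName);
            // The relationship runs from y to x (y DEPENDS on x).
            xml.append("  <edge id=\"e").append(row.relId)
               .append("\" source=\"").append(row.yId)
               .append("\" target=\"").append(row.xId).append("\">\n")
               .append("    <data key=\"d1\">").append(escape(row.relType)).append("</data>\n")
               .append("  </edge>\n");
        }
        return xml.append("</graph>\n").toString();
    }

    // Emits a <node> element the first time an id is seen, and nothing after that.
    private static void appendNode(StringBuilder xml, Set<Long> written, long id, String name) {
        if (!written.add(id)) {
            return;
        }
        xml.append("  <node id=\"").append(id).append("\">\n")
           .append("    <data key=\"d0\">").append(escape(name)).append("</data>\n")
           .append("  </node>\n");
    }

    // Minimal escaping for text and attribute values.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        System.out.print(toXml(sampleRows()));
    }
}
```

Because each node is written the first time it appears in a row, nodes and edges come out interleaved in result order, just as in the XML above.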

3 comments:

Hendy said...

It's interesting. How is it different (or better?) than exporting the entire neo4j graph as GraphML using Blueprints?
( http://stackoverflow.com/questions/2204440/convert-neo4j-db-to-xml )

Perhaps to export only a part of the graph?

Evan said...

good point.

I'm fairly new to graphs from within Java - and haven't done anything with Blueprints.

It does indeed seem simple to export an entire graph that way.

I guess, I was coming at the problem from a few angles:

- being able to export a subgraph - in essence, the results of a query
- using the Cypher language for that query
- and having fine grained control of the output.

But, the Blueprints framework does look interesting from an abstraction point of view.

cheers,
Evan

Unknown said...

Can you please give us the simple code to generate an XML file from a Neo4j DB using Java? I have a project on this and I need it. Thanks for helping!!