I'm new to Neo4j and I want to load a JSON with the following structure into my Neo4j DB:
{
  "nodes": [
    {
      "last_update": 1629022369,
      "pub_key": "pub1",
      "alias": "alias1"
    },
    {
      "last_update": 1618162974,
      "pub_key": "pub2",
      "alias": "alias2"
    },
    {
      "last_update": 1634745976,
      "pub_key": "pub3",
      "alias": "alias3"
    }
  ],
  "edges": [
    {
      "node1_pub": "pub1",
      "node2_pub": "pub2",
      "capacity": "37200"
    },
    {
      "node1_pub": "pub2",
      "node2_pub": "pub3",
      "capacity": "37200"
    },
    {
      "node1_pub": "pub3",
      "node2_pub": "pub1",
      "capacity": "37200"
    }
  ]
}
I load the nodes and the edges in two separate queries:
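// Query 1: create one :Node per entry in the "nodes" array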
WITH "file:///graph.json" AS graph
CALL apoc.load.json(graph) YIELD value
FOREACH (nodeObject IN value.nodes | CREATE (:Node {pubKey: nodeObject.pub_key}))
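
// Query 2: create one :IS_CONNECTED relationship per entry in the "edges" array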
WITH "file:///graph.json" AS graph
CALL apoc.load.json(graph) YIELD value
UNWIND value.edges AS edgeObject
MATCH (node1:Node {pubKey: edgeObject.node1_pub})
MATCH (node2:Node {pubKey: edgeObject.node2_pub})
CREATE (node1)-[:IS_CONNECTED {capacity: edgeObject.capacity}]->(node2)
This works fine for a small number of edges, but I have a file of roughly 100 MB with a lot of edges, and in that case the query never returns. I'm running it from the Neo4j web interface. Neo4j runs in Docker with the maximum heap size set to 3g, which should be plenty.
I haven't grasped all of Cypher's concepts yet, so there may be a better way to do this, ideally in a single query so the file doesn't have to be loaded twice.
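Something like the following is what I had in mind for a single-query version (an untested sketch; as far as I understand, nodes created earlier in a Cypher query are visible to MATCH clauses later in the same transaction):
// Load the file once, create the nodes, then reuse the same value for the edges
WITH "file:///graph.json" AS graph
CALL apoc.load.json(graph) YIELD value
FOREACH (nodeObject IN value.nodes | CREATE (:Node {pubKey: nodeObject.pub_key}))
// The nodes created above are already visible to the MATCHes below
WITH value
UNWIND value.edges AS edgeObject
MATCH (node1:Node {pubKey: edgeObject.node1_pub})
MATCH (node2:Node {pubKey: edgeObject.node2_pub})
CREATE (node1)-[:IS_CONNECTED {capacity: edgeObject.capacity}]->(node2)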
Thanks a lot!
You can load the JSON file in batches using the txBatchSize parameter. See the documentation below:
https://neo4j.com/labs/apoc/4.1/import/load-json/#load-json-available-procedures-apoc.import.json
WITH "file:///graph.json" AS graph
CALL apoc.load.json(graph, '[0:10000]') YIELD value
RETURN value
This returns the first 10,000 rows.
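For a file this size, another common pattern (a sketch on my part, assuming APOC is installed; adjust batchSize as needed) is apoc.periodic.iterate, which streams the edges and commits each batch in its own transaction, so memory use stays bounded:
// First statement streams the edge objects; second statement runs per row, in batches
CALL apoc.periodic.iterate(
  'CALL apoc.load.json("file:///graph.json") YIELD value
   UNWIND value.edges AS edgeObject
   RETURN edgeObject',
  'MATCH (node1:Node {pubKey: edgeObject.node1_pub})
   MATCH (node2:Node {pubKey: edgeObject.node2_pub})
   CREATE (node1)-[:IS_CONNECTED {capacity: edgeObject.capacity}]->(node2)',
  {batchSize: 10000}
)
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages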
OK, after trying the batching @jose_bacoy suggested, I noticed that even 1000 rows took around 20 seconds. Apparently the MATCH operations are very CPU-intensive. After I created an index, the import of 80k edges worked like a charm:
CREATE INDEX FOR (n:Node) ON (n.pubKey)
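With the index in place, each MATCH on pubKey becomes an index seek instead of a scan over all Node nodes. If you create the index right before the import, it may also help to run CALL db.awaitIndexes() first, so the import doesn't start until the index is online.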