I'm trying to load a large CSV into Neo4j 5.2 with cypher-shell and I'm hitting an error I haven't seen before. Neo4j runs in Docker, and I invoke cypher-shell inside the container with docker exec ... cypher-shell:

Unable to complete transaction.: The memory pool limit was exceeded. The corresponding setting can be found in the error message

The Neo4j logs are empty, and the heap size is five times the total file size. The transaction should fit in memory anyway, since I'm using CALL ... IN TRANSACTIONS.
The query shouldn't hit an Eager operator:
LOAD CSV WITH HEADERS FROM 'file:///omop/CONCEPT_RELATIONSHIP_clean.csv' AS line FIELDTERMINATOR ','
CALL {
WITH line
MATCH (source:Concept { concept_id: line.concept_id_1 })
MATCH (target:Concept { concept_id: line.concept_id_2 })
CREATE (source)-[r:VOCAB_REL]->(target)
SET r.type = line.relationship_id, r.valid_start_date = line.valid_start_date, r.valid_end_date = line.valid_end_date, r.invalid_reason = line.invalid_reason
} IN TRANSACTIONS;
Do you know what's causing this error?
The default batch size for CALL {} IN TRANSACTIONS is 1,000 rows, which may not be appropriate for a large transactional load. You can adjust the row count, e.g. to 5k or 10k. For example:
LOAD CSV FROM <csvfile> AS line
CALL { ...
} IN TRANSACTIONS OF 10000 ROWS
参考:https://neo4j.com/docs/cypher-manual/current/clauses/call-subquery/_batching
That said, I'd suggest an alternative: use the apoc.periodic.iterate procedure. Make sure there are no duplicate rows in the CSV file first. Thanks.
CALL apoc.periodic.iterate(
  "
  LOAD CSV WITH HEADERS FROM 'file:///omop/CONCEPT_RELATIONSHIP_clean.csv' AS line RETURN line
  ",
  "
  MATCH (source:Concept { concept_id: line.concept_id_1 })
  MATCH (target:Concept { concept_id: line.concept_id_2 })
  CREATE (source)-[r:VOCAB_REL]->(target)
  SET r.type = line.relationship_id, r.valid_start_date = line.valid_start_date, r.valid_end_date = line.valid_end_date, r.invalid_reason = line.invalid_reason
  ",
  // parallel writes that touch the same nodes can deadlock, so keep parallel off for relationship creation
  {batchSize: 10000, parallel: false}
) YIELD batches, total
RETURN batches, total
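If the batches still blow past the pool, the transaction memory cap itself can be raised. This is a sketch under an assumption: the exact setting name is printed in the full error message, but in Neo4j 5 the global cap is typically `dbms.memory.transaction.total.max`, and Docker maps config keys to environment variables by replacing dots with underscores:

```
# Assumption: the setting named in the error is dbms.memory.transaction.total.max.
# Neo4j's Docker image turns dbms.memory.transaction.total.max
# into the env var NEO4J_dbms_memory_transaction_total_max.
docker run -d \
  -e NEO4J_dbms_memory_transaction_total_max=2G \
  neo4j:5.2
```

A value of 0 disables the limit entirely, but a finite cap is safer on a shared host.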
Try the query below:
LOAD CSV WITH HEADERS FROM "file:///omop/CONCEPT_RELATIONSHIP_clean.csv" AS line FIELDTERMINATOR ','
WITH line
MATCH (source:Concept { concept_id: line.concept_id_1 })
MATCH (target:Concept { concept_id: line.concept_id_2 })
CALL {
  WITH line, source, target
  CREATE (source)-[r:VOCAB_REL]->(target)
  SET r.type = line.relationship_id,
      r.valid_start_date = line.valid_start_date,
      r.valid_end_date = line.valid_end_date,
      r.invalid_reason = line.invalid_reason
} IN TRANSACTIONS OF 1000 ROWS;
Also check that there is an index on concept_id for the Concept nodes.
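Without that index, the two MATCH clauses in each batch fall back to label scans, which inflates per-transaction memory. A sketch of creating one (the index name here is illustrative):

```
// Range index so the MATCH clauses become index seeks
CREATE INDEX concept_id_idx IF NOT EXISTS
FOR (c:Concept) ON (c.concept_id);
```

You can confirm it is picked up by running the load query with EXPLAIN and looking for NodeIndexSeek operators.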
Hope this helps!