加载 CSV 内存池错误:"The memory pool limit was exceeded"



我尝试用cypher-shell加载一个大型CSV到Neo4j 5.2,并遇到一个我以前没有见过的错误。Neo4j在Docker中运行,我在Docker容器中使用cypher-shelldocker exec ... cypher-shell:

Unable to complete transaction.: The memory pool limit was exceeded. The corresponding setting can be found in the error message

Neo4j日志为空,堆大小是总文件大小的5倍。事务甚至应该适合内存,但我使用CALL ... IN TRANSACTIONS

查询不应该碰到一个Eager操作符:

LOAD CSV WITH HEADERS FROM 'file:///omop/CONCEPT_RELATIONSHIP_clean.csv' AS line FIELDTERMINATOR ','
CALL {
WITH line
MATCH (source:Concept { concept_id: line.concept_id_1 })
MATCH (target:Concept { concept_id: line.concept_id_2 })
CREATE (source)-[r:VOCAB_REL]->(target)
SET r.type = line.relationship_id, r.valid_start_date = line.valid_start_date, r.valid_end_date = line.valid_end_date, r.invalid_reason = line.invalid_reason
} IN TRANSACTIONS;

你知道是什么导致了这个错误吗?

事务中CALL{}的默认批处理大小为1k行。因此,对于大型事务性调用,这可能还不够。您可以将行数从5k调整到10k。

例如:

LOAD CSV FROM <csvfile> AS line
CALL {  ...
} IN TRANSACTIONS OF 10000 ROWS

参考:https://neo4j.com/docs/cypher-manual/current/clauses/call-subquery/_batching

然而,我会提出一个替代解决方案,即使用apoc迭代函数。请确保csv文件中没有重复的行。谢谢。

CALL apoc.periodic.iterate(
'
LOAD CSV WITH HEADERS FROM 'file:///omop/CONCEPT_RELATIONSHIP_clean.csv' AS line RETURN line
','
MATCH (source:Concept { concept_id: line.concept_id_1 })
MATCH (target:Concept { concept_id: line.concept_id_2 })
CREATE (source)-[r:VOCAB_REL]->(target)
SET r.type = line.relationship_id, r.valid_start_date = line.valid_start_date, r.valid_end_date = line.valid_end_date, r.invalid_reason = line.invalid_reason
',
{batchSize:10000, parallel:True}) YIELD batches, total
RETURN batches, total

试试下面的查询

LOAD CSV WITH HEADERS FROM "file:///omop/CONCEPT_RELATIONSHIP_clean.csv" AS 
line FIELDTERMINATOR ','
WITH line
MATCH (source:Concept { concept_id: line.concept_id_1 })
MATCH (target:Concept { concept_id: line.concept_id_2 })
CALL {
WITH line,source,target
CREATE (source)-[r:VOCAB_REL]->(target)
SET r.type = line.relationship_id, r.valid_start_date = 
line.valid_start_date, r.valid_end_date = line.valid_end_date, 
r.invalid_reason = line.invalid_reason
}
IN TRANSACTIONS OF 1000 ROWS;

检查在概念节点上是否有concept_id的索引

希望这对你有帮助!

最新更新