Importing CSV relationships into Neo4j



I'm trying to import data from a MySQL database into Neo4j, using CSV files as an intermediary. I'm following this basic example, but can't quite get it to work. I'm importing two tables with the following queries:

//Import projects.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/tmp/projects.csv" AS row
CREATE (:project
{
     project_id: row.fan,
     project_name: row.project_name
});
//Import people.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/tmp/persons.csv" AS row
CREATE (:person
{
     person_id: row.person_id,
     person_name: row.person_name
});
//Create indices.
CREATE INDEX ON :project(project_id);
CREATE INDEX ON :project(project_name);
CREATE INDEX ON :person(person_id);
CREATE INDEX ON :person(person_name);

This part works. What doesn't work is when I try to import the relationships:

//Create project-person relationships.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/tmp/project_persons.csv" AS row
MATCH (project:project {project_id: row.project_id})
MATCH (person:person {person_id: row.person_id})
MERGE (person)-[:CONTRIBUTED]->(project);

The console accepts the query without errors, but it never completes. It has been running for days at 100% CPU and 25% RAM, with negligible disk usage. No relationships show up in the database information.

Did I make a mistake somewhere, or is it really just this slow? The project_persons.csv file is 13 million lines long, but shouldn't the periodic commit make something show up by now?

Just for the LOAD - put an EXPLAIN in front of the CREATE and it will tell you how it is building the update and the number of records it expects to process. I ran into the same problem: Neo4j was doing the whole update as a single transaction and never finished. The update needed to be broken into transaction blocks of 50K to 100K to get everything done.
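For instance, the EXPLAIN prefix can be put in front of the relationship query from the question itself; Neo4j then prints the query plan and the estimated row counts without actually running the update:

EXPLAIN
LOAD CSV WITH HEADERS FROM "file:/tmp/project_persons.csv" AS row
MATCH (project:project {project_id: row.project_id})
MATCH (person:person {person_id: row.person_id})
MERGE (person)-[:CONTRIBUTED]->(project);

If the plan contains an Eager operator, the periodic commit is effectively disabled and the whole load runs as one transaction, which matches the behavior described in the question.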

One approach is to import the relationship information as a set of labeled nodes, then use those nodes to MATCH() the person and project nodes and create the relationships as needed.

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/tmp/project_persons.csv" AS row
CREATE (:Relations {project_id: row.project_id, person_id: row.person_id})

Then process the records in 50K batches:

MATCH (r:Relations) 
MATCH (prj:project {project_id: r.project_id})
MATCH (per:person {person_id: r.person_id})
WITH r, prj, per LIMIT 50000
MERGE (per)-[:CONTRIBUTED]->(prj)
DELETE r

Run this multiple times until all the relationships are created and you're good to go.
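As a sketch of an alternative, if the APOC procedure library is installed (an assumption - it is a separate plugin, not part of core Neo4j), the repeated manual runs can be replaced with a single call to apoc.periodic.iterate, which commits each batch in its own transaction:

// Requires the APOC plugin to be installed.
CALL apoc.periodic.iterate(
  "MATCH (r:Relations) RETURN r",
  "MATCH (prj:project {project_id: r.project_id})
   MATCH (per:person {person_id: r.person_id})
   MERGE (per)-[:CONTRIBUTED]->(prj)
   DELETE r",
  {batchSize: 50000}
);

The first statement streams the :Relations nodes; the second runs against each batch of 50K and is committed before the next batch starts, so progress becomes visible as the call runs.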
