Neo4j performance tuning



I am new to Neo4j, and I am currently trying to build a POC for a dating website. I have a 4GB input file that looks like the format below.

It contains a viewerId (male/female) and viewedId, the list of ids that person has viewed. Based on this history file, I need to give recommendations whenever a user comes online.

Input file:

viewerId   viewedId 
12345   123456,23456,987653 
23456   23456,123456,234567 
34567   234567,765678,987653 
:

For this task I tried the following,

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:/home/hadoopuser/Neo-input" AS row
FIELDTERMINATOR '\t'
WITH row, split(row.viewedId, ",") AS viewedIds
UNWIND viewedIds AS viewedId
MERGE (p2:Persons2 {viewerId: row.viewerId})
MERGE (c2:Companies2 {viewedId: viewedId})
MERGE (p2)-[:Friends]->(c2)
MERGE (c2)-[:Sees]->(p2);

The Cypher query I use to retrieve results is,

MATCH (p2:Persons2)-[r*1..3]->(c2:Companies2)
RETURN p2, r, COLLECT(DISTINCT c2) AS friends

It took 3 days to complete this task.

My system configuration:

Ubuntu - 14.04
RAM - 24GB

Neo4j configuration:
neo4j.properties:

neostore.nodestore.db.mapped_memory=200M
neostore.propertystore.db.mapped_memory=2300M
neostore.propertystore.db.arrays.mapped_memory=5M
neostore.propertystore.db.strings.mapped_memory=3200M
neostore.relationshipstore.db.mapped_memory=800M

neo4j-wrapper.conf:

wrapper.java.initmemory=12000
wrapper.java.maxmemory=12000

To reduce the time, I searched the internet and got an idea from the following link, the batch importer: https://github.com/jexp/batch-import

In that link they have node.csv and rels.csv files, which they import into Neo4j. I don't know how they created the node.csv and rels.csv files, which scripts they are using, and so on.

Can someone give me a sample script that creates node.csv and rels.csv files for my data?
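For what it's worth, the two files can be derived from the tab-delimited input with a short script. A minimal Python sketch, assuming the batch importer's layout where rels.csv refers to nodes by their 1-based row number in node.csv (the file names and typed header labels here are assumptions — check the batch-import README for the exact header syntax):

```python
import csv

def build_batch_files(input_path, nodes_path="node.csv", rels_path="rels.csv"):
    """Split the tab-delimited viewer/viewed history into node.csv and
    rels.csv for the jexp/batch-import tool.
    Assumption: rels.csv references nodes by their 1-based row number
    in node.csv; header names below are illustrative."""
    row_of = {}          # (id, label) -> 1-based row number in node.csv
    nodes, rels = [], []

    def node(node_id, label):
        key = (node_id, label)
        if key not in row_of:
            nodes.append((node_id, label))
            row_of[key] = len(nodes)
        return row_of[key]

    with open(input_path, newline="") as f:
        for rec in csv.DictReader(f, delimiter="\t"):
            viewer = node(rec["viewerId"], "Persons2")
            for viewed in rec["viewedId"].split(","):
                rels.append((viewer, node(viewed, "Companies2"), "Friends"))

    with open(nodes_path, "w", newline="") as f:
        out = csv.writer(f, delimiter="\t")
        out.writerow(["id:string", "type:label"])   # assumed header names
        out.writerows(nodes)

    with open(rels_path, "w", newline="") as f:
        out = csv.writer(f, delimiter="\t")
        out.writerow(["start", "end", "type"])
        out.writerows(rels)
    return len(nodes), len(rels)
```

Keeping persons and companies under separate keys mirrors the MERGE statements above, which create separate Persons2 and Companies2 nodes even when the same id appears on both sides.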

Alternatively, can you give some suggestions for speeding up importing and retrieving the data?

Thanks in advance.

You don't need the inverse relationship; a single one is enough.

For the import, configure a 12G heap and a 10G page cache.
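On Neo4j 2.2 and later that maps to settings along these lines (a sketch; the dbms.pagecache.memory key only exists from 2.2 on — on 2.1 and earlier, the mapped_memory settings shown above serve as the page cache):

```
# neo4j-wrapper.conf
wrapper.java.initmemory=12000
wrapper.java.maxmemory=12000

# neo4j.properties (2.2+)
dbms.pagecache.memory=10g
```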

Try this; it should finish within a few minutes.

create constraint on (p:Persons2) assert p.viewerId is unique;
create constraint on (p:Companies2) assert p.viewedId is unique;
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:/home/hadoopuser/Neo-input" AS row
FIELDTERMINATOR '\t'
MERGE (p2:Persons2 {viewerId: row.viewerId});
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:/home/hadoopuser/Neo-input" AS row
FIELDTERMINATOR '\t'
FOREACH (viewedId IN split(row.viewedId, ",") |
  MERGE (c2:Companies2 {viewedId: viewedId}));
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:/home/hadoopuser/Neo-input" AS row
FIELDTERMINATOR '\t'
WITH row, split(row.viewedId, ",") AS viewedIds
MATCH (p2:Persons2 {viewerId: row.viewerId})
UNWIND viewedIds AS viewedId
MATCH (c2:Companies2 {viewedId: viewedId})
MERGE (p2)-[:Friends]->(c2);

For the relationship MERGE, if some of your companies have hundreds of thousands to millions of views, you may want to use this instead:

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:/home/hadoopuser/Neo-input" AS row
FIELDTERMINATOR '\t'
WITH row, split(row.viewedId, ",") AS viewedIds
MATCH (p2:Persons2 {viewerId: row.viewerId})
UNWIND viewedIds AS viewedId
MATCH (c2:Companies2 {viewedId: viewedId})
WHERE shortestPath((p2)-[:Friends]->(c2)) IS NULL
CREATE (p2)-[:Friends]->(c2);

Regarding your query:

What are you trying to achieve by retrieving the cross product of all persons and all companies? That could be trillions of paths.

Usually you want to know this for a single person or company.

Updated query:

E.g. for 123456, all the persons who viewed that company are 12345 and 23456. Then the companies those persons viewed are:

12345   123456,23456,987653
23456   23456,123456,234567

So for company 123456 I need to recommend 23456, 987653, 23456, 234567; the distinct result (final result) is 23456, 987653, 234567.

match (c:Companies2)<-[:Friends]-(p1:Persons2)-[:Friends]->(c2:Companies2)
where c.viewedId = "123456"
return distinct c2.viewedId;
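As a sanity check, the worked example can be reproduced offline over the sample rows with a few lines of Python (a sketch of the recommendation logic, not Neo4j code):

```python
# Sample rows from the input file in the question.
views = {
    "12345": ["123456", "23456", "987653"],
    "23456": ["23456", "123456", "234567"],
    "34567": ["234567", "765678", "987653"],
}

def recommend(target):
    """Every other company seen by the persons who viewed `target`."""
    viewers = [p for p, seen in views.items() if target in seen]
    return sorted({c for p in viewers for c in views[p] if c != target})

print(recommend("123456"))  # → ['23456', '234567', '987653']
```

This matches the distinct result above: 23456, 987653, 234567.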

For all companies at once, this might help:

match (c:Companies2)<-[:Friends]-(p1:Persons2)
with p1, collect(c) as companies
match (p1)-[:Friends]->(c2:Companies2)
return c2.viewedId, extract(c in companies | c.viewedId);
