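Minimal example:
There are some cats, and they like to climb different types of trees.
I want to find groups of cats that like to climb the same trees.
In the example below, Lily and Bella have a 67% overlap in their preferences, so they should be identified as a group.
Luna simply climbs every tree, so she should not be part of the group.
Cleo is completely separate from this group, i.e. she has 0% overlap with Lily and Bella.
How can I write a query that returns the groups with, say, at least 50% overlap? (In this example there is one group, "Lily and Bella".)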
CREATE (:Cat { name: 'Luna' });
CREATE (:Cat { name: 'Lily' });
CREATE (:Cat { name: 'Bella' });
CREATE (:Cat { name: 'Lucy' });
CREATE (:Cat { name: 'Nala' });
CREATE (:Cat { name: 'Callie' });
CREATE (:Cat { name: 'Kitty' });
CREATE (:Cat { name: 'Cleo' });
CREATE (:Tree { type: 'Red_maple' });
CREATE (:Tree { type: 'Loblolly_pine' });
CREATE (:Tree { type: 'American_sweetgum' });
CREATE (:Tree { type: 'Douglas_fir' });
CREATE (:Tree { type: 'Quaking_aspen' });
CREATE (:Tree { type: 'Sugar_maple' });
CREATE (:Tree { type: 'Balsam_fir' });
CREATE (:Tree { type: 'Flowering_dogwood' });
MATCH (c:Cat), (t:Tree) WHERE c.name = 'Lily' AND t.type = 'Red_maple' CREATE (c)-[:LIKES_TO_CLIMB]->(t);
MATCH (c:Cat), (t:Tree) WHERE c.name = 'Lily' AND t.type = 'Loblolly_pine' CREATE (c)-[:LIKES_TO_CLIMB]->(t);
MATCH (c:Cat), (t:Tree) WHERE c.name = 'Lily' AND t.type = 'American_sweetgum' CREATE (c)-[:LIKES_TO_CLIMB]->(t);
MATCH (c:Cat), (t:Tree) WHERE c.name = 'Bella' AND t.type = 'Red_maple' CREATE (c)-[:LIKES_TO_CLIMB]->(t);
MATCH (c:Cat), (t:Tree) WHERE c.name = 'Bella' AND t.type = 'Loblolly_pine' CREATE (c)-[:LIKES_TO_CLIMB]->(t);
MATCH (c:Cat), (t:Tree) WHERE c.name = 'Bella' AND t.type = 'Douglas_fir' CREATE (c)-[:LIKES_TO_CLIMB]->(t);
MATCH (c:Cat), (t:Tree) WHERE c.name = 'Luna' AND t.type = 'Red_maple' CREATE (c)-[:LIKES_TO_CLIMB]->(t);
MATCH (c:Cat), (t:Tree) WHERE c.name = 'Luna' AND t.type = 'Loblolly_pine' CREATE (c)-[:LIKES_TO_CLIMB]->(t);
MATCH (c:Cat), (t:Tree) WHERE c.name = 'Luna' AND t.type = 'American_sweetgum' CREATE (c)-[:LIKES_TO_CLIMB]->(t);
MATCH (c:Cat), (t:Tree) WHERE c.name = 'Luna' AND t.type = 'Douglas_fir' CREATE (c)-[:LIKES_TO_CLIMB]->(t);
MATCH (c:Cat), (t:Tree) WHERE c.name = 'Luna' AND t.type = 'Quaking_aspen' CREATE (c)-[:LIKES_TO_CLIMB]->(t);
MATCH (c:Cat), (t:Tree) WHERE c.name = 'Luna' AND t.type = 'Sugar_maple' CREATE (c)-[:LIKES_TO_CLIMB]->(t);
MATCH (c:Cat), (t:Tree) WHERE c.name = 'Luna' AND t.type = 'Balsam_fir' CREATE (c)-[:LIKES_TO_CLIMB]->(t);
MATCH (c:Cat), (t:Tree) WHERE c.name = 'Luna' AND t.type = 'Flowering_dogwood' CREATE (c)-[:LIKES_TO_CLIMB]->(t);
MATCH (c:Cat), (t:Tree) WHERE c.name = 'Cleo' AND t.type = 'Sugar_maple' CREATE (c)-[:LIKES_TO_CLIMB]->(t);
MATCH (c:Cat), (t:Tree) WHERE c.name = 'Cleo' AND t.type = 'Balsam_fir' CREATE (c)-[:LIKES_TO_CLIMB]->(t);
MATCH (c:Cat), (t:Tree) WHERE c.name = 'Cleo' AND t.type = 'Flowering_dogwood' CREATE (c)-[:LIKES_TO_CLIMB]->(t);
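So what you really want to do is:
- Use the Node Similarity algorithm from the GDS library. It defaults to Jaccard similarity, or you can use Overlap similarity instead. The Node Similarity algorithm will create similarity relationships between your cats. You can set a threshold with the similarityCutoff parameter.
- Once you have created the similarity relationships, run the Weakly Connected Components (WCC) algorithm or something like Louvain or Leiden, depending on which best fits your use case.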
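To see why the choice between Jaccard and Overlap matters for this data, here is a minimal plain-Python sketch (not GDS code) that recomputes both scores from the question's CREATE statements. The `likes` dictionary and the helper names `jaccard` and `overlap` are illustrative assumptions; the formulas are the standard set definitions.

```python
# Tree preferences copied from the question's CREATE statements.
likes = {
    "Lily": {"Red_maple", "Loblolly_pine", "American_sweetgum"},
    "Bella": {"Red_maple", "Loblolly_pine", "Douglas_fir"},
    "Luna": {"Red_maple", "Loblolly_pine", "American_sweetgum", "Douglas_fir",
             "Quaking_aspen", "Sugar_maple", "Balsam_fir", "Flowering_dogwood"},
    "Cleo": {"Sugar_maple", "Balsam_fir", "Flowering_dogwood"},
}


def jaccard(a, b):
    """Shared trees divided by all trees either cat climbs."""
    return len(a & b) / len(a | b)


def overlap(a, b):
    """Shared trees divided by the size of the smaller preference set."""
    return len(a & b) / min(len(a), len(b))


for x, y in [("Lily", "Bella"), ("Lily", "Luna"), ("Lily", "Cleo")]:
    print(x, y, round(jaccard(likes[x], likes[y]), 3),
          round(overlap(likes[x], likes[y]), 3))
# Lily Bella 0.5 0.667
# Lily Luna 0.375 1.0
# Lily Cleo 0.0 0.0
```

The 67% figure from the question is the Overlap score, while the Jaccard score for Lily and Bella is exactly 0.5. Under Overlap, Luna scores 1.0 against Lily because she climbs every tree, so for this data Jaccard is the metric that keeps her out of the group.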
In your specific use case, it would look like this:
Project the graph
CALL gds.graph.project('cats', ['Cat', 'Tree'], 'LIKES_TO_CLIMB');
Run the Node Similarity algorithm with a Jaccard cutoff of 0.5
CALL gds.nodeSimilarity.mutate('cats', {mutateRelationshipType:'SIMILAR',
mutateProperty:'score', similarityCutoff:0.5})
Run WCC or another community detection algorithm
CALL gds.wcc.stream('cats', {relationshipTypes:['SIMILAR'], nodeLabels:['Cat']})
YIELD nodeId, componentId
RETURN componentId, collect(gds.util.asNode(nodeId).name) AS catGroup
This returns:
╒═════════════╤════════════════╕
│"componentId"│"catGroup"      │
╞═════════════╪════════════════╡
│0            │["Luna"]        │
├─────────────┼────────────────┤
│1            │["Lily","Bella"]│
├─────────────┼────────────────┤
│3            │["Lucy"]        │
├─────────────┼────────────────┤
│4            │["Nala"]        │
├─────────────┼────────────────┤
│5            │["Callie"]      │
├─────────────┼────────────────┤
│6            │["Kitty"]       │
├─────────────┼────────────────┤
│7            │["Cleo"]        │
└─────────────┴────────────────┘
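The grouping can be cross-checked without Neo4j. The following is a rough plain-Python sketch, not the GDS implementation: it links two cats when their Jaccard score reaches the cutoff and then takes connected components with a small union-find. The names `likes`, `jaccard` and `groups` are made up for illustration, and treating the cutoff as inclusive (`>=`) is an assumption, chosen because Lily and Bella (score exactly 0.5) end up together in the output above.

```python
from itertools import combinations

# All eight cats from the question; four of them have no LIKES_TO_CLIMB edges.
likes = {
    "Luna": {"Red_maple", "Loblolly_pine", "American_sweetgum", "Douglas_fir",
             "Quaking_aspen", "Sugar_maple", "Balsam_fir", "Flowering_dogwood"},
    "Lily": {"Red_maple", "Loblolly_pine", "American_sweetgum"},
    "Bella": {"Red_maple", "Loblolly_pine", "Douglas_fir"},
    "Lucy": set(), "Nala": set(), "Callie": set(), "Kitty": set(),
    "Cleo": {"Sugar_maple", "Balsam_fir", "Flowering_dogwood"},
}


def jaccard(a, b):
    return len(a & b) / len(a | b)


def groups(likes, cutoff):
    """Connected components of the 'similar enough' graph over cats."""
    parent = {cat: cat for cat in likes}

    def find(cat):
        # Follow parent links to the component root, compressing the path.
        while parent[cat] != cat:
            parent[cat] = parent[parent[cat]]
            cat = parent[cat]
        return cat

    for a, b in combinations(likes, 2):
        # Cats without any trees are never compared, so they stay alone.
        if likes[a] and likes[b] and jaccard(likes[a], likes[b]) >= cutoff:
            parent[find(a)] = find(b)

    components = {}
    for cat in likes:
        components.setdefault(find(cat), []).append(cat)
    return sorted(sorted(members) for members in components.values())


result = groups(likes, 0.5)
print(result)
# [['Bella', 'Lily'], ['Callie'], ['Cleo'], ['Kitty'], ['Lucy'], ['Luna'], ['Nala']]

# Keep only real groups, i.e. components with more than one cat.
print([g for g in result if len(g) > 1])
# [['Bella', 'Lily']]
```

As in the WCC output, every cat without a similar partner forms its own single-member component, so filtering on component size is what leaves the one group the question asks for.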
Of course, you can adjust the similarityCutoff parameter and try other community detection algorithms to find what best fits your use case.
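To get a feel for how sensitive the result is to the cutoff, here is a small plain-Python sketch (again not GDS code, with the hypothetical helper `pairs_at`) that lists which cat pairs would receive a similarity relationship at a few Jaccard cutoffs. The cutoff is assumed to be inclusive.

```python
from itertools import combinations

# Only cats that climb at least one tree can be similar to anyone.
likes = {
    "Bella": {"Red_maple", "Loblolly_pine", "Douglas_fir"},
    "Cleo": {"Sugar_maple", "Balsam_fir", "Flowering_dogwood"},
    "Lily": {"Red_maple", "Loblolly_pine", "American_sweetgum"},
    "Luna": {"Red_maple", "Loblolly_pine", "American_sweetgum", "Douglas_fir",
             "Quaking_aspen", "Sugar_maple", "Balsam_fir", "Flowering_dogwood"},
}

# Jaccard score for every unordered pair of cats.
scores = {
    (a, b): len(likes[a] & likes[b]) / len(likes[a] | likes[b])
    for a, b in combinations(sorted(likes), 2)
}


def pairs_at(cutoff):
    """Pairs whose Jaccard score is at least the cutoff."""
    return sorted(pair for pair, score in scores.items() if score >= cutoff)


for cutoff in (0.3, 0.5, 0.6):
    print(cutoff, pairs_at(cutoff))
# 0.3 [('Bella', 'Lily'), ('Bella', 'Luna'), ('Cleo', 'Luna'), ('Lily', 'Luna')]
# 0.5 [('Bella', 'Lily')]
# 0.6 []
```

At 0.3, Luna is linked to Bella, Lily and Cleo, so connected components would merge all four cats into one group even though Cleo shares nothing with Lily or Bella. That chaining effect is one reason to compare WCC with Louvain or Leiden when using a low cutoff.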