How can I find groups of nodes with high cohesion based on their relationships to intermediate nodes?



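Minimal example:

There are some cats, and they like to climb different types of trees.

I want to find groups of cats that like to climb the same types of trees.

In the example below, Lily and Bella have a 67% overlap in their preferences (two of each cat's three trees are shared). They should be identified as a group.

Luna simply climbs every tree, so she should not be part of the group.

Cleo is completely separate from this group, i.e. she has 0% overlap with Lily and Bella.

How can I write a query that returns the groups with, let's say, at least 50% overlap? (In this example that is one group: "Lily and Bella".)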

CREATE (:Cat { name: 'Luna' });
CREATE (:Cat { name: 'Lily' });
CREATE (:Cat { name: 'Bella' });
CREATE (:Cat { name: 'Lucy' });
CREATE (:Cat { name: 'Nala' });
CREATE (:Cat { name: 'Callie' });
CREATE (:Cat { name: 'Kitty' });
CREATE (:Cat { name: 'Cleo' });
CREATE (:Tree { type: 'Red_maple' });
CREATE (:Tree { type: 'Loblolly_pine' });
CREATE (:Tree { type: 'American_sweetgum' });
CREATE (:Tree { type: 'Douglas_fir' });
CREATE (:Tree { type: 'Quaking_aspen' });
CREATE (:Tree { type: 'Sugar_maple' });
CREATE (:Tree { type: 'Balsam_fir' });
CREATE (:Tree { type: 'Flowering_dogwood' });
MATCH (c:Cat), (t:Tree) WHERE c.name = 'Lily' AND t.type = 'Red_maple' CREATE (c)-[:LIKES_TO_CLIMB]->(t);
MATCH (c:Cat), (t:Tree) WHERE c.name = 'Lily' AND t.type = 'Loblolly_pine' CREATE (c)-[:LIKES_TO_CLIMB]->(t);
MATCH (c:Cat), (t:Tree) WHERE c.name = 'Lily' AND t.type = 'American_sweetgum' CREATE (c)-[:LIKES_TO_CLIMB]->(t);
MATCH (c:Cat), (t:Tree) WHERE c.name = 'Bella' AND t.type = 'Red_maple' CREATE (c)-[:LIKES_TO_CLIMB]->(t);
MATCH (c:Cat), (t:Tree) WHERE c.name = 'Bella' AND t.type = 'Loblolly_pine' CREATE (c)-[:LIKES_TO_CLIMB]->(t);
MATCH (c:Cat), (t:Tree) WHERE c.name = 'Bella' AND t.type = 'Douglas_fir' CREATE (c)-[:LIKES_TO_CLIMB]->(t);
MATCH (c:Cat), (t:Tree) WHERE c.name = 'Luna' AND t.type = 'Red_maple' CREATE (c)-[:LIKES_TO_CLIMB]->(t);
MATCH (c:Cat), (t:Tree) WHERE c.name = 'Luna' AND t.type = 'Loblolly_pine' CREATE (c)-[:LIKES_TO_CLIMB]->(t);
MATCH (c:Cat), (t:Tree) WHERE c.name = 'Luna' AND t.type = 'American_sweetgum' CREATE (c)-[:LIKES_TO_CLIMB]->(t);
MATCH (c:Cat), (t:Tree) WHERE c.name = 'Luna' AND t.type = 'Douglas_fir' CREATE (c)-[:LIKES_TO_CLIMB]->(t);
MATCH (c:Cat), (t:Tree) WHERE c.name = 'Luna' AND t.type = 'Quaking_aspen' CREATE (c)-[:LIKES_TO_CLIMB]->(t);
MATCH (c:Cat), (t:Tree) WHERE c.name = 'Luna' AND t.type = 'Sugar_maple' CREATE (c)-[:LIKES_TO_CLIMB]->(t);
MATCH (c:Cat), (t:Tree) WHERE c.name = 'Luna' AND t.type = 'Balsam_fir' CREATE (c)-[:LIKES_TO_CLIMB]->(t);
MATCH (c:Cat), (t:Tree) WHERE c.name = 'Luna' AND t.type = 'Flowering_dogwood' CREATE (c)-[:LIKES_TO_CLIMB]->(t);
MATCH (c:Cat), (t:Tree) WHERE c.name = 'Cleo' AND t.type = 'Sugar_maple' CREATE (c)-[:LIKES_TO_CLIMB]->(t);
MATCH (c:Cat), (t:Tree) WHERE c.name = 'Cleo' AND t.type = 'Balsam_fir' CREATE (c)-[:LIKES_TO_CLIMB]->(t);
MATCH (c:Cat), (t:Tree) WHERE c.name = 'Cleo' AND t.type = 'Flowering_dogwood' CREATE (c)-[:LIKES_TO_CLIMB]->(t);

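What you really want to do is:

  1. Use the Node Similarity algorithm from the GDS library. It defaults to Jaccard similarity, or you can use Overlap similarity instead. The Node Similarity algorithm creates similarity relationships between your cats. You can set a threshold with the similarityCutoff parameter.

  2. Once the similarity relationships exist, run the Weakly Connected Components (WCC) algorithm, or something like Louvain or Leiden, depending on which best fits your use case.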
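To see what the two metrics in step 1 would give on the sample data before involving GDS at all, here is a minimal sketch in plain Cypher. It assumes every cat has at most one LIKES_TO_CLIMB relationship per tree, as in the example; the aliases shared, sizeA, and sizeB are only illustrative names.

```cypher
// Pairwise Jaccard and overlap scores for cats that share at least one tree.
MATCH (a:Cat)-[:LIKES_TO_CLIMB]->(t:Tree)<-[:LIKES_TO_CLIMB]-(b:Cat)
WHERE a.name < b.name                       // report each pair only once
WITH a, b, count(DISTINCT t) AS shared      // size of the intersection
WITH a, b, shared,
     size([(a)-[:LIKES_TO_CLIMB]->(x:Tree) | x]) AS sizeA,
     size([(b)-[:LIKES_TO_CLIMB]->(x:Tree) | x]) AS sizeB
RETURN a.name AS cat1,
       b.name AS cat2,
       // Jaccard: intersection / union
       toFloat(shared) / (sizeA + sizeB - shared) AS jaccard,
       // Overlap: intersection / size of the smaller set
       toFloat(shared) / CASE WHEN sizeA < sizeB THEN sizeA ELSE sizeB END AS overlap
ORDER BY jaccard DESC;
```

On the sample data, Lily and Bella share 2 of 4 distinct trees, so their Jaccard score is 0.5 and their overlap score is about 0.67 (the 67% in the question). Luna scores 0.375 on Jaccard against Lily, Bella, and Cleo, but 1.0 on overlap, because each of their tree sets is a subset of hers. Jaccard is therefore the metric that keeps a cat who climbs everything out of the groups.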

For your specific use case, it would look like this:

Project the graph:

CALL gds.graph.project('cats', ['Cat', 'Tree'], 'LIKES_TO_CLIMB');

Run the Node Similarity algorithm with a Jaccard cutoff of 0.5:

CALL gds.nodeSimilarity.mutate('cats', {mutateRelationshipType:'SIMILAR',
mutateProperty:'score', similarityCutoff:0.5})
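If you want to inspect the scores rather than write them into the projection, a stream call along these lines may help. This is a sketch, not a required step. It assumes the 'cats' projection from above and restricts the run to LIKES_TO_CLIMB, so that any SIMILAR relationships already added by the mutate call are not counted as neighbours.

```cypher
// Stream the pairwise similarities instead of mutating the projected graph.
CALL gds.nodeSimilarity.stream('cats', {
  relationshipTypes: ['LIKES_TO_CLIMB'],   // ignore SIMILAR if it already exists
  similarityCutoff: 0.5
  // Assumption: recent GDS releases also document a similarityMetric option
  // (e.g. 'OVERLAP'); check the documentation for your version before using it.
})
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS cat1,
       gds.util.asNode(node2).name AS cat2,
       similarity
ORDER BY similarity DESC, cat1, cat2;
```

The algorithm reports each pair from both sides, so expect one row per direction.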

Run WCC or another community detection algorithm:

CALL gds.wcc.stream('cats', {relationshipTypes:['SIMILAR'], nodeLabels:['Cat']})
YIELD nodeId, componentId
RETURN componentId, collect(gds.util.asNode(nodeId).name) AS catGroup

This returns:

╒═════════════╤════════════════╕
│"componentId"│"catGroup"      │
╞═════════════╪════════════════╡
│0            │["Luna"]        │
├─────────────┼────────────────┤
│1            │["Lily","Bella"]│
├─────────────┼────────────────┤
│3            │["Lucy"]        │
├─────────────┼────────────────┤
│4            │["Nala"]        │
├─────────────┼────────────────┤
│5            │["Callie"]      │
├─────────────┼────────────────┤
│6            │["Kitty"]       │
├─────────────┼────────────────┤
│7            │["Cleo"]        │
└─────────────┴────────────────┘
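WCC assigns every cat a component, so cats without any SIMILAR relationship come back as single-member groups. If only real groups are wanted, as in the question, one option is to filter on group size after collecting. A minimal sketch, reusing the same call:

```cypher
// Same WCC call as above, keeping only components with more than one cat.
CALL gds.wcc.stream('cats', {relationshipTypes: ['SIMILAR'], nodeLabels: ['Cat']})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).name) AS catGroup
WHERE size(catGroup) > 1
RETURN componentId, catGroup;
```

On the sample data this should leave only the row containing Lily and Bella.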

Naturally, you can tune the similarityCutoff parameter and try other community detection algorithms to best fit your use case.
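As one example of swapping in a different algorithm, the sketch below runs Louvain on the same SIMILAR relationships and weights them by the score property written by the mutate step. Whether Louvain is a better choice than WCC depends on your data. On a graph this small the two are likely to agree.

```cypher
// Louvain on the similarity relationships, weighted by the similarity score.
CALL gds.louvain.stream('cats', {
  relationshipTypes: ['SIMILAR'],
  nodeLabels: ['Cat'],
  relationshipWeightProperty: 'score'
})
YIELD nodeId, communityId
RETURN communityId, collect(gds.util.asNode(nodeId).name) AS catGroup
ORDER BY size(catGroup) DESC;

// Drop the in-memory projection once you are done experimenting.
CALL gds.graph.drop('cats');
```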
