Python, pandas: how to find the connections between groups



I want to build a network, but I'm having trouble finding the connections between groups based on their data (maybe with groupby?).

Two groups are connected if they share an element.

For example, my DataFrame looks like this:

group_number    data
1                a
2                a
2                b
2                c
2                a
3                c
4                a
4                c

So the output would be:

Source_group  Target_group Frequency
2               1           1 (because a-a)
3               2           1 (because c-c)
4               2           2 (because a-a, c-c)

Of course, the (because ...) notes would not appear in the output; they are only there as explanation.

Thanks a lot.

I've thought about your question. You can do the following:

import pandas as pd
from collections import defaultdict

df = pd.DataFrame({'group_number': [1, 2, 2, 2, 2, 3, 4, 4],
                   'data': ['a', 'a', 'b', 'c', 'a', 'c', 'a', 'c']})

# group the data by (group_number, data) and convert it to a nested dictionary:
# {group_number: {element: number of occurrences}}
d = defaultdict(dict)
for (group_number, element), group in df.groupby(['group_number', 'data']):
    d[group_number][element] = group['data'].size

# iterate over the groups twice to compare every group
# with every other group
relationships = []
for key, val in d.items():
    for k, v in d.items():
        if key != k:
            # record the two groups being compared
            current_row_rel = {}
            current_row_rel['Source_group'] = key
            current_row_rel['Target_group'] = k
            # this is the important step: count the distinct elements
            # the two groups share (set intersection of the dictionary keys)
            current_row_rel['Frequency'] = len(set(val).intersection(v))
            relationships.append(current_row_rel)

# convert the result to a pandas DataFrame for further analysis.
# note: this contains both directions of every pair, including pairs
# with Frequency 0
df = pd.DataFrame(relationships)

I'm sure this can be done without converting to a list of dictionaries. However, I find this solution simpler.
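
For comparison, here is a minimal sketch of one way to stay inside pandas, assuming a self-merge on the data column is acceptable. The names unique, pairs, and edges are illustrative only, and the choice to keep each pair once with the larger group number as the source is an assumption taken from the example output in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'group_number': [1, 2, 2, 2, 2, 3, 4, 4],
    'data': ['a', 'a', 'b', 'c', 'a', 'c', 'a', 'c'],
})

# Keep one row per (group, element) so a repeated element counts once.
unique = df.drop_duplicates()

# Self-join on the element: every row is a pair of groups sharing that element.
pairs = unique.merge(unique, on='data', suffixes=('_source', '_target'))

# Assumption: keep each unordered pair once, larger group number as the source.
pairs = pairs[pairs['group_number_source'] > pairs['group_number_target']]

# Count the shared elements per pair of groups.
edges = (
    pairs.groupby(['group_number_source', 'group_number_target'])
         .size()
         .reset_index(name='Frequency')
         .rename(columns={'group_number_source': 'Source_group',
                          'group_number_target': 'Target_group'})
)
print(edges)
```

Under the rule that two groups are connected when they share an element, this also returns the pairs 4-1 (shared a) and 4-3 (shared c), which the example output in the question does not list. Pairs with nothing in common are simply absent rather than shown with a frequency of 0.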
