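I'm trying to build a network, and I'm struggling to find the connections between groups based on their associated data (maybe with groupby?).
Two groups are connected if they share the same element.
For example, my dataframe looks like this: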
group_number data
1 a
2 a
2 b
2 c
2 a
3 c
4 a
4 c
So the output would be:
Source_group Target_group Frequency
2 1 1 (because a-a)
3 2 1 (because c-c)
4 2 2 (because a-a, c-c)
Of course, the (because ...) part would not appear in the output; it is only there as an explanation.
Thank you very much.
I've thought about your problem. You could do something like this:
import pandas as pd
from collections import defaultdict

df = pd.DataFrame({'group_number': [1, 2, 2, 2, 2, 3, 4, 4],
                   'data': ['a', 'a', 'b', 'c', 'a', 'c', 'a', 'c']})

# group the data by (group_number, data) and convert it to a nested dictionary:
# d[group_number][element] = number of occurrences of that element in the group
d = defaultdict(dict)
for (group_number, element), group in df.groupby(['group_number', 'data']):
    d[group_number][element] = len(group)

# iterate over the groups twice to compare every group
# with every other group
# (every ordered pair is emitted, including pairs with Frequency 0)
relationships = []
for key, val in d.items():
    for k, v in d.items():
        if key != k:
            # record which two groups are being compared
            current_row_rel = {}
            current_row_rel['Source_group'] = key
            current_row_rel['Target_group'] = k
            # this is the important part: count the distinct elements
            # the two groups share, i.e. the intersection of their keys
            current_row_rel['Frequency'] = len(set(val).intersection(v))
            relationships.append(current_row_rel)

# convert the result to a pandas DataFrame for further analysis
df = pd.DataFrame(relationships)
I'm sure this can be done without converting to a list of dictionaries. However, I found this solution more straightforward.
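
For comparison, here is a minimal sketch of one way to stay inside pandas, assuming a self-merge on the data column is acceptable. The names unique, pairs and edges and the _src/_tgt suffixes are made up for illustration. It counts each shared element once per pair of groups and keeps each pair only once, with the larger group number as the source.

```python
import pandas as pd

df = pd.DataFrame({'group_number': [1, 2, 2, 2, 2, 3, 4, 4],
                   'data': ['a', 'a', 'b', 'c', 'a', 'c', 'a', 'c']})

# keep one row per (group, element) so a repeated element is counted once
unique = df.drop_duplicates()

# self-join on the element: each row pairs two groups that share it
pairs = unique.merge(unique, on='data', suffixes=('_src', '_tgt'))

# keep each unordered pair once, larger group number as the source
pairs = pairs[pairs['group_number_src'] > pairs['group_number_tgt']]

# count the shared elements per pair of groups
edges = (pairs.groupby(['group_number_src', 'group_number_tgt'])
              .size()
              .reset_index(name='Frequency')
              .rename(columns={'group_number_src': 'Source_group',
                               'group_number_tgt': 'Target_group'}))
print(edges)
```

Unlike the dictionary version, this drops pairs with nothing in common. It also reports 4-1 (shared a) and 4-3 (shared c), which the expected output in the question does not list, so filter further if those should be excluded.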