Preparing a DataFrame for a Sankey diagram



I have a DataFrame:

Source         Target        valuecount
clusterAMarch  clusterAApril    10
clusterAMarch  clusterBApril    1
clusterAMarch  clusterCApril    15
clusterBMarch  clusterAApril    7
clusterBMarch  clusterBApril    11
clusterBMarch  clusterCApril    12
clusterCMarch  clusterAApril    11
clusterCMarch  clusterBApril    5
clusterCMarch  clusterCApril    15

I want to use this DataFrame to generate a Sankey diagram, so the idea is to convert the Source and Target columns to numbers, like this:

Source Target valuecount
0     4    10
0     5    1
0     6    15
1     4    7
1     5    11
1     6    12
2     4    11
2     5    5
2     6    15

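Use pd.factorize. It assigns contiguous codes starting at 0 in order of first appearance, so the targets come out as 3 to 5 rather than 4 to 6: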

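import pandas as pd

# Collect the unique labels from both columns, then map each label to an integer code
MAP = pd.Series(*pd.factorize(df[['Source', 'Target']]
                              .melt()['value'].drop_duplicates()))
df['Source'] = df['Source'].map(MAP)
df['Target'] = df['Target'].map(MAP)
print(df)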
# Output
   Source  Target  valuecount
0       0       3          10
1       0       4           1
2       0       5          15
3       1       3           7
4       1       4          11
5       1       5          12
6       2       3          11
7       2       4           5
8       2       5          15
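
A Sankey plot usually needs the node names as well as the integer codes, so it can help to keep the labels that pd.factorize returns. The following is a minimal, self-contained sketch of that idea, not the only way to do it. It rebuilds the sample data from the question, and the names `links` and `node_labels` are illustrative assumptions rather than anything from the original post.

```python
import pandas as pd

# Rebuild the question's sample data so the snippet runs on its own.
df = pd.DataFrame({
    "Source": ["clusterAMarch"] * 3 + ["clusterBMarch"] * 3 + ["clusterCMarch"] * 3,
    "Target": ["clusterAApril", "clusterBApril", "clusterCApril"] * 3,
    "valuecount": [10, 1, 15, 7, 11, 12, 11, 5, 15],
})

# Factorize Source and Target together so both columns share one code space.
# Source values are stacked first, so they receive the lowest codes.
stacked = pd.concat([df["Source"], df["Target"]], ignore_index=True)
codes, labels = pd.factorize(stacked)

# Split the codes back into the two columns (hypothetical name: links).
n = len(df)
links = pd.DataFrame({
    "Source": codes[:n],
    "Target": codes[n:],
    "valuecount": df["valuecount"].to_numpy(),
})

# Position i holds the name of node i (hypothetical name: node_labels).
node_labels = list(labels)

print(links)
print(node_labels)
```

Here `node_labels[i]` is the name of the node with code `i`, which is the shape that plotting libraries typically expect for Sankey node labels.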
