I have a DataFrame:
Source         Target         valuecount
clusterAMarch  clusterAApril  10
clusterAMarch  clusterBApril  1
clusterAMarch  clusterCApril  15
clusterBMarch  clusterAApril  7
clusterBMarch  clusterBApril  11
clusterBMarch  clusterCApril  12
clusterCMarch  clusterAApril  11
clusterCMarch  clusterBApril  5
clusterCMarch  clusterCApril  15
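To make the example reproducible, here is a minimal sketch that rebuilds the table above as a pandas DataFrame. The column names and values are copied from the question. The construction itself is only an assumption about how the data might be loaded.

```python
import pandas as pd

# Rebuild the question's DataFrame: each March cluster flows to each April cluster
df = pd.DataFrame({
    'Source': ['clusterAMarch'] * 3 + ['clusterBMarch'] * 3 + ['clusterCMarch'] * 3,
    'Target': ['clusterAApril', 'clusterBApril', 'clusterCApril'] * 3,
    'valuecount': [10, 1, 15, 7, 11, 12, 11, 5, 15],
})
print(df)
```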
I want to use this DataFrame to generate a Sankey diagram, so the idea is to convert the Source and Target columns to numbers, like this:
Source  Target  valuecount
0       4       10
0       5       1
0       6       15
1       4       7
1       5       11
1       6       12
2       4       11
2       5       5
2       6       15
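Use pd.factorize, which numbers the labels in order of first appearance. Because all Source labels come before the Target labels, the sources become 0, 1, 2 and the targets 3, 4, 5: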
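import pandas as pd

# Stack Source and Target into one column (Source values first), keeping each label once
labels = df[['Source', 'Target']].melt()['value'].drop_duplicates()
# factorize returns (codes, uniques); pd.Series(codes, uniques) maps label -> code
MAP = pd.Series(*pd.factorize(labels))
df['Source'] = df['Source'].map(MAP)
df['Target'] = df['Target'].map(MAP)
print(df)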
# Output
   Source  Target  valuecount
0       0       3          10
1       0       4           1
2       0       5          15
3       1       3           7
4       1       4          11
5       1       5          12
6       2       3          11
7       2       4           5
8       2       5          15
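A Sankey diagram usually also needs the node labels in code order. As a variant of the same idea, the sketch below factorizes the concatenated Source and Target columns once and splits the codes back into two halves, so the label list falls out of pd.factorize directly. The variable names (sankey_source, node_labels, etc.) are hypothetical. Passing them to plotly's go.Sankey is an assumption about the plotting library, which the question does not name.

```python
import pandas as pd

df = pd.DataFrame({
    'Source': ['clusterAMarch'] * 3 + ['clusterBMarch'] * 3 + ['clusterCMarch'] * 3,
    'Target': ['clusterAApril', 'clusterBApril', 'clusterCApril'] * 3,
    'valuecount': [10, 1, 15, 7, 11, 12, 11, 5, 15],
})

n = len(df)
# Factorize sources and targets together so a label gets the same code in both columns
codes, uniques = pd.factorize(pd.concat([df['Source'], df['Target']], ignore_index=True))
sankey_source = codes[:n].tolist()   # first n codes belong to Source
sankey_target = codes[n:].tolist()   # remaining n codes belong to Target
sankey_value = df['valuecount'].tolist()
node_labels = list(uniques)          # node_labels[i] is the label for code i

# Hypothetical usage with plotly (not part of the original answer):
# import plotly.graph_objects as go
# go.Figure(go.Sankey(node=dict(label=node_labels),
#                     link=dict(source=sankey_source, target=sankey_target,
#                               value=sankey_value))).show()
```

Because the codes come from a single factorize call, a cluster label that appears as both a source and a target would share one node.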