I need your help with the following task. I have the following DataFrame:
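import numpy as np  # only needed for the NumPy broadcasting answer
import pandas as pd

test = {'Col1': [2, 5],
        'Col2': [5, 7],
        'Col_List': [['One', 'Two', 'Three', 'Four', 'Five'], ['Two', 'Four']],
        'One': [0, 0],
        'Two': [0, 0],
        'Three': [0, 0],
        'Four': [0, 0],
        'Five': [0, 0]}
df = pd.DataFrame.from_dict(test)
df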
It looks like this:
   Col1  Col2                       Col_List  One  Two  Three  Four  Five
0     2     5  [One, Two, Three, Four, Five]    0    0      0     0     0
1     5     7                    [Two, Four]    0    0      0     0     0
exploded = df.explode("Col_List")
df.update(pd.get_dummies(exploded["Col_List"])
            .mul(exploded["Col1"], axis="rows")
            .groupby(level=0).sum())
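- explode the elements of each list into their own rows
- get the one-hot representation of "One", "Two", etc.
- multiply it by the (exploded) "Col1" values
- the 1/0 values act as the selector
- then undo the explosion: groupby and sum
- finally update the original frame's "One", "Two", ... columns with the result
to get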
>>> df
   Col1  Col2                       Col_List  One  Two  Three  Four  Five
0     2     5  [One, Two, Three, Four, Five]    2    2      2     2     2
1     5     7                    [Two, Four]    0    5      0     5     0
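The steps listed above can be sketched one at a time, so each intermediate result can be inspected. This is only an illustration: the names `one_hot`, `weighted` and `per_row` are mine, and the cast to `int` is an assumption added because newer pandas versions return boolean dummies.

```python
import pandas as pd

df = pd.DataFrame({
    "Col1": [2, 5],
    "Col2": [5, 7],
    "Col_List": [["One", "Two", "Three", "Four", "Five"], ["Two", "Four"]],
    "One": [0, 0], "Two": [0, 0], "Three": [0, 0], "Four": [0, 0], "Five": [0, 0],
})

# 1. One row per list element; the original row label is repeated.
exploded = df.explode("Col_List")

# 2. One-hot encode the labels (cast to int in case the dummies are boolean).
one_hot = pd.get_dummies(exploded["Col_List"]).astype(int)

# 3. Scale every one-hot row by that row's Col1 value; the 1/0 acts as a selector.
weighted = one_hot.mul(exploded["Col1"], axis=0)

# 4. Undo the explosion: collapse back to one row per original index label.
per_row = weighted.groupby(level=0).sum()

print(per_row)
```

`per_row` has one row per original row and one column per label, so it can be passed to `df.update` as in the answer.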
Let's try exploding the data, unstacking, and then assigning back:
s = (df[['Col1', 'Col_List']].explode('Col_List')
.set_index('Col_List', append=True)['Col1']
.unstack().fillna(df)
)
df.loc[s.index, s.columns] = s
Output:
   Col1  Col2                       Col_List  One  Two  Three  Four  Five
0     2     5  [One, Two, Three, Four, Five]  2.0  2.0    2.0   2.0   2.0
1     5     7                    [Two, Four]  0.0  5.0    0.0   5.0   0.0
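The floats in this output come from the unstack step: labels missing from a row's list become NaN, which forces a float dtype. A small sketch of the intermediate frame follows, assuming you want integers back; the name `filled` and the `fillna(0).astype(int)` step are my additions, not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Col1": [2, 5],
    "Col2": [5, 7],
    "Col_List": [["One", "Two", "Three", "Four", "Five"], ["Two", "Four"]],
    "One": [0, 0], "Two": [0, 0], "Three": [0, 0], "Four": [0, 0], "Five": [0, 0],
})

s = (
    df[["Col1", "Col_List"]]
    .explode("Col_List")
    .set_index("Col_List", append=True)["Col1"]  # MultiIndex: (row label, list label)
    .unstack()                                    # list labels become columns
)

# Row 1 only lists "Two" and "Four", so the other three cells are NaN here.
print(s)

# Assumed follow-up: replace the gaps with 0 and restore an integer dtype.
filled = s.fillna(0).astype(int)
df[filled.columns] = filled
print(df)
```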
Another possible solution, based on pandas.crosstab:
aux = df.explode('Col_List')
df.update(pd.crosstab(aux.index, aux['Col_List'], aux['Col1'],
aggfunc=lambda x: x))
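To see what the cross-tabulation produces before it is merged back, here is a hedged variation rather than the exact code above: it resets the index so the row labels become an ordinary column, and it swaps the identity lambda for `aggfunc="first"`, which assumes each label appears at most once per list. The name `table` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Col1": [2, 5],
    "Col2": [5, 7],
    "Col_List": [["One", "Two", "Three", "Four", "Five"], ["Two", "Four"]],
    "One": [0, 0], "Two": [0, 0], "Three": [0, 0], "Four": [0, 0], "Five": [0, 0],
})

# Keep the original row label as a regular column named "index".
aux = df.explode("Col_List").reset_index()

# Rows: original row label; columns: list label; cells: the Col1 value.
# aggfunc="first" is an assumption: one value per (row, label) pair.
table = pd.crosstab(aux["index"], aux["Col_List"],
                    values=aux["Col1"], aggfunc="first")

# Combinations that never occur are NaN; fill them and go back to integers.
table = table.fillna(0).astype(int)
print(table)
```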
Alternatively, using numpy broadcasting:
colnames = df.loc[:, 'One':'Five'].columns
df[colnames] = (df['Col1'].values[:, None] * np.vstack([np.any(colnames.values ==
np.array(x)[:, None], axis=0) for x in df['Col_List']]))
Output:
   Col1  Col2                       Col_List  One  Two  Three  Four  Five
0     2     5  [One, Two, Three, Four, Five]    2    2      2     2     2
1     5     7                    [Two, Four]    0    5      0     5     0
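The broadcasting one-liner is easier to follow when the mask is built separately. The sketch below uses plain arrays in place of the DataFrame columns; `grid`, `row_mask`, `mask` and `values` are names I made up for the intermediate pieces.

```python
import numpy as np

colnames = np.array(["One", "Two", "Three", "Four", "Five"])
col_list = [["One", "Two", "Three", "Four", "Five"], ["Two", "Four"]]
col1 = np.array([2, 5])

# For a single row: compare the 5 column names (shape (5,)) with the k labels
# reshaped to (k, 1); broadcasting gives a (k, 5) boolean grid.
labels = np.array(col_list[1])[:, None]
grid = colnames == labels

# A column is selected if any label in the list matches it.
row_mask = grid.any(axis=0)

# Same idea for every row, stacked into an (n_rows, 5) mask.
mask = np.vstack([np.any(colnames == np.array(x)[:, None], axis=0)
                  for x in col_list])

# Col1 as a column vector (n_rows, 1) times the mask keeps Col1 where True, 0 elsewhere.
values = col1[:, None] * mask
print(values)
```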
You can use .str.join() and .str.get_dummies():
df.assign(**df['Col_List'].str.join('|').str.get_dummies().mul(df['Col1'],axis=0))
Output:
   Col1  Col2                       Col_List  One  Two  Three  Four  Five
0     2     5  [One, Two, Three, Four, Five]    2    2      2     2     2
1     5     7                    [Two, Four]    0    5      0     5     0
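Broken into steps, the one-liner might look like the sketch below. Two things are worth noting: `assign` returns a new DataFrame and leaves `df` untouched, and the approach assumes no label contains the `|` separator. The names `joined`, `dummies`, `scaled` and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Col1": [2, 5],
    "Col2": [5, 7],
    "Col_List": [["One", "Two", "Three", "Four", "Five"], ["Two", "Four"]],
    "One": [0, 0], "Two": [0, 0], "Three": [0, 0], "Four": [0, 0], "Five": [0, 0],
})

# Join each list into a single "|"-separated string.
joined = df["Col_List"].str.join("|")

# Split on "|" (the default separator) into 0/1 indicator columns.
dummies = joined.str.get_dummies()

# Scale each indicator row by that row's Col1 value.
scaled = dummies.mul(df["Col1"], axis=0)

# Overwrite the same-named columns in a copy; df itself is not modified.
result = df.assign(**scaled)
print(result)
```

To keep the result, assign it back, for example `df = df.assign(...)`.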