Pandas DataFrame: check the list in one column and set values in other columns



I need your help with the following task. I have the following DataFrame:

import pandas as pd

test = {'Col1': [2, 5],
        'Col2': [5, 7],
        'Col_List': [['One', 'Two', 'Three', 'Four', 'Five'], ['Two', 'Four']],
        'One': [0, 0],
        'Two': [0, 0],
        'Three': [0, 0],
        'Four': [0, 0],
        'Five': [0, 0]}
df = pd.DataFrame.from_dict(test)
df

It looks like this:

   Col1  Col2                       Col_List  One  Two  Three  Four  Five
0     2     5  [One, Two, Three, Four, Five]    0    0      0     0     0
1     5     7                    [Two, Four]    0    0      0     0     0

exploded = df.explode("Col_List")
df.update(pd.get_dummies(exploded["Col_List"])
            .mul(exploded["Col1"], axis="rows")
            .groupby(level=0).sum())

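  • explode the lists' elements onto their own rows
  • get the 1-hot representation of "One", "Two", etc.
  • multiply it by the (exploded) "Col1" values
    • the 1/0 values act as selectors
  • then undo the explosion: groupby & sum
  • finally update the original frame's "One", "Two", … columns with the result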

to get

>>> df
   Col1  Col2                       Col_List  One  Two  Three  Four  Five
0     2     5  [One, Two, Three, Four, Five]    2    2      2     2     2
1     5     7                    [Two, Four]    0    5      0     5     0
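
To make the steps listed above easier to follow, here is a sketch that runs them one at a time and keeps each intermediate result. The names `dummies`, `weighted` and `collapsed` are only illustrative, and the `.astype(int)` call is an addition so the indicator columns are integers regardless of the pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    "Col1": [2, 5],
    "Col2": [5, 7],
    "Col_List": [["One", "Two", "Three", "Four", "Five"], ["Two", "Four"]],
    "One": [0, 0], "Two": [0, 0], "Three": [0, 0], "Four": [0, 0], "Five": [0, 0],
})

# Step 1: one row per list element; the original index label is repeated.
exploded = df.explode("Col_List")

# Step 2: indicator (1/0) columns, one per distinct label in Col_List.
dummies = pd.get_dummies(exploded["Col_List"]).astype(int)

# Step 3: scale every indicator row by the Col1 value of the same row.
weighted = dummies.mul(exploded["Col1"], axis="rows")

# Step 4: collapse back to one row per original index label.
collapsed = weighted.groupby(level=0).sum()

# Step 5: overwrite the matching columns of the original frame in place.
df.update(collapsed)
print(df)
```

Printing `exploded`, `dummies` and `weighted` along the way shows how the 1/0 values select where the `Col1` value ends up.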

Let's try exploding the data, unstacking it, and then assigning back:

s = (df[['Col1', 'Col_List']].explode('Col_List')
       .set_index('Col_List', append=True)['Col1']
       .unstack().fillna(df)
    )
df.loc[s.index, s.columns] = s

Output:

   Col1  Col2                       Col_List  One  Two  Three  Four  Five
0     2     5  [One, Two, Three, Four, Five]  2.0  2.0    2.0   2.0   2.0
1     5     7                    [Two, Four]  0.0  5.0    0.0   5.0   0.0
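
The values come back as floats because `unstack` introduces NaN for labels that a row does not contain. If integers are wanted, one possible variation (a sketch, not part of the answer above) fills the gaps with 0 and casts before assigning. `target_cols` and `filled` are names made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Col1": [2, 5],
    "Col2": [5, 7],
    "Col_List": [["One", "Two", "Three", "Four", "Five"], ["Two", "Four"]],
    "One": [0, 0], "Two": [0, 0], "Three": [0, 0], "Four": [0, 0], "Five": [0, 0],
})
target_cols = ["One", "Two", "Three", "Four", "Five"]  # assumed target columns

# Same explode / unstack idea: rows stay the original index, labels become columns.
s = (df[["Col1", "Col_List"]].explode("Col_List")
       .set_index("Col_List", append=True)["Col1"]
       .unstack())

# Put the columns in the frame's order, replace missing labels with 0, cast to int.
filled = s.reindex(columns=target_cols).fillna(0).astype("int64")
df[target_cols] = filled
print(df)
```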

Another possible solution, based on pandas.crosstab:

aux = df.explode('Col_List')
df.update(pd.crosstab(aux.index, aux['Col_List'], aux['Col1'],
                      aggfunc=lambda x: x))
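
The identity `aggfunc` above relies on every (row, label) pair occurring exactly once. As a hedged variation, a reducing function such as `"sum"` can be swapped in; this is an assumption about what is acceptable here, and it would add up duplicates if a label were repeated within one list. `table` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "Col1": [2, 5],
    "Col2": [5, 7],
    "Col_List": [["One", "Two", "Three", "Four", "Five"], ["Two", "Four"]],
    "One": [0, 0], "Two": [0, 0], "Three": [0, 0], "Four": [0, 0], "Five": [0, 0],
})

aux = df.explode("Col_List")

# Rows: original index labels; columns: list labels; cells: Col1 (NaN where absent).
table = pd.crosstab(aux.index, aux["Col_List"], values=aux["Col1"], aggfunc="sum")

# update() skips NaN cells, so columns a row does not list keep their 0.
df.update(table)
print(df)
```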

Alternatively, using numpy broadcasting:

import numpy as np

colnames = df.loc[:, 'One':'Five'].columns
df[colnames] = (df['Col1'].values[:, None]
                * np.vstack([np.any(colnames.values == np.array(x)[:, None], axis=0)
                             for x in df['Col_List']]))

Output:

   Col1  Col2                       Col_List  One  Two  Three  Four  Five
0     2     5  [One, Two, Three, Four, Five]    2    2      2     2     2
1     5     7                    [Two, Four]    0    5      0     5     0
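
The broadcasting one-liner can be hard to read, so here is a sketch that builds the boolean membership mask as a separate step. It uses `Index.isin` in place of the `==` / `np.any` comparison, which is a substitution for readability rather than the answer's exact code; `mask` is an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Col1": [2, 5],
    "Col2": [5, 7],
    "Col_List": [["One", "Two", "Three", "Four", "Five"], ["Two", "Four"]],
    "One": [0, 0], "Two": [0, 0], "Three": [0, 0], "Four": [0, 0], "Five": [0, 0],
})

colnames = df.loc[:, "One":"Five"].columns

# mask[i, j] is True when colnames[j] appears in row i's list.
mask = np.array([colnames.isin(labels) for labels in df["Col_List"]])

# Col1 as a column vector of shape (n, 1) broadcasts against the (n, k) mask.
df[colnames] = df["Col1"].to_numpy()[:, None] * mask
print(df)
```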

You can use .str.join() and .str.get_dummies():

df.assign(**df['Col_List'].str.join('|').str.get_dummies().mul(df['Col1'],axis=0))

Output:

   Col1  Col2                       Col_List  One  Two  Three  Four  Five
0     2     5  [One, Two, Three, Four, Five]    2    2      2     2     2
1     5     7                    [Two, Four]    0    5      0     5     0
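
Note that `df.assign` returns a new frame and leaves `df` itself unchanged, so the result has to be kept in a variable. A small sketch of that, split into two steps (`dummies` and `result` are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    "Col1": [2, 5],
    "Col2": [5, 7],
    "Col_List": [["One", "Two", "Three", "Four", "Five"], ["Two", "Four"]],
    "One": [0, 0], "Two": [0, 0], "Three": [0, 0], "Four": [0, 0], "Five": [0, 0],
})

# Join each list into "One|Two|..." and split it back into 1/0 indicator columns
# ("|" is the default separator of str.get_dummies).
dummies = df["Col_List"].str.join("|").str.get_dummies()

# Scale each indicator row by Col1; assign() matches columns by name and
# returns a new frame instead of modifying df.
result = df.assign(**dummies.mul(df["Col1"], axis=0))
print(result)
```

This assumes none of the labels contains the `|` character; otherwise a different separator would be needed in both calls.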
