I have this DataFrame:
id class text
1 ["oil","water"] text1
2 ["oil"] text2
3 ["sun","water","earth"] text3
I get the list of all possible classes with this code:
import ast
# "class" is a Python keyword, so the column needs bracket access
df['class'].map(ast.literal_eval).explode().value_counts()
oil
water
sun
earth
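I want to create a new DataFrame with every class as a column name, set to 1 when the row's class list contains that class and 0 otherwise: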
id class text oil water sun earth
1 ["oil","water"] text1 1 1 0 0
2 ["oil"] text2 1 0 0 0
3 ["sun","water","earth"] text3 0 1 1 1
I tried:
f = df.explode('class').pivot(columns='class', index='text', values='text').notnull().astype(int)
But the column names are not split correctly.
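One likely cause, assuming the class column holds strings such as '["oil","water"]' rather than real lists (the use of ast.literal_eval above suggests so): explode() only splits list-like values, so each string stays whole and ends up as a single column name. A minimal sketch of the difference, with the sample data rebuilt by hand:

```python
import ast
import pandas as pd

# Sample data from the question; storing 'class' as strings is an assumption
df = pd.DataFrame({
    "id": [1, 2, 3],
    "class": ['["oil","water"]', '["oil"]', '["sun","water","earth"]'],
    "text": ["text1", "text2", "text3"],
})

# explode() leaves plain strings untouched, so nothing is split here
unparsed = df.explode("class")

# Parse each string into a Python list first, then explode
parsed = df.copy()
parsed["class"] = parsed["class"].map(ast.literal_eval)
exploded = parsed.explode("class")

print(len(unparsed))  # 3 rows: one per original row
print(len(exploded))  # 6 rows: one per (id, class) pair
```

If the column already holds lists, this parsing step is unnecessary and the problem lies elsewhere.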
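# Assumes df['class'] already holds lists, as in the output shown after this code.
# Flag each (id, class) pair with 1, pivot the classes into columns, then merge back on id.
(df.merge(df.assign(value=1).explode('class')
          .pivot_table('value', 'id', 'class', aggfunc='sum', fill_value=0)
          .reset_index()))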
id class text earth oil sun water
0 1 [oil, water] text1 0 1 0 1
1 2 [oil] text2 0 1 0 0
2 3 [sun, water, earth] text3 1 0 1 1
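For reference, here is an end-to-end sketch of the same approach that starts from string-valued class cells. The sample DataFrame and the names parsed, indicators, and result are illustrative and not part of the original post; the parsing step can be dropped if the column already contains lists.

```python
import ast
import pandas as pd

# Sample data from the question; storing 'class' as strings is an assumption
df = pd.DataFrame({
    "id": [1, 2, 3],
    "class": ['["oil","water"]', '["oil"]', '["sun","water","earth"]'],
    "text": ["text1", "text2", "text3"],
})

# Turn each string into a real list so that explode() can split it
parsed = df.copy()
parsed["class"] = parsed["class"].map(ast.literal_eval)

# One row per (id, class) pair, then pivot the classes into indicator columns
indicators = (
    parsed.assign(value=1)
    .explode("class")
    .pivot_table(index="id", columns="class", values="value",
                 aggfunc="sum", fill_value=0)
    .reset_index()
)

# Attach the indicator columns to the original rows by id
result = parsed.merge(indicators, on="id")
print(result)
```

The indicator columns come out in alphabetical order (earth, oil, sun, water); selecting them explicitly, for example result[["id", "class", "text", "oil", "water", "sun", "earth"]], gives the order shown in the question.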