I have a DataFrame of the form:
id amenities ...
1 "TV,Internet,Shower,..." ...
2 "TV,Hot tub,Internet,..." ...
3 "Internet,Heating,Shower..." ...
...
I want to split the strings on commas and create a dummy column for each category, with a result like this:
id TV Internet Shower Hot tub Heating ...
1 1 1 1 0 0 ...
2 1 1 0 1 0 ...
3 0 1 1 0 1 ...
...
How would I do this?
Thanks
You can use str.get_dummies together with join or concat:
df = df[['id']].join(df['amenities'].str.get_dummies(','))
print(df)
id Heating Hot tub Internet Shower TV
0 1 0 0 1 1 1
1 2 0 1 1 0 1
2 3 1 0 1 1 0
Or:
df = pd.concat([df['id'], df['amenities'].str.get_dummies(',')], axis=1)
print(df)
id Heating Hot tub Internet Shower TV
0 1 0 0 1 1 1
1 2 0 1 1 0 1
2 3 1 0 1 1 0
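For reference, here is a minimal self-contained sketch of the concat approach. The sample frame is an assumption: it is built from the three rows in the question with the truncated "..." entries dropped, and the names dummies and result are only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed: only the visible
# amenities, with no spaces after the commas).
df = pd.DataFrame({
    "id": [1, 2, 3],
    "amenities": [
        "TV,Internet,Shower",
        "TV,Hot tub,Internet",
        "Internet,Heating,Shower",
    ],
})

# Split each string on "," and build one 0/1 indicator column per distinct value.
dummies = df["amenities"].str.get_dummies(sep=",")

# Put the id column back alongside the indicator columns.
result = pd.concat([df["id"], dummies], axis=1)
print(result)
```

Note that str.get_dummies splits on the exact separator string, so if the real data has a space after each comma, the column names would keep that leading space unless the separator is adjusted or the values are cleaned first.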