I have a DataFrame in Python that looks like this:
Name Hobbies
0 Paul ["Watch_NBA", "Play_PS4"]
1 Jeff ["Play_hockey", "Read", "Play_PS4"]
2 Kyle ["Sleep", "Watch_NBA"]
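I need to turn each element of the lists into its own column, holding 1 if the element appears in that row's original list and 0 otherwise. The result should look like this: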
Name Watch_NBA Play_PS4 Play_hockey Read Sleep
0 Paul 1 1 0 0 0
1 Jeff 0 1 1 1 0
2 Kyle 1 0 0 0 1
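Does anyone know how I can do this? Keep in mind that the Hobbies column will contain many different hobbies, so I need something automated rather than hard-coded. Thanks.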
get_dummies() works, but sklearn's MultiLabelBinarizer performs better:
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
a = mlb.fit_transform(df["Hobbies"])
df_expanded = pd.DataFrame(a, columns=mlb.classes_, index=df.index)
# merge them using the following:
df_merged = df.merge(df_expanded, left_index=True, right_index=True)
print(df_merged)
index Name Hobbies Play_PS4 Play_hockey Read Sleep Watch_NBA
0 Paul [Watch_NBA, Play_PS4] 1 0 0 0 1
1 Jeff [Play_hockey, Read, Play_PS4] 1 1 1 0 0
2 Kyle [Sleep, Watch_NBA] 0 0 0 1 1
You need the get_dummies() method (see the pandas documentation). For example:
names = df.Name
# sum(level=0) was removed in pandas 2.0; group by the original row index instead
df = pd.get_dummies(df.Hobbies.apply(pd.Series).stack()).groupby(level=0).sum()
df.insert(0, 'Name', names)
#output:
Name Play_PS4 Play_hockey Read Sleep Watch_NBA
0 Paul 1 0 0 0 1
1 Jeff 1 1 1 0 0
2 Kyle 0 0 0 1 1
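For reference, here is a self-contained sketch of the same get_dummies() idea that can be pasted and run as-is. It assumes pandas 0.25 or newer (for explode()) and that the Hobbies cells hold real Python lists rather than strings. The sample frame is rebuilt from the question, and the variable names exploded, dummies and result are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "Name": ["Paul", "Jeff", "Kyle"],
    "Hobbies": [
        ["Watch_NBA", "Play_PS4"],
        ["Play_hockey", "Read", "Play_PS4"],
        ["Sleep", "Watch_NBA"],
    ],
})

# explode() yields one row per hobby and repeats the original index label
exploded = df["Hobbies"].explode()

# One-hot encode every hobby, then collapse back to one row per original index
dummies = pd.get_dummies(exploded, dtype=int).groupby(level=0).sum()

# Put the Name column back in front of the indicator columns
result = df[["Name"]].join(dummies)
print(result)
```

The indicator columns come out in sorted order, and a hobby listed twice in the same row would give 2 rather than 1, so add .clip(upper=1) after the sum if that can happen in your data.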
In [86]: df
Out[86]:
Name Hobbies
0 Paul [NBA, PS4]
1 Jeff [Hockey, Read, PS4]
2 Kyle [Sleep, NBA]
In [87]: df['dummy'] = 1
In [88]: df.explode("Hobbies").pivot(index='Name', columns='Hobbies', values='dummy').fillna(value=0)
Out[88]:
Hobbies Hockey NBA PS4 Read Sleep
Name
Jeff 1.0 0.0 1.0 1.0 0.0
Kyle 0.0 1.0 0.0 0.0 1.0
Paul 0.0 1.0 1.0 0.0 0.0
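The pivot output above is sorted by Name and holds floats. If you want integer flags in the original row order, something along these lines should work. It is only a sketch: it assumes every Name is unique and no hobby is repeated within a row (pivot raises an error on duplicate index/column pairs), and the variable name wide is just illustrative.

```python
import pandas as pd

# Same sample frame as in the session above
df = pd.DataFrame({
    "Name": ["Paul", "Jeff", "Kyle"],
    "Hobbies": [["NBA", "PS4"], ["Hockey", "Read", "PS4"], ["Sleep", "NBA"]],
})

wide = (
    df.assign(dummy=1)                 # helper column, added without mutating df
      .explode("Hobbies")              # one row per (Name, hobby) pair
      .pivot(index="Name", columns="Hobbies", values="dummy")
      .fillna(0)
      .astype(int)                     # 1.0/0.0 -> 1/0
      .reindex(df["Name"])             # restore the original row order
      .rename_axis(columns=None)       # drop the "Hobbies" label on the columns
      .reset_index()                   # turn Name back into a regular column
)
print(wide)
```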
You can try this:
n = df['Name']
# fillna(downcast=...) is deprecated in recent pandas versions; cast explicitly instead
df = df['Hobbies'].apply(lambda x: pd.Series([1] * len(x), index=x)).fillna(0).astype(int)
df.insert(0, 'Name', n)
print(df)
Output:
Name Watch_NBA Play_PS4 Play_hockey Read Sleep
0 Paul 1 1 0 0 0
1 Jeff 0 1 1 1 0
2 Kyle 1 0 0 0 1