I want to convert a DataFrame of user_Id and skills into a zero-one (indicator) DataFrame of users and their corresponding skills.
Input DataFrame
user_Id skills
0 user1 "java, hdfs, hadoop"
1 user2 "python, c++, c"
2 user3 "hadoop, java, hdfs"
3 user4 "html, java, php"
4 user5 "hadoop, php, hdfs"
Desired output DataFrame
user_Id java c c++ hadoop hdfs python html php
user1 1 0 0 1 1 0 0 0
user2 0 1 1 0 0 1 0 0
user3 1 0 0 1 1 0 0 0
user4 1 0 0 0 0 0 1 1
user5 0 0 0 1 1 0 0 1
For me, str.get_dummies combined with concat works:
# split each skills string on ', ' and build one 0/1 column per skill
df1 = df['skills'].str.get_dummies(', ')
print(df1)
c c++ hadoop hdfs html java php python
0 0 0 1 1 0 1 0 0
1 1 1 0 0 0 0 0 1
2 0 0 1 1 0 1 0 0
3 0 0 0 0 1 1 1 0
4 0 0 1 1 0 0 1 0
# put user_Id back in front of the indicator columns
df = pd.concat([df['user_Id'], df1], axis=1)
print(df)
user_Id c c++ hadoop hdfs html java php python
0 user1 0 0 1 1 0 1 0 0
1 user2 1 1 0 0 0 0 0 1
2 user3 0 0 1 1 0 1 0 0
3 user4 0 0 0 0 1 1 1 0
4 user5 0 0 1 1 0 0 1 0
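For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample input from the question by hand (the quotes around the skills are assumed to be display only) and stores the final frame under the name `result`, which is my own choice and not part of the snippets above.

```python
import pandas as pd

# Rebuild the sample input from the question
df = pd.DataFrame({
    "user_Id": ["user1", "user2", "user3", "user4", "user5"],
    "skills": [
        "java, hdfs, hadoop",
        "python, c++, c",
        "hadoop, java, hdfs",
        "html, java, php",
        "hadoop, php, hdfs",
    ],
})

# One 0/1 indicator column per distinct skill, splitting on ", "
dummies = df["skills"].str.get_dummies(", ")

# Attach the user ids back as the first column
result = pd.concat([df["user_Id"], dummies], axis=1)
print(result)
```

Note that get_dummies returns the skill columns in sorted order, so they will not follow the order shown in the desired output unless you reorder them yourself, for example with result[['user_Id', 'java', 'c', 'c++', 'hadoop', 'hdfs', 'python', 'html', 'php']].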
EDIT:
If there is no space after the comma, use:
df1 = df['skills'].str.get_dummies(',')
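If the spacing around the commas is inconsistent (an assumption on my part, the question does not show such data), one possible approach is to normalize the separators first and then split on a bare comma. The sample values and the names `skills`, `normalized` and `dummies` below are made up for illustration.

```python
import pandas as pd

# Hypothetical input with mixed spacing around the commas
skills = pd.Series(["java,hdfs, hadoop", "python ,c++,c"])

# Remove whitespace around each comma and at both ends of the string,
# so that "hadoop" and " hadoop" do not end up as separate columns
normalized = skills.str.replace(r"\s*,\s*", ",", regex=True).str.strip()

# The separator passed to get_dummies is a literal string, not a regex
dummies = normalized.str.get_dummies(",")
print(dummies)
```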