Suppose I have a dataset with this structure:
name doggo floofer puppo pupper
A None floofer None None
B doggo None None None
C None None puppo None
D None None None pupper
E doggo floofer None None
F None None puppo pupper
G None None None None
I want a new column named dog_stage that holds the stage variables (doggo, floofer, puppo, pupper).
The final result would look like this:
name dog_stage
A floofer
B doggo
C puppo
D pupper
E doggo, floofer
F puppo, pupper
G None
The four original stage columns should then be dropped.
For both solutions, first keep only the necessary columns:
df = df[['name','doggo' , 'floofer', 'puppo', 'pupper']].copy()
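The first solution joins the column names wherever the cell is not missing, whether the missing marker is a real None (NoneType) or the string 'None'. It uses DataFrame.dot, a matrix multiplication of the boolean mask with the column names: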
import numpy as np
# Make name the index, turn any string 'None' into NaN, then flag the non-missing cells
df1 = df.set_index('name').replace('None', np.nan).notna()
# Multiply the mask by "column name + comma", then strip the trailing comma
df1 = df1.dot(df1.columns + ',').str[:-1].reset_index(name='dog_stage')
print(df1)
name dog_stage
0 A floofer
1 B doggo
2 C puppo
3 D pupper
4 E doggo,floofer
5 F puppo,pupper
6 G
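Note that the output above uses a bare comma as the separator and leaves an empty string for G, while the question asked for "doggo, floofer" and a missing value. A minimal sketch of one way to close that gap, assuming the missing cells are real None values (the sample frame is rebuilt from the question, and `mask`, `stage` and `out` are names chosen here for illustration):

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; missing cells are real None here.
df = pd.DataFrame({
    'name':    list('ABCDEFG'),
    'doggo':   [None, 'doggo', None, None, 'doggo', None, None],
    'floofer': ['floofer', None, None, None, 'floofer', None, None],
    'puppo':   [None, None, 'puppo', None, None, 'puppo', None],
    'pupper':  [None, None, None, 'pupper', None, 'pupper', None],
})

# Boolean mask: True where a stage is present. The replace call also
# covers data that stores the string 'None' instead of a real None.
mask = df.set_index('name').replace('None', np.nan).notna()

# True * 'doggo, ' gives 'doggo, ' and False * 'doggo, ' gives '',
# so the dot product concatenates the names of the True columns.
stage = mask.dot(mask.columns + ', ').str[:-2]

# Rows with no stage come out as '', so turn those into NaN.
stage = stage.replace('', np.nan)

out = stage.reset_index(name='dog_stage')
print(out)
```

The only changes from the solution above are the two-character separator (hence `.str[:-2]` instead of `.str[:-1]`) and the final replacement of empty strings.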
Another idea is to join the values of each row that are not None inside a lambda function:
df1 = (df.set_index('name')
         .replace('None', np.nan)
         .apply(lambda x: ','.join(x.dropna()), axis=1)
         .reset_index(name='dog_stage'))
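If the frame carries other columns that should survive, the same row-wise join can be written straight back into it before dropping the stage columns, as the question asks. A rough sketch under stated assumptions: the `rating` column is hypothetical and stands in for any extra data, and missing stages are stored as the string 'None':

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'rating' stands in for any column you want to keep.
df = pd.DataFrame({
    'name':    ['A', 'E', 'G'],
    'rating':  [10, 12, 9],
    'doggo':   ['None', 'doggo', 'None'],
    'floofer': ['floofer', 'floofer', 'None'],
    'puppo':   ['None', 'None', 'None'],
    'pupper':  ['None', 'None', 'None'],
})
stage_cols = ['doggo', 'floofer', 'puppo', 'pupper']

# Blank out the 'None' strings (mask puts NaN where the condition is True).
stages = df[stage_cols].mask(df[stage_cols].eq('None'))

# Join whatever is left in each row; rows with nothing left give ''.
df['dog_stage'] = stages.apply(lambda row: ', '.join(row.dropna()), axis=1)

# Mark rows without any stage as missing, then drop the source columns.
df['dog_stage'] = df['dog_stage'].replace('', np.nan)
df = df.drop(columns=stage_cols)
print(df)
```

Assigning by column keeps the original row order and index, so no merge on name is needed afterwards.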