Table format (blank cells are empty; the columns are field and dimension):
field | dimension
-----------------
a |
b | abc
e | efg
| xyz
r | abc
| def
| xyz
Desired format:
field | dimension
-----------------
a | [nan]
b | [abc]
e | [efg, xyz]
r | [abc, def, xyz]
I tried:
df.dimension = [df.dimension]
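and planned to find the index of every empty cell in field and merge that row with the one above. However, I got:
ValueError: Length of values does not match length of index
I also suspect there must be a better way than the one I am taking. Thanks in advance.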
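For reference, here is a minimal sketch that rebuilds the sample table and reproduces the error. It assumes the blank cells are stored as NaN (the question does not say how they are represented); the name error_message is only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the table above; blank cells are ASSUMED to be NaN.
df = pd.DataFrame({
    "field": ["a", "b", "e", np.nan, "r", np.nan, np.nan],
    "dimension": [np.nan, "abc", "efg", "xyz", "abc", "def", "xyz"],
})

# [df["dimension"]] is a list with a single element (the whole Series),
# so its length (1) cannot be matched against the 7 rows of the frame.
try:
    df["dimension"] = [df["dimension"]]
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The assignment fails before anything is written, so df is left unchanged.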
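Use:
import numpy as np
import pandas as pd

# Forward-fill the blank field names and use them as the grouping key;
# a group whose dimensions are all missing becomes NaN, otherwise a list.
df = (df.groupby(df['field'].ffill())['dimension']
        .apply(lambda x: np.nan if x.isnull().all() else list(x))
        .reset_index())
print(df)
  field        dimension
0     a              NaN
1     b            [abc]
2     e       [efg, xyz]
3     r  [abc, def, xyz]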
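To see why grouping by the forward-filled field column works, it can help to look at the key on its own. This is a small sketch on the sample data, assuming the blank cells are NaN; the names key and result are only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question; blank cells are ASSUMED to be NaN.
df = pd.DataFrame({
    "field": ["a", "b", "e", np.nan, "r", np.nan, np.nan],
    "dimension": [np.nan, "abc", "efg", "xyz", "abc", "def", "xyz"],
})

# ffill copies the last seen field name down into the blank rows,
# so every row ends up labelled with the field it belongs to.
key = df["field"].ffill()
print(key.tolist())  # ['a', 'b', 'e', 'e', 'r', 'r', 'r']

# Grouping by the key (df itself is not modified) and collecting each group.
result = (df.groupby(key)["dimension"]
            .apply(lambda x: np.nan if x.isnull().all() else list(x))
            .reset_index())
print(result)
```

Passing the filled Series directly to groupby leaves the original field column untouched.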
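Or, starting again from the original df, drop the rows with a missing dimension first and restore the emptied groups with reindex:
# pd.unique keeps the fields in order of first appearance
df = (df[df['dimension'].notnull()].groupby(df['field'].ffill())['dimension']
        .apply(list)
        .reindex(pd.unique(df['field'].dropna()))
        .reset_index())
print(df)
  field        dimension
0     a              NaN
1     b            [abc]
2     e       [efg, xyz]
3     r  [abc, def, xyz]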
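The filter-then-reindex variant can be broken into steps to show where field a disappears and how it comes back. A sketch on the sample data, again assuming NaN for the blank cells; filtered, lists and restored are illustrative names.

```python
import numpy as np
import pandas as pd

# Sample data from the question; blank cells are ASSUMED to be NaN.
df = pd.DataFrame({
    "field": ["a", "b", "e", np.nan, "r", np.nan, np.nan],
    "dimension": [np.nan, "abc", "efg", "xyz", "abc", "def", "xyz"],
})

key = df["field"].ffill()

# Step 1: drop rows without a dimension. The only row of field 'a' goes away.
filtered = df[df["dimension"].notnull()]

# Step 2: group the remaining rows; pandas aligns the key by index label,
# so the longer key Series can be passed as-is.
lists = filtered.groupby(key)["dimension"].apply(list)
print(lists.index.tolist())  # ['b', 'e', 'r']

# Step 3: reindex by every field name to bring 'a' back with a NaN value.
restored = lists.reindex(df["field"].dropna().unique())
print(restored)
```

Compared with the lambda version, this keeps the aggregation a plain list call and handles the empty groups in one place.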
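But if a NaN inside the list is not a problem:
# No special case: a missing dimension simply ends up inside the list
df = (df.groupby(df['field'].ffill())['dimension']
        .apply(list)
        .reset_index())
print(df)
  field        dimension
0     a            [nan]
1     b            [abc]
2     e       [efg, xyz]
3     r  [abc, def, xyz]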
Let's try:
# Fill the blank field names in place, then group on the column
df['field'] = df['field'].ffill()
df_out = df.groupby('field')['dimension'].apply(list).reset_index()
Output:
  field        dimension
0     a            [nan]
1     b            [abc]
2     e       [efg, xyz]
3     r  [abc, def, xyz]
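All of the approaches above rely on ffill, which only fills NaN. If the blank cells were read in as empty strings instead (an assumption, since the question does not say), they would need to be converted first. A minimal sketch with a hypothetical helper, collapse_dimensions, that also avoids modifying the input frame:

```python
import numpy as np
import pandas as pd


def collapse_dimensions(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: list the 'dimension' values under each 'field'."""
    # Turn empty strings into NaN so that ffill can carry the names downward.
    key = frame["field"].mask(frame["field"].eq("")).ffill()
    return frame.groupby(key)["dimension"].apply(list).reset_index()


# Same sample data, but with blank fields stored as "" (ASSUMED representation).
raw = pd.DataFrame({
    "field": ["a", "b", "e", "", "r", "", ""],
    "dimension": [np.nan, "abc", "efg", "xyz", "abc", "def", "xyz"],
})

out = collapse_dimensions(raw)
print(out)
```

Because the key is built as a separate Series, raw keeps its original empty strings.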