Combining rows with the row above under certain conditions


Table format (blank cells are empty; the columns are field and dimension):

field | dimension
-----------------
a     | 
b     | abc
e     | efg
      | xyz
r     | abc
      | def
      | xyz

Desired format:

field | dimension
-----------------
a     | [nan]
b     | [abc]
e     | [efg, xyz]
r     | [abc, def, xyz]

I tried:

df.dimension = [df.dimension]

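and planned to find the index of every empty cell in field and merge that row with the row above. However, I got:

Error: Length of values does not match length of index.

I also suspect there is a better way to do this than my approach. Thanks in advance.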

Use:

df =(df.groupby(df['field'].ffill())['dimension']
       .apply(lambda x: np.nan if x.isnull().all() else list(x))
       .reset_index())
print (df)
  field        dimension
0     a              NaN
1     b            [abc]
2     e       [efg, xyz]
3     r  [abc, def, xyz]

Alternatively, starting again from the original df:

df = (df[df['dimension'].notnull()].groupby(df['field'].ffill())['dimension']
                                  .apply(list)
                                  .reindex(pd.unique(df['field'].dropna()))
                                  .reset_index())
print (df)
  field        dimension
0     a              NaN
1     b            [abc]
2     e       [efg, xyz]
3     r  [abc, def, xyz]

But if a NaN inside the list is acceptable:

df =(df.groupby(df['field'].ffill())['dimension']
       .apply(list)
       .reset_index())
print (df)
  field        dimension
0     a            [nan]
1     b            [abc]
2     e       [efg, xyz]
3     r  [abc, def, xyz]

Let's try:

df['field'] = df['field'].ffill()
df_out = df.groupby('field')['dimension'].apply(list).reset_index()

Output:

  field        dimension
0     a            [nan]
1     b            [abc]
2     e       [efg, xyz]
3     r  [abc, def, xyz]
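
For anyone who wants to reproduce the answers above end to end, here is a minimal self-contained sketch. It assumes the blank cells in the question are NaN (if they are empty strings, they would need converting to NaN first), and the names keys and result are only illustrative. The idea is the same as in the answers: forward-fill field so each continuation row inherits the name above it, then group and collect dimension into lists.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question; blank cells are assumed to be NaN.
df = pd.DataFrame({
    "field": ["a", "b", "e", np.nan, "r", np.nan, np.nan],
    "dimension": [np.nan, "abc", "efg", "xyz", "abc", "def", "xyz"],
})

# Forward-fill so each continuation row inherits the field name above it.
keys = df["field"].ffill()

# Group by the filled key and collect each group's values into a list.
# sort=False keeps groups in order of first appearance instead of sorting them.
result = (
    df.groupby(keys, sort=False)["dimension"]
      .apply(list)
      .reset_index()
)
print(result)
```

Passing the filled Series as the grouping key leaves the original field column untouched, which may be preferable to overwriting it in place.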
