Pandas: fill NaN with the closest value, based on a category column



Let's take a sample DataFrame as an example:

import numpy as np
import pandas as pd

df = pd.DataFrame({"Date": ["2022-10-01","2022-10-02","2022-10-03","2022-10-04","2022-10-05","2022-10-06","2022-10-01","2022-10-02","2022-10-03","2022-10-04","2022-10-05","2022-10-06"],
                   "Animal" :["Cat","Cat","Cat","Cat","Cat","Cat","Dog","Dog","Dog","Dog","Dog","Dog"],
                   "Quantity":[np.nan,4,3,5,1,np.nan,6,5,np.nan,np.nan,2,1]})

          Date Animal  Quantity
0   2022-10-01    Cat       NaN
1   2022-10-02    Cat       4.0
2   2022-10-03    Cat       3.0
3   2022-10-04    Cat       5.0
4   2022-10-05    Cat       1.0
5   2022-10-06    Cat       NaN
6   2022-10-01    Dog       6.0
7   2022-10-02    Dog       5.0
8   2022-10-03    Dog       NaN
9   2022-10-04    Dog       NaN
10  2022-10-05    Dog       2.0
11  2022-10-06    Dog       1.0

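I would like to fill the NaN values in the Quantity column as follows:

  • Replace each NaN with the closest preceding value among the rows that share the same value in the Animal column.
  • If some NaN values remain, replace them with the closest following value among the rows that share the same value in the Animal column.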
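Ignoring the Animal column for a moment, the two rules above can be sketched on a single Series. This is only an illustration on made-up values (the Series `s` is hypothetical, not taken from the DataFrame above); it also shows that the order of the two fills changes the result when a gap sits between two known values.

```python
import numpy as np
import pandas as pd

# Hypothetical values: a leading gap, an inner gap, and a trailing gap.
s = pd.Series([np.nan, 4, np.nan, np.nan, 2, np.nan])

# Rule 1 then rule 2: previous value first, next value for what is left.
forward_first = s.ffill().bfill()

# Reversed order, for comparison: next value first, then previous value.
backward_first = s.bfill().ffill()

print(forward_first.tolist())
print(backward_first.tolist())
```

With the forward fill first, the inner gap takes the value before it (4); with the backward fill first, it takes the value after it (2).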

I thought of Series.interpolate, but I don't know how to handle the Animal column. Do you know an efficient way to get the expected output?

Expected output:

          Date Animal  Quantity
0   2022-10-01    Cat         4
1   2022-10-02    Cat         4
2   2022-10-03    Cat         3
3   2022-10-04    Cat         5
4   2022-10-05    Cat         1
5   2022-10-06    Cat         1
6   2022-10-01    Dog         6
7   2022-10-02    Dog         5
8   2022-10-03    Dog         5
9   2022-10-04    Dog         5
10  2022-10-05    Dog         2
11  2022-10-06    Dog         1

You can use ffill/bfill per group:

# forward fill first (previous value), then backward fill what is left
df['Quantity'] = (df.groupby('Animal', group_keys=False)['Quantity']
                    .apply(lambda s: s.ffill().bfill())
                 )

Output:

          Date Animal  Quantity
0   2022-10-01    Cat       4.0
1   2022-10-02    Cat       4.0
2   2022-10-03    Cat       3.0
3   2022-10-04    Cat       5.0
4   2022-10-05    Cat       1.0
5   2022-10-06    Cat       1.0
6   2022-10-01    Dog       6.0
7   2022-10-02    Dog       5.0
8   2022-10-03    Dog       5.0
9   2022-10-04    Dog       5.0
10  2022-10-05    Dog       2.0
11  2022-10-06    Dog       1.0
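As a possible variant, the same per-group fill can be written without `apply`, using the `ffill` and `bfill` methods that pandas exposes on groupby objects. The sketch below is an alternative, not the answer's own code; it rebuilds only the Animal and Quantity columns of the sample data, and the intermediate name `filled` is made up for the illustration.

```python
import numpy as np
import pandas as pd

# Same Animal/Quantity values as the sample DataFrame (Date omitted for brevity).
df = pd.DataFrame({
    "Animal": ["Cat"] * 6 + ["Dog"] * 6,
    "Quantity": [np.nan, 4, 3, 5, 1, np.nan, 6, 5, np.nan, np.nan, 2, 1],
})

# Step 1: within each Animal, carry the previous value forward.
filled = df.groupby("Animal")["Quantity"].ffill()

# Step 2: within each Animal, fill what is still missing from the next value.
# `filled` keeps the original index, so it can be grouped by df["Animal"].
filled = filled.groupby(df["Animal"]).bfill()

df["Quantity"] = filled
print(df)
```

Both steps stay inside each Animal group, so a value from Cat never leaks into Dog.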

Adding to @mozway's solution (since I can't edit it). I noticed that the column should be cast to int:

df['Quantity'] = (df.groupby('Animal', group_keys=False)['Quantity']
                    .apply(lambda s: s.ffill().bfill())
                 )
# every NaN has been filled, so the float column can be cast to int
df['Quantity'] = df['Quantity'].astype('int')
print(df)

Output:

          Date Animal  Quantity
0   2022-10-01    Cat         4
1   2022-10-02    Cat         4
2   2022-10-03    Cat         3
3   2022-10-04    Cat         5
4   2022-10-05    Cat         1
5   2022-10-06    Cat         1
6   2022-10-01    Dog         6
7   2022-10-02    Dog         5
8   2022-10-03    Dog         5
9   2022-10-04    Dog         5
10  2022-10-05    Dog         2
11  2022-10-06    Dog         1
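One caveat with the cast: `astype('int')` raises an error if any NaN is left, which happens when a group has no valid value at all. A hedged sketch of one way around it is to cast to the pandas nullable integer dtype `'Int64'` instead. The data below, including the "Bird" group with only missing values, is hypothetical and not part of the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "Bird" has no valid Quantity, so nothing can fill it.
df = pd.DataFrame({
    "Animal": ["Cat", "Cat", "Bird", "Bird"],
    "Quantity": [np.nan, 4, np.nan, np.nan],
})

# Per-group forward fill, then backward fill; transform keeps the row order.
df["Quantity"] = (df.groupby("Animal")["Quantity"]
                    .transform(lambda s: s.ffill().bfill()))

# Nullable integer dtype: filled rows become integers, unfilled rows become <NA>.
df["Quantity"] = df["Quantity"].astype("Int64")
print(df)
```

If every group is guaranteed to contain at least one value, as in the sample data, the plain `astype('int')` cast is enough.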
