Let's take a sample DataFrame as an example:
import numpy as np
import pandas as pd

df = pd.DataFrame({"Date": ["2022-10-01","2022-10-02","2022-10-03","2022-10-04","2022-10-05","2022-10-06","2022-10-01","2022-10-02","2022-10-03","2022-10-04","2022-10-05","2022-10-06"],
                   "Animal": ["Cat","Cat","Cat","Cat","Cat","Cat","Dog","Dog","Dog","Dog","Dog","Dog"],
                   "Quantity": [np.nan,4,3,5,1,np.nan,6,5,np.nan,np.nan,2,1]})
Date Animal Quantity
0 2022-10-01 Cat NaN
1 2022-10-02 Cat 4.0
2 2022-10-03 Cat 3.0
3 2022-10-04 Cat 5.0
4 2022-10-05 Cat 1.0
5 2022-10-06 Cat NaN
6 2022-10-01 Dog 6.0
7 2022-10-02 Dog 5.0
8 2022-10-03 Dog NaN
9 2022-10-04 Dog NaN
10 2022-10-05 Dog 2.0
11 2022-10-06 Dog 1.0
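I would like to fill the NaN values in the Quantity column as follows:
- replace each NaN with the closest previous value among the rows that share the same value in the Animal column
- if any NaN values remain, replace them with the closest following value among the rows that share the same value in the Animal column
I thought of Series.interpolate, but I don't know how to handle the Animal column. Do you know an efficient way to get the expected output?
Expected output: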
Date Animal Quantity
0 2022-10-01 Cat 4
1 2022-10-02 Cat 4
2 2022-10-03 Cat 3
3 2022-10-04 Cat 5
4 2022-10-05 Cat 1
5 2022-10-06 Cat 1
6 2022-10-01 Dog 6
7 2022-10-02 Dog 5
8 2022-10-03 Dog 5
9 2022-10-04 Dog 5
10 2022-10-05 Dog 2
11 2022-10-06 Dog 1
You can use ffill/bfill per group:
# within each Animal group: forward fill first, then backward fill whatever is still missing
df['Quantity'] = (df.groupby('Animal', group_keys=False)['Quantity']
                    .apply(lambda s: s.ffill().bfill())
                 )
Output:
Date Animal Quantity
0 2022-10-01 Cat 4.0
1 2022-10-02 Cat 4.0
2 2022-10-03 Cat 3.0
3 2022-10-04 Cat 5.0
4 2022-10-05 Cat 1.0
5 2022-10-06 Cat 1.0
6 2022-10-01 Dog 6.0
7 2022-10-02 Dog 5.0
8 2022-10-03 Dog 5.0
9 2022-10-04 Dog 5.0
10 2022-10-05 Dog 2.0
11 2022-10-06 Dog 1.0
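A possible variation on the above, offered only as a sketch: groupby objects expose ffill and bfill directly, so the lambda can be dropped. This version also sorts by Animal and Date first, on the assumption (not stated in the question) that the rows might not already be in date order within each animal; the intermediate name filled is just for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Date": ["2022-10-01", "2022-10-02", "2022-10-03", "2022-10-04", "2022-10-05", "2022-10-06"] * 2,
    "Animal": ["Cat"] * 6 + ["Dog"] * 6,
    "Quantity": [np.nan, 4, 3, 5, 1, np.nan, 6, 5, np.nan, np.nan, 2, 1],
})

# Assumption: "previous" and "next" mean earlier and later dates,
# so make sure each animal's rows are in date order first.
df = df.sort_values(["Animal", "Date"])

# Step 1: forward fill within each Animal group.
filled = df.groupby("Animal")["Quantity"].ffill()

# Step 2: backward fill what is still missing (leading NaNs), again per group.
filled = filled.groupby(df["Animal"]).bfill()

df["Quantity"] = filled
print(df)
```

Both steps keep the original index, so the result can be assigned straight back to the column.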
Adding to @mozway's solution (since I can't edit it): as I understand it, the column should also be cast to int.
# forward fill first, then backward fill, so the Dog NaNs take the previous value (5)
df['Quantity'] = (df.groupby('Animal', group_keys=False)['Quantity']
                    .apply(lambda s: s.ffill().bfill())
                 )
df['Quantity'] = df['Quantity'].astype('int')
print(df)
Output:
Date Animal Quantity
0 2022-10-01 Cat 4
1 2022-10-02 Cat 4
2 2022-10-03 Cat 3
3 2022-10-04 Cat 5
4 2022-10-05 Cat 1
5 2022-10-06 Cat 1
6 2022-10-01 Dog 6
7 2022-10-02 Dog 5
8 2022-10-03 Dog 5
9 2022-10-04 Dog 5
10 2022-10-05 Dog 2
11 2022-10-06 Dog 1
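One caveat with the plain int cast: if some animal has no observed value at all, NaNs survive both fills and astype('int') raises an error. A minimal sketch of one way around that, assuming pandas' nullable Int64 dtype is acceptable for the column; the df2 frame and its "Bird" group are made up for illustration and are not part of the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the "Bird" group has no observed Quantity at all.
df2 = pd.DataFrame({
    "Animal": ["Cat", "Cat", "Bird", "Bird"],
    "Quantity": [np.nan, 4, np.nan, np.nan],
})

# Same per-group fill as above, written with transform so the result
# is aligned with the original rows.
filled = df2.groupby("Animal")["Quantity"].transform(lambda s: s.ffill().bfill())

# "Int64" (capital I) is the nullable integer dtype: filled values become
# integers and the remaining NaNs become <NA> instead of raising.
df2["Quantity"] = filled.astype("Int64")
print(df2)
```

When every group is known to have at least one value, as in the question's data, the plain astype('int') is enough.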