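I have a problem with the fillna() method. This is my sample df; it represents the quantity of goods in a shop. I want to fill all NaNs: if there is a NaN, I want to fill it with the value from the previous day, and if they are all NaNs, I want to fill them with 0. I'm looking for the most idiomatic pandas way to do this. I have some ideas involving loops, but they don't look good.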
My df:
day shop product quantity
0 1 shop_A apples 3.0
1 2 shop_A apples NaN
2 3 shop_A apples 1.0
3 1 shop_A bananas NaN
4 2 shop_A bananas NaN
5 3 shop_A bananas NaN
6 1 shop_B apples NaN
7 2 shop_B apples NaN
8 3 shop_B apples 2.0
9 1 shop_B bananas NaN
10 2 shop_B bananas 4.0
11 3 shop_B bananas 2.0
Expected df:
day shop product quantity
0 1 shop_A apples 3.0
1 2 shop_A apples 3.0
2 3 shop_A apples 1.0
3 1 shop_A bananas 0.0
4 2 shop_A bananas 0.0
5 3 shop_A bananas 0.0
6 1 shop_B apples 2.0
7 2 shop_B apples 2.0
8 3 shop_B apples 2.0
9 1 shop_B bananas 4.0
10 2 shop_B bananas 4.0
11 3 shop_B bananas 2.0
I also tried fillna(limit=3), but that is not what I want.
You can use sort_values to sort by day, then do a grouped bfill, and then chain fillna(0) so that whatever is still missing simply becomes 0:
df['quantity'] = (df.sort_values(by='day')
                  .groupby(['shop', 'product'])['quantity']
                  .bfill(limit=3)
                  .fillna(0))
Which prints:
day shop product quantity
0 1 shop_A apples 3.0
1 2 shop_A apples 1.0
2 3 shop_A apples 1.0
3 1 shop_A bananas 0.0
4 2 shop_A bananas 0.0
5 3 shop_A bananas 0.0
6 1 shop_B apples 2.0
7 2 shop_B apples 2.0
8 3 shop_B apples 2.0
9 1 shop_B bananas 4.0
10 2 shop_B bananas 4.0
11 3 shop_B bananas 2.0
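To see why the groupby step matters, here is a small check on the question's sample data (rebuilt by hand; the frame is already ordered by shop, product and day, so no sort is needed here). A plain bfill on the whole column ignores the shop/product boundaries, so shop_A's bananas pick up shop_B's apple count instead of staying empty and becoming 0. This is a sketch for illustration only; the variable names are made up.

```python
import numpy as np
import pandas as pd

# Sample data from the question, already ordered by shop, product, day
df = pd.DataFrame({
    "day": [1, 2, 3] * 4,
    "shop": ["shop_A"] * 6 + ["shop_B"] * 6,
    "product": (["apples"] * 3 + ["bananas"] * 3) * 2,
    "quantity": [3.0, np.nan, 1.0, np.nan, np.nan, np.nan,
                 np.nan, np.nan, 2.0, np.nan, 4.0, 2.0],
})

# Ungrouped: bfill runs down the entire column, crossing group boundaries
ungrouped = df["quantity"].bfill().fillna(0)

# Grouped: bfill stays inside each (shop, product) pair; all-NaN groups become 0
grouped = df.groupby(["shop", "product"])["quantity"].bfill().fillna(0)

print(pd.DataFrame({"ungrouped": ungrouped, "grouped": grouped}))
```

Rows 3 to 5 (shop_A bananas) show the difference: the ungrouped version fills them with 2.0 borrowed from shop_B apples, while the grouped version leaves them NaN until fillna(0).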
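The grouped bfill fills each NaN with the next available day's value for the same shop and product. That is why shop_A apples on day 2 comes out as 1.0 here rather than the 3.0 in your expected df. You can use ffill in the same way (or both), or perhaps linear interpolation, and the results will change accordingly. Either way, this should be enough to get you started.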
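A minimal sketch of the "both" variant, assuming the goal is exactly the expected df from the question: within each (shop, product) group, ffill first (previous day's value), then bfill to cover NaNs that have no previous day, then fillna(0) for groups that are entirely NaN. The same pattern with Series.interpolate(limit_direction="both") is included for comparison; note that the default linear method treats the rows as equally spaced, which only makes sense here because the days are consecutive. The function names fill_quantity and interpolate_quantity are hypothetical helpers, not part of pandas.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "day": [1, 2, 3] * 4,
    "shop": ["shop_A"] * 6 + ["shop_B"] * 6,
    "product": (["apples"] * 3 + ["bananas"] * 3) * 2,
    "quantity": [3.0, np.nan, 1.0, np.nan, np.nan, np.nan,
                 np.nan, np.nan, 2.0, np.nan, 4.0, 2.0],
})


def fill_quantity(frame):
    """Hypothetical helper: ffill, then bfill per (shop, product), then 0."""
    out = frame.copy()
    # Sorting by day guarantees ffill/bfill move along the time axis;
    # the result is aligned back to the original rows by index on assignment.
    out["quantity"] = (out.sort_values("day")
                       .groupby(["shop", "product"])["quantity"]
                       .transform(lambda q: q.ffill().bfill())
                       .fillna(0))
    return out


def interpolate_quantity(frame):
    """Hypothetical helper: linear interpolation per group, then 0."""
    out = frame.copy()
    # limit_direction="both" also fills leading/trailing NaNs with the
    # nearest valid value; all-NaN groups stay NaN until fillna(0).
    out["quantity"] = (out.sort_values("day")
                       .groupby(["shop", "product"])["quantity"]
                       .transform(lambda q: q.interpolate(limit_direction="both"))
                       .fillna(0))
    return out


print(fill_quantity(df))
print(interpolate_quantity(df))
```

The ffill-then-bfill version reproduces the expected df, including 3.0 for shop_A apples on day 2; the interpolation version gives 2.0 there instead, halfway between 3.0 and 1.0.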