Pandas: fillna() rows in a specific order



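I have a problem with the fillna() method. This is my example df; it represents the quantity of goods in a shop. I would like to fill all the NaNs. If there is a NaN, I want to fill it with the value from the previous day, or, if the values are all NaN, fill it with 0. I am looking for the best pandas way to do this. I have some ideas involving loops, but they don't look good.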

My df:

    day    shop  product  quantity
0     1  shop_A   apples       3.0
1     2  shop_A   apples       NaN
2     3  shop_A   apples       1.0
3     1  shop_A  bananas       NaN
4     2  shop_A  bananas       NaN
5     3  shop_A  bananas       NaN
6     1  shop_B   apples       NaN
7     2  shop_B   apples       NaN
8     3  shop_B   apples       2.0
9     1  shop_B  bananas       NaN
10    2  shop_B  bananas       4.0
11    3  shop_B  bananas       2.0

Expected df:

    day    shop  product  quantity
0     1  shop_A   apples       3.0
1     2  shop_A   apples       3.0
2     3  shop_A   apples       1.0
3     1  shop_A  bananas       0.0
4     2  shop_A  bananas       0.0
5     3  shop_A  bananas       0.0
6     1  shop_B   apples       2.0
7     2  shop_B   apples       2.0
8     3  shop_B   apples       2.0
9     1  shop_B  bananas       4.0
10    2  shop_B  bananas       4.0
11    3  shop_B  bananas       2.0

I also tried fillna(limit=3), but that is not what I want.

You can sort by day with sort_values, then do a grouped bfill, and then chain fillna(0) so that any remaining NaNs become 0:

# The outer parentheses let the method chain span several lines.
df['quantity'] = (df.sort_values(by='day')
                    .groupby(['shop', 'product'])['quantity']
                    .bfill(limit=3).fillna(0))

This prints:

    day    shop  product  quantity
0     1  shop_A   apples       3.0
1     2  shop_A   apples       1.0
2     3  shop_A   apples       1.0
3     1  shop_A  bananas       0.0
4     2  shop_A  bananas       0.0
5     3  shop_A  bananas       0.0
6     1  shop_B   apples       2.0
7     2  shop_B   apples       2.0
8     3  shop_B   apples       2.0
9     1  shop_B  bananas       4.0
10    2  shop_B  bananas       4.0
11    3  shop_B  bananas       2.0

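For each shop and product, this fills a NaN with the next day's value, which is why row 1 becomes 1.0 here rather than the 3.0 in the expected df. You can similarly use ffill (or both), or perhaps linear interpolation, and the result will change accordingly. Still, this should be enough to get you started.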
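As a rough sketch of the "both" variant, a forward fill followed by a backward fill inside each shop/product group appears to reproduce the expected df from the question. The frame is rebuilt by hand from the question's data, and the use of transform with a lambda is just one way to chain the two fills, not necessarily the fastest one:

```python
import numpy as np
import pandas as pd

# Example frame rebuilt by hand from the question.
df = pd.DataFrame({
    "day": [1, 2, 3] * 4,
    "shop": ["shop_A"] * 6 + ["shop_B"] * 6,
    "product": (["apples"] * 3 + ["bananas"] * 3) * 2,
    "quantity": [3.0, np.nan, 1.0, np.nan, np.nan, np.nan,
                 np.nan, np.nan, 2.0, np.nan, 4.0, 2.0],
})

# Within each (shop, product) group, ordered by day:
# 1) carry the previous day's value forward,
# 2) fill leading NaNs from the next available day,
# 3) set groups that are entirely NaN to 0.
filled = (
    df.sort_values(by="day")
      .groupby(["shop", "product"])["quantity"]
      .transform(lambda s: s.ffill().bfill())
      .fillna(0)
)

# Assignment aligns on the index, so the original row order is kept.
df["quantity"] = filled
print(df)
```

Swapping the order to bfill first and ffill second would give 1.0 instead of 3.0 for shop_A apples on day 2, so the order of the two fills matters.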
