根据另一列中的可用性,为不同列创建具有不同规则的附加行



我有一个DataFrame,它看起来像这样:

item_id  facility_id  actual_quantity  min_quantity           timestamp
0       11         1185                5             3 2022-01-11 00:00:00
1       12         1185                7             4 2022-01-11 00:00:00
2       23         1185                5             5 2022-01-11 00:00:00
3       34         1185                4             6 2022-01-11 00:00:00
4       56         1185                3             7 2022-01-11 00:00:00
5       67         1185                5             8 2022-01-11 00:00:00
6       12         1185                3             3 2022-01-11 01:00:00
7       23         1185                5             4 2022-01-11 01:00:00
8       34         1185               13             5 2022-01-11 01:00:00
9       45         1185               12             6 2022-01-11 01:00:00
df = pd.DataFrame({'item_id': [11, 12, 23, 34, 56, 67, 12, 23, 34, 45],
'facility_id': [1185]*10,
'actual_quantity': [5, 7, 5, 4, 3, 5, 3, 5, 13, 12],
'min_quantity': [3, 4, 5, 6, 7, 8, 3, 4, 5, 6],
'timestamp': [pd.Timestamp('2022-01-11 00:00:00'),
pd.Timestamp('2022-01-11 00:00:00'),
pd.Timestamp('2022-01-11 00:00:00'),
pd.Timestamp('2022-01-11 00:00:00'),
pd.Timestamp('2022-01-11 00:00:00'),
pd.Timestamp('2022-01-11 00:00:00'),
pd.Timestamp('2022-01-11 01:00:00'),
pd.Timestamp('2022-01-11 01:00:00'),
pd.Timestamp('2022-01-11 01:00:00'),
pd.Timestamp('2022-01-11 01:00:00')]})

这里,在timestamp列中,我们有两个唯一的时间戳。对于timestamp1 = Timestamp('2022-01-11 00:00:00'),我们有6个唯一的item_id和与这些item_id相关的其他列,而对于timestamp2 = Timestamp('2022-01-11 01:00:00'),我们只有4个唯一的item_id。

我想实现以下目标:

timestamp2中添加缺少item_ids的行。对于添加的行,保持facility_ids相同,即1185actual_quantity应为0min_quantity应与该item_idtimestamp1相同。

timestamp2中的item_id执行类似操作,但不在timestamp1中执行。

output_df应该是这样的:

item_id  facility_id  actual_quantity  min_quantity           timestamp
0        11         1185                5             3 2022-01-11 00:00:00
1        12         1185                7             4 2022-01-11 00:00:00
2        23         1185                5             5 2022-01-11 00:00:00
3        34         1185                4             6 2022-01-11 00:00:00
4        56         1185                3             7 2022-01-11 00:00:00
5        67         1185                5             8 2022-01-11 00:00:00
6        45         1185                0             6 2022-01-11 00:00:00
7        12         1185                3             3 2022-01-11 01:00:00
8        23         1185                5             4 2022-01-11 01:00:00
9        34         1185               13             5 2022-01-11 01:00:00
10       45         1185               12             6 2022-01-11 01:00:00
11       11         1185                0             6 2022-01-11 01:00:00
12       56         1185                0             7 2022-01-11 01:00:00
13       67         1185                0             8 2022-01-11 01:00:00

实现这一目标的最具蟒蛇风格的方法是什么?

您可以使用pivot+stack+fillna+reset_index:

tmp = df.pivot(['item_id', 'facility_id'], ['timestamp'], ['min_quantity', 'actual_quantity'])
tmp['actual_quantity'] = tmp['actual_quantity'].fillna(0)
tmp['min_quantity'] = tmp['min_quantity'].ffill(axis=1).bfill(axis=1)
out = tmp.stack(level=1).reset_index()[['item_id', 'facility_id', 'actual_quantity', 'min_quantity','timestamp']]

输出:

item_id  facility_id  actual_quantity  min_quantity           timestamp
0        11         1185              5.0           3.0 2022-01-11 00:00:00
1        11         1185              0.0           3.0 2022-01-11 01:00:00
2        12         1185              7.0           4.0 2022-01-11 00:00:00
3        12         1185              3.0           3.0 2022-01-11 01:00:00
4        23         1185              5.0           5.0 2022-01-11 00:00:00
5        23         1185              5.0           4.0 2022-01-11 01:00:00
6        34         1185              4.0           6.0 2022-01-11 00:00:00
7        34         1185             13.0           5.0 2022-01-11 01:00:00
8        45         1185              0.0           6.0 2022-01-11 00:00:00
9        45         1185             12.0           6.0 2022-01-11 01:00:00
10       56         1185              3.0           7.0 2022-01-11 00:00:00
11       56         1185              0.0           7.0 2022-01-11 01:00:00
12       67         1185              5.0           8.0 2022-01-11 00:00:00
13       67         1185              0.0           8.0 2022-01-11 01:00:00

最新更新