我有一个DataFrame,它看起来像这样:
item_id facility_id actual_quantity min_quantity timestamp
0 11 1185 5 3 2022-01-11 00:00:00
1 12 1185 7 4 2022-01-11 00:00:00
2 23 1185 5 5 2022-01-11 00:00:00
3 34 1185 4 6 2022-01-11 00:00:00
4 56 1185 3 7 2022-01-11 00:00:00
5 67 1185 5 8 2022-01-11 00:00:00
6 12 1185 3 3 2022-01-11 01:00:00
7 23 1185 5 4 2022-01-11 01:00:00
8 34 1185 13 5 2022-01-11 01:00:00
9 45 1185 12 6 2022-01-11 01:00:00
df = pd.DataFrame({'item_id': [11, 12, 23, 34, 56, 67, 12, 23, 34, 45],
'facility_id': [1185]*10,
'actual_quantity': [5, 7, 5, 4, 3, 5, 3, 5, 13, 12],
'min_quantity': [3, 4, 5, 6, 7, 8, 3, 4, 5, 6],
'timestamp': [pd.Timestamp('2022-01-11 00:00:00'),
pd.Timestamp('2022-01-11 00:00:00'),
pd.Timestamp('2022-01-11 00:00:00'),
pd.Timestamp('2022-01-11 00:00:00'),
pd.Timestamp('2022-01-11 00:00:00'),
pd.Timestamp('2022-01-11 00:00:00'),
pd.Timestamp('2022-01-11 01:00:00'),
pd.Timestamp('2022-01-11 01:00:00'),
pd.Timestamp('2022-01-11 01:00:00'),
pd.Timestamp('2022-01-11 01:00:00')]})
这里,在timestamp
列中,我们有两个唯一的时间戳。对于timestamp1 = Timestamp('2022-01-11 00:00:00')
,我们有6个唯一的item_id和与这些item_id相关的其他列,而对于timestamp2 = Timestamp('2022-01-11 01:00:00')
,我们只有4个唯一的item_id。
我想实现以下目标:
在timestamp2
中添加缺少item_id
s的行。对于添加的行,保持facility_id
s相同,即1185
、actual_quantity
应为0
,min_quantity
应与该item_id
的timestamp1
相同。
对timestamp2
中的item_id
执行类似操作,但不在timestamp1
中执行。
output_df
应该是这样的:
item_id facility_id actual_quantity min_quantity timestamp
0 11 1185 5 3 2022-01-11 00:00:00
1 12 1185 7 4 2022-01-11 00:00:00
2 23 1185 5 5 2022-01-11 00:00:00
3 34 1185 4 6 2022-01-11 00:00:00
4 56 1185 3 7 2022-01-11 00:00:00
5 67 1185 5 8 2022-01-11 00:00:00
6 45 1185 0 6 2022-01-11 00:00:00
7 12 1185 3 3 2022-01-11 01:00:00
8 23 1185 5 4 2022-01-11 01:00:00
9 34 1185 13 5 2022-01-11 01:00:00
10 45 1185 12 6 2022-01-11 01:00:00
11 11 1185 0 6 2022-01-11 01:00:00
12 56 1185 0 7 2022-01-11 01:00:00
13 67 1185 0 8 2022-01-11 01:00:00
实现这一目标的最具蟒蛇风格的方法是什么?
您可以使用pivot
+stack
+fillna
+reset_index
:
tmp = df.pivot(['item_id', 'facility_id'], ['timestamp'], ['min_quantity', 'actual_quantity'])
tmp['actual_quantity'] = tmp['actual_quantity'].fillna(0)
tmp['min_quantity'] = tmp['min_quantity'].ffill(axis=1).bfill(axis=1)
out = tmp.stack(level=1).reset_index()[['item_id', 'facility_id', 'actual_quantity', 'min_quantity','timestamp']]
输出:
item_id facility_id actual_quantity min_quantity timestamp
0 11 1185 5.0 3.0 2022-01-11 00:00:00
1 11 1185 0.0 3.0 2022-01-11 01:00:00
2 12 1185 7.0 4.0 2022-01-11 00:00:00
3 12 1185 3.0 3.0 2022-01-11 01:00:00
4 23 1185 5.0 5.0 2022-01-11 00:00:00
5 23 1185 5.0 4.0 2022-01-11 01:00:00
6 34 1185 4.0 6.0 2022-01-11 00:00:00
7 34 1185 13.0 5.0 2022-01-11 01:00:00
8 45 1185 0.0 6.0 2022-01-11 00:00:00
9 45 1185 12.0 6.0 2022-01-11 01:00:00
10 56 1185 3.0 7.0 2022-01-11 00:00:00
11 56 1185 0.0 7.0 2022-01-11 01:00:00
12 67 1185 5.0 8.0 2022-01-11 00:00:00
13 67 1185 0.0 8.0 2022-01-11 01:00:00