I have a pandas DataFrame that looks like this:
Customer ID | PROD_ID | Quantity | Price | Quantity | Price | Quantity | Price |
---|---|---|---|---|---|---|---|
 | | 31-12-2020 | | | | | |
123 | abc | 10 | 5 | 10 | 5.4 | 11 | 6 |
123 | efg | 50 | 53 | 50 | | | |
456 | abc | 10 | 5 | 10 | 5.4 | 10 | 6 |
456 | efg | 10 | 53 | 10 | 53 | 11 | 53 |
First, here is a clean version of your DataFrame:
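import pandas as pd

# ISO-format dates are unambiguous; the last one is 1 February 2021, as in the output
dates = pd.to_datetime(['2020-12-31', '2021-01-01', '2021-02-01'])

df = pd.DataFrame({('123', 'abc'): [10, 5, 10, 5.4, 11, 6],
                   ('123', 'efg'): [50, 53, 50, 53, 100, 53],
                   ('456', 'abc'): [10, 5.9, 10, 5.4, 10, 6],
                   ('456', 'efg'): [10, 53, 10, 53, 11, 53]},
                  # (date, measure) pairs, swapped so the measure is the outer level
                  index=pd.MultiIndex.from_product([dates, ['Quantity', 'Price']]).swaplevel()
                  ).T  # transpose: customers/products become rows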
Data:
          Quantity      Price   Quantity      Price   Quantity      Price
        2020-12-31 2020-12-31 2021-01-01 2021-01-01 2021-02-01 2021-02-01
123 abc       10.0        5.0       10.0        5.4       11.0        6.0
    efg       50.0       53.0       50.0       53.0      100.0       53.0
456 abc       10.0        5.9       10.0        5.4       10.0        6.0
    efg       10.0       53.0       10.0       53.0       11.0       53.0
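If the two-level column layout is unfamiliar, it can help to inspect the levels directly. The sketch below rebuilds a two-row version of the frame (the names `dates`, `cols`, `measures` and `quantity` are only for illustration) and shows that the columns alternate Quantity, Price for each date. The positional shift in the next step relies on that ordering.

```python
import pandas as pd

# Same column layout as above: level 0 = measure, level 1 = date.
dates = pd.to_datetime(['2020-12-31', '2021-01-01', '2021-02-01'])
cols = pd.MultiIndex.from_product([dates, ['Quantity', 'Price']]).swaplevel()

df = pd.DataFrame(
    [[10, 5, 10, 5.4, 11, 6],
     [50, 53, 50, 53, 100, 53]],
    index=pd.MultiIndex.from_tuples([('123', 'abc'), ('123', 'efg')]),
    columns=cols,
)

# The measure names in column order: Quantity, Price, Quantity, Price, ...
measures = list(df.columns.get_level_values(0))
print(measures)

# Selecting one measure leaves one column per date.
quantity = df['Quantity']
print(quantity)
```

Because every date contributes exactly two adjacent columns, moving two columns to the left always lands on the same measure for the previous date.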
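Then you can shift the columns by two positions (one Quantity/Price pair, i.e. one date) and subtract to get the change from the previous date: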
(df-df.shift(2, axis=1)).rename(mapper=lambda x: f'{x}_diff', axis='columns', level=0).dropna(axis=1)
Output:
        Quantity_diff Price_diff Quantity_diff Price_diff
           2021-01-01 2021-01-01    2021-02-01 2021-02-01
123 abc           0.0        0.4           1.0        0.6
    efg           0.0        0.0          50.0        0.0
456 abc           0.0       -0.5           0.0        0.6
    efg           0.0        0.0           1.0        0.0
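To see what the shift is doing, here is a minimal sketch on a single-row float frame with the same column layout (the names `shifted`, `manual` and `builtin` are just for illustration). It also checks that shift-and-subtract gives the same result as `DataFrame.diff(periods=2, axis=1)`, which you may find a little shorter to write.

```python
import pandas as pd

dates = pd.to_datetime(['2020-12-31', '2021-01-01', '2021-02-01'])
cols = pd.MultiIndex.from_product([dates, ['Quantity', 'Price']]).swaplevel()

df = pd.DataFrame(
    [[10, 5, 10, 5.4, 11, 6]],
    index=pd.MultiIndex.from_tuples([('123', 'abc')]),
    columns=cols,
    dtype=float,
)

# Each column now holds the value from two columns to its left,
# i.e. the same measure on the previous date. The first two are NaN.
shifted = df.shift(2, axis=1)
print(shifted)

# Shift-and-subtract, as in the answer above.
manual = df - shifted

# Built-in equivalent: difference with the column two positions back.
builtin = df.diff(periods=2, axis=1)

pd.testing.assert_frame_equal(manual, builtin)
```

Both versions leave NaN in the first date's columns, since there is no earlier date to compare against. That is what the `dropna(axis=1)` call removes.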
To combine the differences with the original DataFrame:
pd.concat([df,
(df-df.shift(2, axis=1)).rename(mapper=lambda x: f'{x}_diff', axis='columns', level=0).dropna(axis=1)
], axis=1).sort_index(level=[1,0], ascending=[True, False], axis=1)
Output:
          Quantity      Price Quantity_diff   Quantity Price_diff      Price Quantity_diff   Quantity Price_diff      Price
        2020-12-31 2020-12-31    2021-01-01 2021-01-01 2021-01-01 2021-01-01    2021-02-01 2021-02-01 2021-02-01 2021-02-01
123 abc       10.0        5.0           0.0       10.0        0.4        5.4           1.0       11.0        0.6        6.0
    efg       50.0       53.0           0.0       50.0        0.0       53.0          50.0      100.0        0.0       53.0
456 abc       10.0        5.9           0.0       10.0       -0.5        5.4           0.0       10.0        0.6        6.0
    efg       10.0       53.0           0.0       10.0        0.0       53.0           1.0       11.0        0.0       53.0
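If you need this more than once, the two steps could be wrapped in a small function so the difference is only computed once. This is a sketch, not the only way to do it: `with_diffs` and its `n_measures` parameter are hypothetical names, and it assumes the columns are ordered date by date with the same measures for every date. It drops the first date's difference columns by position instead of calling `dropna(axis=1)`, on the assumption that a genuine missing value elsewhere in your data should not remove a whole column.

```python
import pandas as pd


def with_diffs(df: pd.DataFrame, n_measures: int = 2) -> pd.DataFrame:
    """Hypothetical helper: append per-date differences next to each measure.

    Assumes columns are a (measure, date) MultiIndex laid out date by date,
    with n_measures columns per date.
    """
    diff = (df - df.shift(n_measures, axis=1)).rename(
        mapper=lambda name: f'{name}_diff', axis='columns', level=0
    )
    # The first date has nothing to compare against, so skip its columns.
    diff = diff.iloc[:, n_measures:]
    combined = pd.concat([df, diff], axis=1)
    # Sort by date ascending, then by measure name descending,
    # which puts each *_diff column just before its measure.
    return combined.sort_index(level=[1, 0], ascending=[True, False], axis=1)


dates = pd.to_datetime(['2020-12-31', '2021-01-01', '2021-02-01'])
cols = pd.MultiIndex.from_product([dates, ['Quantity', 'Price']]).swaplevel()
df = pd.DataFrame(
    [[10, 5, 10, 5.4, 11, 6]],
    index=pd.MultiIndex.from_tuples([('123', 'abc')]),
    columns=cols,
    dtype=float,
)

out = with_diffs(df)
print(out)
```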