我想创建一个引用日期列- 1年并显示相应值的新列:
import pandas as pd
Input DF
df = pd.DataFrame({'consumption': [0,1,3,5], 'date':[pd.to_datetime('2017-04-01'),
pd.to_datetime('2017-04-02'),
pd.to_datetime('2018-04-01'),
pd.to_datetime('2018-04-02')]})
>>> df
consumption date
0 2017-04-01
1 2017-04-02
3 2018-04-01
5 2018-04-02
Expected DF
df = pd.DataFrame({'consumption': [0,1,3,5],
'prev_year_consumption': [np.NAN,np.NAN,0,1],
'date':[pd.to_datetime('2017-04-01'),
pd.to_datetime('2017-04-02'),
pd.to_datetime('2018-04-01'),
pd.to_datetime('2018-04-02')]})
>>> df
consumption prev_year_consumption date
0 NAN 2017-04-01
1 NAN 2017-04-02
3 0 2018-04-01
5 1 2018-04-02
所以prev_year_consumption是消费列中的简单值,其中1年从日期中动态减去。
SQL中的我可能会这样写
SELECT df_past.consumption as prev_year_consumption, df_current.consumption
FROM df as df_current
LEFT JOIN ON df df_past ON year(df_current.date) = year(df_past.date) - 1
感谢您的提示
pandas
中的表示法类似。我们仍然在做自我merge
,但是我们需要指定right_on
(或left_on
)的DateOffset
为1年:
new_df = df.merge(
df,
left_on='date',
right_on=df['date'] + pd.offsets.DateOffset(years=1),
how='left'
)
new_df
:
date consumption_x date_x consumption_y date_y
0 2017-04-01 0 2017-04-01 NaN NaT
1 2017-04-02 1 2017-04-02 NaN NaT
2 2018-04-01 3 2018-04-01 0.0 2017-04-01
3 2018-04-02 5 2018-04-02 1.0 2017-04-02
我们可以进一步drop
和rename
列来得到精确的输出:
new_df = df.merge(
df,
left_on='date',
right_on=df['date'] + pd.offsets.DateOffset(years=1),
how='left'
).drop(columns=['date_x', 'date_y']).rename(columns={
'consumption_y': 'prev_year_consumption'
})
new_df
:
date consumption_x prev_year_consumption
0 2017-04-01 0 NaN
1 2017-04-02 1 NaN
2 2018-04-01 3 0.0
3 2018-04-02 5 1.0