Using pandas diff() while keeping the first value



I have a DataFrame that looks like this:

Note: Datetime is the index

            Name  target_mtd
Datetime
2021-12-01  Amy         1000
2021-12-02  Amy         2500
2021-12-03  Amy         4000
2021-12-01  Bobo        2000
2021-12-02  Bobo        3000
2021-12-03  Bobo        4000

I want to convert the month-to-date column target_mtd into daily values within each group, so I ran the following code:

df['target_daily'] = df.groupby([df.index.month, 'Name'])['target_mtd'].transform(lambda x:x.diff())

But it gives a result that differs from what I expected:

            Name  target_mtd  target_daily
Datetime
2021-12-01  Amy         1000           NaN
2021-12-02  Amy         2500          1500
2021-12-03  Amy         4000          1500
2021-12-01  Bobo        2000           NaN
2021-12-02  Bobo        3000          1000
2021-12-03  Bobo        4000          1000

The expected result keeps the first value of each group:

            Name  target_mtd  target_daily
Datetime
2021-12-01  Amy         1000          1000
2021-12-02  Amy         2500          1500
2021-12-03  Amy         4000          1500
2021-12-01  Bobo        2000          2000
2021-12-02  Bobo        3000          1000
2021-12-03  Bobo        4000          1000

Thanks!

You can use Series.fillna to replace the missing values with the values from the original column:

df['target_daily'] = (df.groupby([df.index.month, 'Name'])['target_mtd']
                        .diff()
                        .fillna(df['target_mtd']))
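For reference, here is a minimal self-contained sketch that rebuilds the sample data from the question and applies the fillna approach. The way the DataFrame is constructed is an assumption (the question only shows the printed frame); the groupby, diff and fillna calls are the ones shown above.

```python
import pandas as pd

# Sample data rebuilt from the question (construction is assumed);
# Datetime is the index and repeats once per name.
dates = pd.to_datetime(["2021-12-01", "2021-12-02", "2021-12-03"] * 2)
df = pd.DataFrame(
    {
        "Name": ["Amy"] * 3 + ["Bobo"] * 3,
        "target_mtd": [1000, 2500, 4000, 2000, 3000, 4000],
    },
    index=pd.DatetimeIndex(dates, name="Datetime"),
)

# diff() leaves NaN in the first row of each (month, Name) group;
# fillna() then takes the original month-to-date value for those rows.
df["target_daily"] = (df.groupby([df.index.month, "Name"])["target_mtd"]
                        .diff()
                        .fillna(df["target_mtd"]))

print(df)
```

Because diff() introduces NaN, target_daily ends up as a float column; add .astype(int) at the end of the chain if integers are preferred.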

If the data spans multiple years, group by monthly periods instead, so that the same month in different years is kept apart:

df['target_daily'] = (df.groupby([df.index.to_period('M'), 'Name'])['target_mtd']
                        .diff()
                        .fillna(df['target_mtd']))
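To illustrate why the month number alone is not enough, the sketch below uses made-up data (not from the question) with one name and two Decembers. Grouping by index.month treats both years as a single group, so the first day of the second December is differenced against the last day of the previous one; grouping by monthly period restarts in each year.

```python
import pandas as pd

# Hypothetical data: the same month in two different years.
df = pd.DataFrame(
    {"Name": ["Amy"] * 4, "target_mtd": [1000, 2500, 500, 1200]},
    index=pd.DatetimeIndex(
        ["2021-12-01", "2021-12-02", "2022-12-01", "2022-12-02"],
        name="Datetime",
    ),
)

# Month number only: December 2021 and December 2022 share one group.
by_month = (df.groupby([df.index.month, "Name"])["target_mtd"]
              .diff()
              .fillna(df["target_mtd"]))

# Monthly period: 2021-12 and 2022-12 are separate groups.
by_period = (df.groupby([df.index.to_period("M"), "Name"])["target_mtd"]
               .diff()
               .fillna(df["target_mtd"]))

print(by_month.tolist())   # [1000.0, 1500.0, -2000.0, 700.0]
print(by_period.tolist())  # [1000.0, 1500.0, 500.0, 700.0]
```

The -2000 in the first result is the spurious difference across years; with periods, the first value of December 2022 is kept as 500.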

Or use Grouper with a monthly frequency (year and month are also grouped separately):

# 'ME' (month end) is the alias in pandas 2.2 and later; on older versions use freq='M'
df['target_daily'] = (df.groupby([pd.Grouper(freq='ME'), 'Name'])['target_mtd']
                        .diff()
                        .fillna(df['target_mtd']))
