Suppose I have the following dataset:
import datetime

import pandas as pd

table = [[datetime.datetime(2015, 1, 1), 1, 0.5],
         [datetime.datetime(2015, 1, 27), 1, 0.5],
         [datetime.datetime(2015, 1, 31), 1, 0.5],
         [datetime.datetime(2015, 2, 1), 1, 2],
         [datetime.datetime(2015, 2, 3), 1, 2],
         [datetime.datetime(2015, 2, 15), 1, 2],
         [datetime.datetime(2015, 2, 28), 1, 2],
         [datetime.datetime(2015, 3, 1), 1, 3],
         [datetime.datetime(2015, 3, 17), 1, 3],
         [datetime.datetime(2015, 3, 31), 1, 3]]
df = pd.DataFrame(table, columns=['Date', 'Id', 'Value'])
Now I want to take the last value of each month, shift those values forward one month so they line up with the next month's rows, and finally take the cumulative product of the shifted values. Running this procedure on the data above should give (showing each step):

Finding the last entry of each month and shifting it forward by one month produces
        Date  Id  Value  Temp
0 2015-01-01   1    0.5   NaN
1 2015-01-27   1    0.5   NaN
2 2015-01-31   1    0.5   NaN
3 2015-02-01   1    2.0   0.5
4 2015-02-03   1    2.0   0.5
5 2015-02-15   1    2.0   0.5
6 2015-02-28   1    2.0   0.5
7 2015-03-01   1    3.0   2.0
8 2015-03-17   1    3.0   2.0
9 2015-03-31   1    3.0   2.0
Filling the NaN with 1, taking the cumulative product, and dropping Temp gives

        Date  Id  Value  Result
0 2015-01-01   1    0.5     1.0
1 2015-01-27   1    0.5     1.0
2 2015-01-31   1    0.5     1.0
3 2015-02-01   1    2.0     0.5
4 2015-02-03   1    2.0     0.5
5 2015-02-15   1    2.0     0.5
6 2015-02-28   1    2.0     0.5
7 2015-03-01   1    3.0     1.0
8 2015-03-17   1    3.0     1.0
9 2015-03-31   1    3.0     1.0
I hope that is clear enough. In case anyone wonders why I am doing this: I have MTD (month-to-date) data that needs to be resampled. Thanks, Tingis.
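The month-level arithmetic described above can be sketched like this (a minimal sketch on a trimmed-down version of the data; it groups by bare month number, which assumes all dates fall within one year):

```python
import datetime

import pandas as pd

table = [[datetime.datetime(2015, 1, 1), 1, 0.5],
         [datetime.datetime(2015, 1, 31), 1, 0.5],
         [datetime.datetime(2015, 2, 28), 1, 2],
         [datetime.datetime(2015, 3, 31), 1, 3]]
df = pd.DataFrame(table, columns=['Date', 'Id', 'Value'])

# last value of each month, shifted forward by one month
monthly = df.set_index('Date')['Value'].groupby(lambda d: d.month).last().shift(1)
# fill the leading NaN with 1 and take the cumulative product
result_per_month = monthly.fillna(1).cumprod()
print(result_per_month)
```

This yields one result per month (1.0 for January, 0.5 for February, 1.0 for March), which then has to be broadcast back onto the daily rows, as the answer below does.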
Edit: the number of entries per month is "random": a month can have entries for every day or only a handful (it is business data...).
The code below does not assume there are exactly two rows per month. The idea is to do the groupby calculation first, then use .reindex() to expand the result back onto the original index (which introduces NaNs), and fill those NaNs with a backward fill, since the value of each month's last entry is already in place.
# your data
# ==================================
import pandas as pd
import datetime
table = [[datetime.datetime(2015, 1, 1), 1, 0.5],
         [datetime.datetime(2015, 1, 31), 1, 0.5],
         [datetime.datetime(2015, 2, 1), 1, 2],
         [datetime.datetime(2015, 2, 28), 1, 2],
         [datetime.datetime(2015, 3, 1), 1, 3],
         [datetime.datetime(2015, 3, 31), 1, 3]]
df = pd.DataFrame(table, columns=['Date', 'Id', 'Value'])
# better to set Date column to index
df = df.set_index('Date')
print(df)
Id Value
Date
2015-01-01 1 0.5
2015-01-31 1 0.5
2015-02-01 1 2.0
2015-02-28 1 2.0
2015-03-01 1 3.0
2015-03-31 1 3.0
# processing
# =================================================
# get the last entry of each month (grouping by month number alone is fine
# here because all dates fall within a single year)
df_temp = df.groupby(lambda idx: idx.month).tail(1)
# take the cumprod, reindex to the original df's index, backward fill
# (.bfill() replaces the deprecated fillna(method='bfill'))
df['Result'] = df_temp['Value'].shift(1).fillna(1).cumprod().reindex(df.index).bfill()
print(df)
Id Value Result
Date
2015-01-01 1 0.5 1.0
2015-01-31 1 0.5 1.0
2015-02-01 1 2.0 0.5
2015-02-28 1 2.0 0.5
2015-03-01 1 3.0 1.0
2015-03-31 1 3.0 1.0
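An alternative (a sketch of the same idea, not part of the original answer) is to key everything by monthly periods instead of bare month numbers, which keeps e.g. January 2015 and January 2016 apart, and then map the per-month factors back onto the rows:

```python
import datetime

import pandas as pd

table = [[datetime.datetime(2015, 1, 1), 1, 0.5],
         [datetime.datetime(2015, 1, 31), 1, 0.5],
         [datetime.datetime(2015, 2, 1), 1, 2],
         [datetime.datetime(2015, 2, 28), 1, 2],
         [datetime.datetime(2015, 3, 1), 1, 3],
         [datetime.datetime(2015, 3, 31), 1, 3]]
df = pd.DataFrame(table, columns=['Date', 'Id', 'Value']).set_index('Date')

# last value of each calendar month, keyed by monthly period
monthly_last = df['Value'].groupby(df.index.to_period('M')).last()
# shift one month forward, seed the first month with 1, take the cumprod
factors = monthly_last.shift(1).fillna(1).cumprod()
# map each row's month back to its factor (no backward fill needed)
df['Result'] = df.index.to_period('M').map(factors)
print(df)
```

Mapping by period avoids the reindex/bfill step entirely, because every row is looked up directly by its month.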
Follow-up question:
# your data
# ==================================
import pandas as pd
import datetime
table = [[datetime.datetime(2015, 1, 1), 1, 0.5],
         [datetime.datetime(2015, 1, 27), 1, 0.5],
         [datetime.datetime(2015, 1, 31), 1, 0.5],
         [datetime.datetime(2015, 2, 1), 1, 2],
         [datetime.datetime(2015, 2, 3), 1, 2],
         [datetime.datetime(2015, 2, 15), 1, 2],
         [datetime.datetime(2015, 2, 28), 1, 2],
         [datetime.datetime(2015, 3, 1), 1, 3],
         [datetime.datetime(2015, 3, 17), 1, 3],
         [datetime.datetime(2015, 3, 31), 1, 3]]
df1 = pd.DataFrame(table, columns=['Date', 'Id', 'Value'])
df2 = df1.copy()
df2.Id = 2
df = pd.concat([df1, df2])  # DataFrame.append was removed in pandas 2.0
# better to set Date column to index
df = df.set_index('Date')
print(df)
Id Value
Date
2015-01-01 1 0.5
2015-01-27 1 0.5
2015-01-31 1 0.5
2015-02-01 1 2.0
2015-02-03 1 2.0
2015-02-15 1 2.0
2015-02-28 1 2.0
2015-03-01 1 3.0
2015-03-17 1 3.0
2015-03-31 1 3.0
2015-01-01 2 0.5
2015-01-27 2 0.5
2015-01-31 2 0.5
2015-02-01 2 2.0
2015-02-03 2 2.0
2015-02-15 2 2.0
2015-02-28 2 2.0
2015-03-01 2 3.0
2015-03-17 2 3.0
2015-03-31 2 3.0
def my_func(group):
    # get the last entry of each month within this Id's group
    df_temp = group.groupby(lambda idx: idx.month).tail(1)
    # take the cumprod, reindex to the group's own index, backward fill
    group['Result'] = df_temp['Value'].shift(1).fillna(1).cumprod().reindex(group.index).bfill()
    return group
df.groupby('Id', group_keys=False).apply(my_func)  # group_keys=False keeps the plain Date index
Id Value Result
Date
2015-01-01 1 0.5 1.0
2015-01-27 1 0.5 1.0
2015-01-31 1 0.5 1.0
2015-02-01 1 2.0 0.5
2015-02-03 1 2.0 0.5
2015-02-15 1 2.0 0.5
2015-02-28 1 2.0 0.5
2015-03-01 1 3.0 1.0
2015-03-17 1 3.0 1.0
2015-03-31 1 3.0 1.0
2015-01-01 2 0.5 1.0
2015-01-27 2 0.5 1.0
2015-01-31 2 0.5 1.0
2015-02-01 2 2.0 0.5
2015-02-03 2 2.0 0.5
2015-02-15 2 2.0 0.5
2015-02-28 2 2.0 0.5
2015-03-01 2 3.0 1.0
2015-03-17 2 3.0 1.0
2015-03-31 2 3.0 1.0
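If you prefer to avoid `.apply()` altogether, the same computation can be vectorized by keying on (Id, month) pairs; a sketch of that variant (again using monthly periods, on a trimmed-down version of the data):

```python
import datetime

import pandas as pd

table = [[datetime.datetime(2015, 1, 1), 1, 0.5],
         [datetime.datetime(2015, 1, 31), 1, 0.5],
         [datetime.datetime(2015, 2, 28), 1, 2],
         [datetime.datetime(2015, 3, 31), 1, 3]]
df1 = pd.DataFrame(table, columns=['Date', 'Id', 'Value'])
df2 = df1.copy()
df2['Id'] = 2
df = pd.concat([df1, df2]).set_index('Date')

month = df.index.to_period('M')
# last value per (Id, month)
last = df.groupby(['Id', month])['Value'].last()
# within each Id: shift one month forward, seed with 1, take the cumprod
factors = last.groupby(level=0).shift(1).fillna(1).groupby(level=0).cumprod()
# look up each row's factor by its (Id, month) key
key = pd.MultiIndex.from_arrays([df['Id'], month])
df['Result'] = factors.reindex(key).to_numpy()
print(df)
```

This scales to many Ids without a Python-level loop over groups, since every step is a single grouped operation.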