在特定日期,从0开始的度量值会增加一个值。给定一组不连续的日期和值,是否可以生成具有度量的列?
输入-每天的度量变化
date value
02-03-2022 00:00:00 10
03-03-2022 00:00:00 0
06-03-2022 00:00:00 2
10-03-2022 00:00:00 18
输出-为连续天数范围计算的度量(起始值=0,除非第一天已经发生更改(
0 metric
0 2022-02-28 0
1 2022-03-01 0
2 2022-03-02 10
3 2022-03-03 10
4 2022-03-04 10
5 2022-03-05 10
6 2022-03-06 12
7 2022-03-07 12
8 2022-03-08 12
9 2022-03-09 12
10 2022-03-10 30
11 2022-03-11 30
12 2022-03-12 30
13 2022-03-13 30
代码示例
import pandas as pd
df = pd.DataFrame({'date': ['02-03-2022 00:00:00',
'03-03-2022 00:00:00',
'06-03-2022 00:00:00',
'10-03-2022 00:00:00'],
'value': [10, 0, 2, 18]},
index=[0,1,2,3])
df2 = pd.DataFrame(pd.date_range(start='28-02-2022', end='13-03-2022'))
df2['metric'] = 0 # TODO
按日期替换df中df2中的值,用0填充缺失值,然后用cumsum
:
df['date'] = pd.to_datetime(df.date, format='%d-%m-%Y %H:%M:%S')
df2['metric'] = df2[0].map(df.set_index('date')['value']).fillna(0).cumsum()
df2
0 metric
0 2022-02-28 0.0
1 2022-03-01 0.0
2 2022-03-02 10.0
3 2022-03-03 10.0
4 2022-03-04 10.0
5 2022-03-05 10.0
6 2022-03-06 12.0
7 2022-03-07 12.0
8 2022-03-08 12.0
9 2022-03-09 12.0
10 2022-03-10 30.0
11 2022-03-11 30.0
12 2022-03-12 30.0
13 2022-03-13 30.0
df.reindex
对此很有用。然后添加df.fillna
并应用df.cumsum
。
import pandas as pd
df = pd.DataFrame({'date': ['02-03-2022 00:00:00',
'03-03-2022 00:00:00',
'06-03-2022 00:00:00',
'10-03-2022 00:00:00'],
'value': [10, 0, 2, 18]},
index=[0,1,2,3])
df['date'] = pd.to_datetime(df.date, format='%d-%m-%Y %H:%M:%S')
res = df.set_index('date').reindex(pd.date_range(
start='2022-02-28', end='2022-03-13')).fillna(0).cumsum()
.reset_index(drop=False).rename(columns={'index':'date',
'value':'metric'})
print(res)
date metric
0 2022-02-28 0.0
1 2022-03-01 0.0
2 2022-03-02 10.0
3 2022-03-03 10.0
4 2022-03-04 10.0
5 2022-03-05 10.0
6 2022-03-06 12.0
7 2022-03-07 12.0
8 2022-03-08 12.0
9 2022-03-09 12.0
10 2022-03-10 30.0
11 2022-03-11 30.0
12 2022-03-12 30.0
13 2022-03-13 30.0