在不覆盖实际数据的情况下,每月向pandas数据帧插入新行



我有一个像这样的DataFrame:

data = {'ID':["AB2343","CD4565"],
'StartDate':["01/01/2020","01/01/2021"],
'EndDate':["31/12/2021","31/12/2022"],
'AgreedMonthlyBilling':[1500,4500],
'ActualMonthlyBilling':[1200,3000]}
df = pd.DataFrame(data)

ID、StartDate、EndDate和AgreedMonthlyBilling在我插入各自的值后不会更改(即AgreedMonthlyBilling中的1500和4500将永远保留(最后一个值(在excel表中手动修改的ActualMonthlyBolling(每月都会更改(即1月有1200和3000,2月有0和1500,3月有3000和1000(。我想做的是在每个新的月份添加一行,直到结束日期(这意味着,我们在2022年3月1日,我从2022年起只有3个月的时间添加(,并且只有在新的月份到来时,而不是之前(这是我陷入困境的地方(。我有以下内容:

repeats = (pd.to_datetime(df['EndDate']) - pd.to_datetime(df['StartDate'])) // np.timedelta64(1, 'M') + 1
periods = np.concatenate([pd.period_range(start=pd.to_datetime(d), periods=r, freq='M')
for d, r in zip(df['StartDate'], repeats)])

new_df = (pd.DataFrame(
np.repeat(df.values, repeats, 0),
columns=df.columns,
index=periods)
.assign(Month = [x.month for x in periods],
Year = [x.year for x in periods]))

我得到的是一个新的DataFrame,在StartDate和EndDate之间有完整的月份,但每次我更改ActualMonthlyBilling时,我都会覆盖ActualMonthlyBilling中过去的数据,并且我不会保留我想要跟踪的旧记录。

这就是我得到的

预期计费>
ID>实际计费
AB234301202015001350覆盖
AB234302202015001350覆盖
AB234303202015001350覆盖

更新假设您有以下当前数据帧:

ID  Month  Year  ExpectedBilling  RealBilling
0  AB2343      1  2022             1500         1290
1  AB2343      2  2022             1500         1350
2  CD4565      1  2022             4500         3000
3  CD4565      2  2022             4500         3250

您想根据ID:为2022年3月添加一个新行

new_df = df.groupby('ID', as_index=False) 
.agg({'ID': 'first', 'ExpectedBilling': 'last'}) 
.assign(Month=today.month, Year=today.year)
out = pd.concat([df, new_df]).sort_values(['ID', 'Month', 'Year'], ignore_index=True)
print(out)
# Output
ID  Month  Year  ExpectedBilling  RealBilling
0  AB2343      1  2022             1500       1290.0
1  AB2343      2  2022             1500       1350.0
2  AB2343      3  2022             1500          NaN
3  CD4565      1  2022             4500       3000.0
4  CD4565      2  2022             4500       3250.0
5  CD4565      3  2022             4500          NaN

问题是如何为RealBilling列插入新值?


旧答案

today = pd.Timestamp.today()
period = df.apply(lambda x: pd.period_range(x['StartDate'], today, freq='M'), axis=1)
new_df = df.assign(Period=period).explode('Period').set_index('Period') 
.assign(Month=lambda x: x.index.month, Year=lambda x: x.index.year)

输出:

ID   StartDate     EndDate  AgreedMonthlyPrice  ActualMonthlyPrice  Month  Year
Period                                                                                      
2020-01  AB2343  01/01/2020  31/12/2022                1500                1290      1  2020
2020-02  AB2343  01/01/2020  31/12/2022                1500                1290      2  2020
2020-03  AB2343  01/01/2020  31/12/2022                1500                1290      3  2020
2020-04  AB2343  01/01/2020  31/12/2022                1500                1290      4  2020
2020-05  AB2343  01/01/2020  31/12/2022                1500                1290      5  2020
2020-06  AB2343  01/01/2020  31/12/2022                1500                1290      6  2020
2020-07  AB2343  01/01/2020  31/12/2022                1500                1290      7  2020
2020-08  AB2343  01/01/2020  31/12/2022                1500                1290      8  2020
2020-09  AB2343  01/01/2020  31/12/2022                1500                1290      9  2020
2020-10  AB2343  01/01/2020  31/12/2022                1500                1290     10  2020
2020-11  AB2343  01/01/2020  31/12/2022                1500                1290     11  2020
2020-12  AB2343  01/01/2020  31/12/2022                1500                1290     12  2020
2021-01  AB2343  01/01/2020  31/12/2022                1500                1290      1  2021
2021-02  AB2343  01/01/2020  31/12/2022                1500                1290      2  2021
2021-03  AB2343  01/01/2020  31/12/2022                1500                1290      3  2021
2021-04  AB2343  01/01/2020  31/12/2022                1500                1290      4  2021
2021-05  AB2343  01/01/2020  31/12/2022                1500                1290      5  2021
2021-06  AB2343  01/01/2020  31/12/2022                1500                1290      6  2021
2021-07  AB2343  01/01/2020  31/12/2022                1500                1290      7  2021
2021-08  AB2343  01/01/2020  31/12/2022                1500                1290      8  2021
2021-09  AB2343  01/01/2020  31/12/2022                1500                1290      9  2021
2021-10  AB2343  01/01/2020  31/12/2022                1500                1290     10  2021
2021-11  AB2343  01/01/2020  31/12/2022                1500                1290     11  2021
2021-12  AB2343  01/01/2020  31/12/2022                1500                1290     12  2021
2022-01  AB2343  01/01/2020  31/12/2022                1500                1290      1  2022
2022-02  AB2343  01/01/2020  31/12/2022                1500                1290      2  2022
2022-03  AB2343  01/01/2020  31/12/2022                1500                1290      3  2022
2021-01  CD4565  01/01/2021  31/12/2025                4500                3000      1  2021
2021-02  CD4565  01/01/2021  31/12/2025                4500                3000      2  2021
2021-03  CD4565  01/01/2021  31/12/2025                4500                3000      3  2021
2021-04  CD4565  01/01/2021  31/12/2025                4500                3000      4  2021
2021-05  CD4565  01/01/2021  31/12/2025                4500                3000      5  2021
2021-06  CD4565  01/01/2021  31/12/2025                4500                3000      6  2021
2021-07  CD4565  01/01/2021  31/12/2025                4500                3000      7  2021
2021-08  CD4565  01/01/2021  31/12/2025                4500                3000      8  2021
2021-09  CD4565  01/01/2021  31/12/2025                4500                3000      9  2021
2021-10  CD4565  01/01/2021  31/12/2025                4500                3000     10  2021
2021-11  CD4565  01/01/2021  31/12/2025                4500                3000     11  2021
2021-12  CD4565  01/01/2021  31/12/2025                4500                3000     12  2021
2022-01  CD4565  01/01/2021  31/12/2025                4500                3000      1  2022
2022-02  CD4565  01/01/2021  31/12/2025                4500                3000      2  2022
2022-03  CD4565  01/01/2021  31/12/2025                4500                3000      3  2022

相关内容

最新更新