I have a DataFrame:
EGID Start_date End_Date Consumption Number_day
1 2019-01-01 2019-05-28 30 152
1 2019-06-05 2019-07-31 40 60
1 2019-08-01 2019-09-31 40 0
1 2019-02-11 2019-04-14 60 60
2 2019-02-11 2019-04-14 60 60
2 2019-08-01 2019-09-31 40 200
3 2019-01-01 2019-05-28 30 200
3 2019-06-05 2019-07-31 40 150
3 2019-08-01 2019-09-31 40 60
3 2019-09-31 2019-12-31 50 60
Expected output:
EGID Start_date End_Date Consumption Number_day
1 2019-01-01 2019-07-31 70 212
1 2019-08-01 2019-09-31 40 0
1 2019-02-11 2019-04-14 60 60
2 2019-02-11 2019-09-31 100 260
3 2019-01-01 2019-07-31 70 350
3 2019-08-01 2019-12-31 90 120
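For rows with the same EGID (a string), I want to sum Consumption and Number_day. However, if a row's Number_day is greater than 350 or equal to 0, the summing should stop there, even if the EGID is the same (see EGID 1). Likewise, if the running total of Number_day would exceed 350, the summing should stop (see EGID 3). Finally, the End_Date of each merged row should be the End_Date of the last row that was summed into it.
Someone helped me and we came up with the code below, but unfortunately the total number of days can exceed 350, because it aggregates every row with the same EGID.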
df['Consumption2'] = (df['Number_day'] < 350) * df['Consumption']
df['Number_day2'] = (df['Number_day'] < 350) * df['Number_day']
df.groupby(['EGID'])[['Start_date', 'End_Date', 'Consumption2', 'Number_day2']].agg({'Start_date':'min', 'End_Date':'max', 'Consumption2':'sum', 'Number_day2':'sum'})
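Thanks for your help.
Try this. If the result is not what you need, please edit the question to include the cases that fail, together with the desired result.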
import pandas as pd  # import library

# I am assuming that your dataframe is named df
df = pd.DataFrame({'EGID': ['1', '1', '1', '1', '2', '2', '3', '3', '3', '3'],
                   'Start_date': ['2019-01-01', '2019-06-05', '2019-08-01', '2019-02-11', '2019-02-11',
                                  '2019-08-01', '2019-01-01', '2019-06-05', '2019-08-01', '2019-09-31'],
                   'End_Date': ['2019-05-28', '2019-07-31', '2019-09-31', '2019-04-14', '2019-04-14',
                                '2019-09-31', '2019-05-28', '2019-07-31', '2019-09-31', '2019-12-31'],
                   'Consumption': [30, 40, 40, 60, 60, 40, 30, 40, 40, 50],
                   'Number_day': [152, 60, 0, 60, 60, 200, 200, 150, 60, 60]})
def counter_limits(x, lower_limit=0, upper_limit=350):
    """
    Take a list or pandas Series and return a list of integers that increments
    whenever the cumulative sum would exceed upper_limit, the element equals
    lower_limit, or the previous element equalled lower_limit.
    """
    result = []
    cumulative = 0
    counter = 0
    prev_el = -1
    for el in x:
        cumulative += el
        if (cumulative > upper_limit) or (el == lower_limit) or (prev_el == lower_limit):
            counter += 1
            # The current row opens the new group, so it counts toward that group's total
            cumulative = el
        result.append(counter)
        prev_el = el
    return result
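# Label each row with the group it belongs to within its EGID
df['key'] = df.groupby(['EGID'])['Number_day'].transform(counter_limits)
# Start_date comes from the first row of each group, End_Date from the last;
# Consumption and Number_day are summed
df2 = (df.groupby(['EGID', 'key'])
         .agg({'Start_date': 'first', 'End_Date': 'last',
               'Consumption': 'sum', 'Number_day': 'sum'})
         .reset_index()
         .drop(columns='key'))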
df2
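To check the grouping step on its own, here is a minimal sketch that only computes the group labels and the summed days for EGID 1 and EGID 3 from the question. The helper name `segment_ids` is hypothetical, and it assumes that a row which opens a new group should already count toward that group's running total.

```python
import pandas as pd

def segment_ids(days, lower_limit=0, upper_limit=350):
    """Hypothetical helper: label each row with a group number."""
    ids, running, current, prev = [], 0, 0, None
    for d in days:
        running += d
        # Open a new group when the running total would exceed the cap,
        # when this row has zero days, or when the previous row had zero days.
        if running > upper_limit or d == lower_limit or prev == lower_limit:
            current += 1
            running = d  # assumption: the new group already contains this row
        ids.append(current)
        prev = d
    return ids

# Number_day values for EGID 1 and EGID 3, copied from the question
df = pd.DataFrame({
    'EGID': ['1', '1', '1', '1', '3', '3', '3', '3'],
    'Number_day': [152, 60, 0, 60, 200, 150, 60, 60],
})
df['key'] = df.groupby('EGID')['Number_day'].transform(segment_ids)
totals = df.groupby(['EGID', 'key'])['Number_day'].sum().tolist()

print(df['key'].tolist())  # [0, 0, 1, 2, 0, 0, 1, 1]
print(totals)              # [212, 0, 60, 350, 120]
```

The summed days match the Number_day column of the expected output for EGID 1 (212, 0, 60) and EGID 3 (350, 120).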