我有以下数据帧(请参阅下面的excel文件链接(:
account_id contract_id date_activated term_months 2009-01-01 00:00:00 2009-02-01 00:00:00 2009-03-01 00:00:00 2009-04-01 00:00:00 2009-05-01 00:00:00 2009-06-01 00:00:00 ... 2020-06-01 00:00:00 2020-07-01 00:00:00 2020-08-01 00:00:00 2020-09-01 00:00:00 2020-10-01 00:00:00 2020-11-01 00:00:00 2020-12-01 00:00:00 2021-01-01 00:00:00 2021-02-01 00:00:00 2021-03-01 00:00:00
0 1234 A 2009-07-01 24 0 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
1 1212 B 2019-06-25 24 0 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
2 1111 C 2014-03-13 24 0 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
3 11112 FF 2017-02-09 12 0 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
4 5454 FAS 2015-08-04 36 0 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
5 48468 DFAF 2010-06-10 12 0 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
6 89795 SDFDF 2017-09-19 24 0 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
7 12454545 FADS 2017-06-26 12 0 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
8 12454545 FDAGDG 2018-06-01 12 0 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
9 12454545 ADGADGFAD 2019-01-28 12 0 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
10 12454545 ADGADGASDGADSG 2020-01-24 12 0 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
11 12454545 ADD 2020-03-02 11 0 0 0 0 0 0 ... 620.984848 620.984848 620.984848 620.984848 620.984848 620.984848 620.984848 620.984848 0.00 0.00
12 12454545 ADFGG 2021-02-24 12 0 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 687.94 687.94
13 1646468 ASDADGAD 2019-10-14 36 0 0 0 0 0 0 ... 1504.700000 1504.700000 1504.700000 1504.700000 1504.700000 1504.700000 1504.700000 1504.700000 1504.70 1504.70
14 5454555 ADGA 2018-04-02 30 0 0 0 0 0 0 ... 528.000000 528.000000 528.000000 528.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
15 48654 GHDG 2018-10-18 36 0 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
16 4546486 DGHDG 2009-01-01 12 323 323 323 323 323 323 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
17 4546486 DFGHGDHDGH 2009-05-07 12 0 0 0 0 399 399 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
18 4546486 DFGAA 2009-09-10 12 0 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
19 4546486 SGFHJJ 2010-09-08 36 0 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
20 4546486 SFGHJR 2013-09-06 36 0 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
21 4546486 HTUIJR 2015-10-27 36 0 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
我想要如图所示的结果(带有新列renew_type(:
account_id contract_id date_activated term_months renewal_type 2009-01-01 00:00:00 2009-02-01 00:00:00 2009-03-01 00:00:00 2009-04-01 00:00:00 2009-05-01 00:00:00 ... 2020-06-01 00:00:00 2020-07-01 00:00:00 2020-08-01 00:00:00 2020-09-01 00:00:00 2020-10-01 00:00:00 2020-11-01 00:00:00 2020-12-01 00:00:00 2021-01-01 00:00:00 2021-02-01 00:00:00 2021-03-01 00:00:00
0 1234 A 2009-07-01 24 Regular 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
1 1212 B 2019-06-25 24 Regular 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
2 1111 C 2014-03-13 24 Regular 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
3 11112 FF 2017-02-09 12 Regular 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
4 5454 FAS 2015-08-04 36 Regular 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
5 48468 DFAF 2010-06-10 12 Regular 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
6 89795 SDFDF 2017-09-19 24 Regular 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
7 12454545 FADS 2017-06-26 12 Regular 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
8 12454545 FDAGDG 2018-06-01 12 Regular 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
9 12454545 ADGADGFAD 2019-01-28 12 Early 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
10 12454545 ADGADGASDGADSG 2020-01-24 12 Regular 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
11 12454545 ADD 2020-03-02 11 Early 0 0 0 0 0 ... 620.984848 620.984848 620.984848 620.984848 620.984848 620.984848 620.984848 620.984848 0.00 0.00
12 12454545 ADFGG 2021-02-24 12 Regular 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 687.94 687.94
13 1646468 ASDADGAD 2019-10-14 36 Regular 0 0 0 0 0 ... 1504.700000 1504.700000 1504.700000 1504.700000 1504.700000 1504.700000 1504.700000 1504.700000 1504.70 1504.70
14 5454555 ADGA 2018-04-02 30 Regular 0 0 0 0 0 ... 528.000000 528.000000 528.000000 528.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
15 48654 GHDG 2018-10-18 36 Regular 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
16 4546486 DGHDG 2009-01-01 12 Regular 323 323 323 323 323 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
17 4546486 DFGHGDHDGH 2009-05-07 12 Early 0 0 0 0 399 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
18 4546486 DFGAA 2009-09-10 12 Early 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
19 4546486 SGFHJJ 2010-09-08 36 Regular 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
20 4546486 SFGHJR 2013-09-06 36 Regular 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
21 4546486 HTUIJR 2015-10-27 36 Early 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
每个帐户都有一个或多个由contract_id表示的合同。每份合同也有自己的月期(term_months(。
更新类型应该是";常规的";或";早";。合同被视为";早期的";当其上一份合同尚未到期,或尚未结束其期限时(根据term_months,从激活之日起付款,此后每月根据条款数量付款(,并且在过去四个月内仍有付款(每月根据以日期为标题的列显示付款(。合同被视为";常规的";当它是第一份合同时,前一份合同在过去四个月内没有付款,或者前一份协议已经结束。
尝试使用此代码进行此操作,但存在一些问题,因为它将一些";早期的";作为";常规的";(对于renewal_type,请注意,此代码还包含另一列contract_type的循环(:
def get_types(monthly_payments):
def f(s):
check = monthly_payments.loc[
(s.date_activated.year == monthly_payments.index.year) &
(s.date_activated.month == monthly_payments.index.month)
].iloc[0]
if check.wb == 0:
# If rolling sum of 4 months prior is 0
s['contract_type'] = 'Winback'
elif check.og_upg == 0:
# If Prior Month is 0
s['contract_type'] = 'Original'
elif check.max_pmt > check.og_upg:
# If Prior Month is not missing and current month is more
s['contract_type'] = 'Upgrade'
else:
s['contract_type'] = 'Renewal'
if check.early:
# If Early
s['renewal_type'] = 'Early'
else:
s['renewal_type'] = 'Regular'
return s
return f
def apply_types(g):
# Get Non Payment Info
account_info = g[g.columns[:4]]
# Transpose Monthly Payments To Rows
monthly_payments = g.loc[:, g.columns[4:]].T
# Make Sure Index is DT
monthly_payments.index = pd.to_datetime(monthly_payments.index)
# Get Check for is early based on number of payments
monthly_payments['early'] = monthly_payments.astype(bool).sum(axis=1) > 1
# Max Payment In Month
monthly_payments['max_pmt'] = monthly_payments.max(axis=1)
# 1 Month Prior
monthly_payments['og_upg'] = monthly_payments.max_pmt.shift().fillna(0)
# Rolling Sum of 4 Months Prior
monthly_payments['wb'] = monthly_payments.max_pmt
.rolling(min_periods=0, window=4).sum().shift()
# Concat New Columns With Original Payment Information
return pd.concat((
account_info.apply(get_types(monthly_payments), axis=1),
g[g.columns[4:]]
), axis=1)
df = df.groupby('account_id', as_index=False).apply(apply_types).reset_index(drop=True)
我无法包含字典,因为它太长了,下面是excel文件的链接:https://drive.google.com/file/d/16BLoSugMaDdB8Qac2ATJRBLRvx3HCIus/view?usp=sharing
这可以用字典来记住hisotry。或者,按账户/开始日期对数据进行排序,然后使用shift(1(实现类似的功能
import datetime
from dateutil.relativedelta import relativedelta
d={}
def renewal_type(row):
try:
acct=row['account_id']
result='Early'
if acct not in d: #first contract
result='Regular'
else:
prev=d[acct]
dt=row['date_activated'].replace(day=1)
if sum([abs(prev[dt+relativedelta(months=-n)]) for n in range(4)])==0: #no pay in past 4 mth. I don't quite get where your cut-off is. This can be range(1,5)
result='Regular'
elif prev['date_activated'].replace(day=1)+relativedelta(months=prev['term_months']-1)<dt: #prev contract expired
result='Regular'
d[acct]=row.copy()
except:
print('ERROR',row)
result='ERROR'
return result
df['rt1']=df.apply(renewal_type,axis=1)
df['rt1']