创建一个新列,用于标识一系列列的行为



我有以下数据帧(请参阅下面的excel文件链接(:

account_id  contract_id date_activated  term_months 2009-01-01 00:00:00 2009-02-01 00:00:00 2009-03-01 00:00:00 2009-04-01 00:00:00 2009-05-01 00:00:00 2009-06-01 00:00:00 ... 2020-06-01 00:00:00 2020-07-01 00:00:00 2020-08-01 00:00:00 2020-09-01 00:00:00 2020-10-01 00:00:00 2020-11-01 00:00:00 2020-12-01 00:00:00 2021-01-01 00:00:00 2021-02-01 00:00:00 2021-03-01 00:00:00
0   1234    A   2009-07-01  24  0   0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
1   1212    B   2019-06-25  24  0   0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
2   1111    C   2014-03-13  24  0   0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
3   11112   FF  2017-02-09  12  0   0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
4   5454    FAS 2015-08-04  36  0   0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
5   48468   DFAF    2010-06-10  12  0   0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
6   89795   SDFDF   2017-09-19  24  0   0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
7   12454545    FADS    2017-06-26  12  0   0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
8   12454545    FDAGDG  2018-06-01  12  0   0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
9   12454545    ADGADGFAD   2019-01-28  12  0   0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
10  12454545    ADGADGASDGADSG  2020-01-24  12  0   0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
11  12454545    ADD 2020-03-02  11  0   0   0   0   0   0   ... 620.984848  620.984848  620.984848  620.984848  620.984848  620.984848  620.984848  620.984848  0.00    0.00
12  12454545    ADFGG   2021-02-24  12  0   0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    687.94  687.94
13  1646468 ASDADGAD    2019-10-14  36  0   0   0   0   0   0   ... 1504.700000 1504.700000 1504.700000 1504.700000 1504.700000 1504.700000 1504.700000 1504.700000 1504.70 1504.70
14  5454555 ADGA    2018-04-02  30  0   0   0   0   0   0   ... 528.000000  528.000000  528.000000  528.000000  0.000000    0.000000    0.000000    0.000000    0.00    0.00
15  48654   GHDG    2018-10-18  36  0   0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
16  4546486 DGHDG   2009-01-01  12  323 323 323 323 323 323 ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
17  4546486 DFGHGDHDGH  2009-05-07  12  0   0   0   0   399 399 ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
18  4546486 DFGAA   2009-09-10  12  0   0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
19  4546486 SGFHJJ  2010-09-08  36  0   0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
20  4546486 SFGHJR  2013-09-06  36  0   0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
21  4546486 HTUIJR  2015-10-27  36  0   0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00

我想要如图所示的结果(带有新列renew_type(:

account_id  contract_id date_activated  term_months renewal_type    2009-01-01 00:00:00 2009-02-01 00:00:00 2009-03-01 00:00:00 2009-04-01 00:00:00 2009-05-01 00:00:00 ... 2020-06-01 00:00:00 2020-07-01 00:00:00 2020-08-01 00:00:00 2020-09-01 00:00:00 2020-10-01 00:00:00 2020-11-01 00:00:00 2020-12-01 00:00:00 2021-01-01 00:00:00 2021-02-01 00:00:00 2021-03-01 00:00:00
0   1234    A   2009-07-01  24  Regular 0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
1   1212    B   2019-06-25  24  Regular 0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
2   1111    C   2014-03-13  24  Regular 0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
3   11112   FF  2017-02-09  12  Regular 0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
4   5454    FAS 2015-08-04  36  Regular 0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
5   48468   DFAF    2010-06-10  12  Regular 0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
6   89795   SDFDF   2017-09-19  24  Regular 0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
7   12454545    FADS    2017-06-26  12  Regular 0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
8   12454545    FDAGDG  2018-06-01  12  Regular 0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
9   12454545    ADGADGFAD   2019-01-28  12  Early   0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
10  12454545    ADGADGASDGADSG  2020-01-24  12  Regular 0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
11  12454545    ADD 2020-03-02  11  Early   0   0   0   0   0   ... 620.984848  620.984848  620.984848  620.984848  620.984848  620.984848  620.984848  620.984848  0.00    0.00
12  12454545    ADFGG   2021-02-24  12  Regular 0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    687.94  687.94
13  1646468 ASDADGAD    2019-10-14  36  Regular 0   0   0   0   0   ... 1504.700000 1504.700000 1504.700000 1504.700000 1504.700000 1504.700000 1504.700000 1504.700000 1504.70 1504.70
14  5454555 ADGA    2018-04-02  30  Regular 0   0   0   0   0   ... 528.000000  528.000000  528.000000  528.000000  0.000000    0.000000    0.000000    0.000000    0.00    0.00
15  48654   GHDG    2018-10-18  36  Regular 0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
16  4546486 DGHDG   2009-01-01  12  Regular 323 323 323 323 323 ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
17  4546486 DFGHGDHDGH  2009-05-07  12  Early   0   0   0   0   399 ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
18  4546486 DFGAA   2009-09-10  12  Early   0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
19  4546486 SGFHJJ  2010-09-08  36  Regular 0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
20  4546486 SFGHJR  2013-09-06  36  Regular 0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
21  4546486 HTUIJR  2015-10-27  36  Early   0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00

每个帐户都有一个或多个由contract_id表示的合同。每份合同也有自己的月期(term_months(。

更新类型应该是";常规的";或";早";。合同被视为";早期的";当其上一份合同尚未到期,或尚未结束其期限时(根据term_months,从激活之日起付款,此后每月根据条款数量付款(,并且在过去四个月内仍有付款(每月根据以日期为标题的列显示付款(。合同被视为";常规的";当它是第一份合同时,前一份合同在过去四个月内没有付款,或者前一份协议已经结束。

尝试使用此代码进行此操作,但存在一些问题,因为它将一些";早期的";作为";常规的";(对于renewal_type,请注意,此代码还包含另一列contract_type的循环(:

def get_types(monthly_payments):
def f(s):
check = monthly_payments.loc[
(s.date_activated.year == monthly_payments.index.year) &
(s.date_activated.month == monthly_payments.index.month)
].iloc[0]
if check.wb == 0:
# If rolling sum of 4 months prior is 0
s['contract_type'] = 'Winback'
elif check.og_upg == 0:
# If Prior Month is 0
s['contract_type'] = 'Original'
elif check.max_pmt > check.og_upg:
# If Prior Month is not missing and current month is more
s['contract_type'] = 'Upgrade'
else:
s['contract_type'] = 'Renewal'
if check.early:
# If Early
s['renewal_type'] = 'Early'
else:
s['renewal_type'] = 'Regular'
return s
return f
def apply_types(g):
# Get Non Payment Info
account_info = g[g.columns[:4]]
# Transpose Monthly Payments To Rows
monthly_payments = g.loc[:, g.columns[4:]].T
# Make Sure Index is DT
monthly_payments.index = pd.to_datetime(monthly_payments.index)
# Get Check for is early based on number of payments
monthly_payments['early'] = monthly_payments.astype(bool).sum(axis=1) > 1
# Max Payment In Month
monthly_payments['max_pmt'] = monthly_payments.max(axis=1)
# 1 Month Prior
monthly_payments['og_upg'] = monthly_payments.max_pmt.shift().fillna(0)
# Rolling Sum of 4 Months Prior
monthly_payments['wb'] = monthly_payments.max_pmt 
.rolling(min_periods=0, window=4).sum().shift()
# Concat New Columns With Original Payment Information
return pd.concat((
account_info.apply(get_types(monthly_payments), axis=1),
g[g.columns[4:]]
), axis=1)
df = df.groupby('account_id', as_index=False).apply(apply_types).reset_index(drop=True)

我无法包含字典,因为它太长了,下面是excel文件的链接:https://drive.google.com/file/d/16BLoSugMaDdB8Qac2ATJRBLRvx3HCIus/view?usp=sharing

这可以用字典来记住hisotry。或者,按账户/开始日期对数据进行排序,然后使用shift(1(实现类似的功能

import datetime
from dateutil.relativedelta import relativedelta
d={}
def renewal_type(row):
try:
acct=row['account_id']
result='Early'
if acct not in d: #first contract
result='Regular'
else:
prev=d[acct]
dt=row['date_activated'].replace(day=1)
if sum([abs(prev[dt+relativedelta(months=-n)]) for n in range(4)])==0: #no pay in past 4 mth. I don't quite get where your cut-off is. This can be range(1,5)
result='Regular'
elif prev['date_activated'].replace(day=1)+relativedelta(months=prev['term_months']-1)<dt: #prev contract expired
result='Regular'
d[acct]=row.copy()
except:
print('ERROR',row)
result='ERROR'
return result
df['rt1']=df.apply(renewal_type,axis=1)
df['rt1']

最新更新