如何编写一个函数来查找已经消失的、不断变化的、新的客户等



我正试图想出一种动态方法来检查字符串的存在,并报告一些不同的结果:gone_client、boomerang、new_client。

  1. 如果我按address_id和my_date分组,并且模式是Verizon、Verizon、Comcast、Comcast,则客户端离开Verizon,转到另一家公司。

  2. 如果客户从Verizon转到康卡斯特,然后又回到Verizon,这就是一个回旋镖。

  3. 如果该客户上个月是Verizon的新客户,那么这就是一个新客户,但如果该客户在过去的某个时候是Verizon的,而上个月又是新客户,则这是一个回旋镖。

    进口熊猫作为pd将numpy导入为np

    # data stored in dictionary
    details = {
    'address_id': [111,111,111,111,111,111,222,222,222,222,222,222,333,333,333,333,333,333,444,444,444,444,444,444,555,555,555,555,555,555,777,777,777],
    'my_company':['Comcast','Verizon','Other','Other','Comcast','Comcast','Spectrum','Spectrum','Spectrum','Spectrum','Spectrum','Spectrum','Verizon','Verizon','Verizon','Verizon','Verizon','Verizon','Spectrum','Spectrum','Spectrum','Spectrum','Verizon','Spectrum','Spectrum','Spectrum','Spectrum','Spectrum','Verizon','Other','Verizon','Comcast','Comcast'],
    'my_date':['2022-01-24','2022-02-21','2022-03-28','2022-04-25','2022-05-23','2022-06-27','2022-01-24','2022-02-21','2022-03-28','2022-04-25','2022-05-23','2022-06-27','2022-01-24','2022-02-21','2022-03-28','2022-04-25','2022-05-23','2022-06-27','2022-01-24','2022-02-21','2022-03-28','2022-04-25','2022-05-23','2022-06-27','2022-01-24','2022-02-21','2022-03-28','2022-04-25','2022-05-23','2022-06-27','2022-01-24','2022-02-21','2022-03-28']
    }
    df = pd.DataFrame(details)
    df
    

我不能很好地理解逻辑,但我认为它是沿着这些线的。。。但似乎有点不对劲。

def f(ser):
if 'Verizon' not in ser.unique():
return False
if 'Verizon' in ser.unique():
return True
return False
df['gone_client'] = df.groupby('address_id')['my_company'].transform(f)
df.head()

也许是这样的。。。

# substring to be searched
sub ='Verizon'

# creating and passing series to new column
df["Indexes"]= df["my_company"].str.find(sub)
df

不过,我不知道如何处理这些日期,只有检查这些日期,你才能知道一个客户是否已经走了,是否是新客户,或者可能是一些不适合放在桶里的奇怪模式,比如"其他"。预期的结果是这样的(最后一种情况有点像回旋镖,又一次离开了(。我预计不会有很多这样的情况,但肯定会有一些非常独特的情况,所以"其他"对这些记录来说很好。

address_id      my_company  my_date     status
0   111             Comcast     1/24/2022   Left
1   111             Verizon     2/21/2022   Left
2   111             Other       3/28/2022   Left
3   111             Other       4/25/2022   Left
4   111             Comcast     5/23/2022   Left
5   111             Comcast     6/27/2022   Left
6   222             Spectrum    1/24/2022   Never Client
7   222             Spectrum    2/21/2022   Never Client
8   222             Spectrum    3/28/2022   Never Client
9   222             Spectrum    4/25/2022   Never Client
10  222             Spectrum    5/23/2022   Never Client
11  222             Spectrum    6/27/2022   Never Client
12  333             Verizon     1/24/2022   Never Left
13  333             Verizon     2/21/2022   Never Left
14  333             Verizon     3/28/2022   Never Left
15  333             Verizon     4/25/2022   Never Left
16  333             Verizon     5/23/2022   Never Left
17  333             Verizon     6/27/2022   Never Left
18  444             Spectrum    1/24/2022   Left
19  444             Spectrum    2/21/2022   Left
20  444             Spectrum    3/28/2022   Left
21  444             Spectrum    4/25/2022   Left
22  444             Verizon     5/23/2022   Left
23  444             Spectrum    6/27/2022   Left
24  555             Verizon     1/24/2022   Boomerang
25  555             Spectrum    2/21/2022   Boomerang
26  555             Spectrum    3/28/2022   Boomerang
27  555             Spectrum    4/25/2022   Boomerang
28  555             Verizon     5/23/2022   Boomerang
29  555             Verizon     6/27/2022   Boomerang
30  777             Comcast     1/24/2022   New Client
31  777             Comcast     2/21/2022   New Client
32  777             Verizon     1/24/2022   New Client
33  888             Verizon     2/21/2022   Other
34  888             Comcast     3/28/2022   Other
35  888             Verizon     4/25/2022   Other
36  888             Comcast     5/23/2022   Other
37  888             Comcast     6/27/2022   Other

这里有一个包含一些详细逻辑的解决方案,您可以使用它。听起来你不太确定自己的最终逻辑,但希望这能给你足够的机会。

该解决方案基于address_id对数据帧进行分组。然后,对于每个单独的组,我们可以检查哪个my_company特征。我们可以使用这个,再加上timedelta,来确定地址是否在Verizon,而不是Verizon,从未在Verizon,目前在Verizon,并在过去30天内回来,等等。

此答案不是由Verizon赞助的。存在其他蜂窝式提供商


import pandas as pd
import datetime
# data stored in dictionary
details = {
'address_id': [111,111,111,111,111,111,222,222,222,222,222,222,333,333,333,333,333,333,444,444,444,444,444,444,555,555,555,555,555,555,777,777,777],
'my_company':['Comcast','Verizon','Other','Other','Comcast','Comcast','Spectrum','Spectrum','Spectrum','Spectrum','Spectrum','Spectrum','Verizon','Verizon','Verizon','Verizon','Verizon','Verizon','Spectrum','Spectrum','Spectrum','Spectrum','Verizon','Spectrum','Spectrum','Spectrum','Spectrum','Spectrum','Verizon','Other','Verizon','Comcast','Comcast'],
'my_date':['2022-01-24','2022-02-21','2022-03-28','2022-04-25','2022-05-23','2022-06-27','2022-01-24','2022-02-21','2022-03-28','2022-04-25','2022-05-23','2022-06-27','2022-01-24','2022-02-21','2022-03-28','2022-04-25','2022-05-23','2022-06-27','2022-01-24','2022-02-21','2022-03-28','2022-04-25','2022-05-23','2022-06-27','2022-01-24','2022-02-21','2022-03-28','2022-04-25','2022-05-23','2022-06-27','2022-01-24','2022-02-21','2022-03-28']
}
df = pd.DataFrame(details)
df['my_date'] = pd.to_datetime(df['my_date']) 
address_groups = df.groupby(['address_id'])
frame_list = []
current_date = datetime.datetime.now()
for group, frame in address_groups:
# Create a list and set of each company used by a given address-id:
company_list = frame['my_company'].values.tolist()
company_set = set(company_list)
# Exclusively Verizon
if ('Verizon' in company_set) and (len(company_set) == 1):
frame['status'] = 'Verizon Diehard'
# Never Verizon
if ('Verizon' not in company_set):
frame['status'] = 'Verizon Never'
# Verizon at some point but not currently
if ('Verizon' in company_set) and (company_list[-1] != 'Verizon'):
v_frame = frame[frame['my_company'] == 'Verizon']
last_verizon_date = v_frame['my_date'].iloc[-1]
last_verizon_date = datetime.datetime.strptime(last_verizon_date, '%Y-%m-%d')
if (current_date - last_verizon_date) < pd.Timedelta("30 days"):
frame['status'] = 'Not curretly Verizon, but was in last 30 days'
else:
frame['status'] = 'Not curretly Verizon, but was so more than 30 days ago'

# Verizon currently but was a boomerang
if (company_list[-1] == 'Verizon') and (len(company_set) >= 2):
non_v_frame = frame[frame['my_company'] != 'Verizon']
last_non_v_date = non_v_frame['my_date'].iloc[-1]
last_non_v_date = datetime.datetime.strptime(last_non_v_date, '%Y-%m-%d')
if (current_date - last_non_v_date) < pd.Timedelta("30 days"):
frame['status'] = 'Boomerang back to Verizon in last 30 days'
else:
frame['status'] = 'Boomerang back more than 30 days ago'

frame_list.append(frame)
final_df = pd.concat(frame_list)

最新更新