Input:
import pandas as pd

df1 = pd.DataFrame([[101, 'DC1', 'QA,DEMO'],
                    [101, 'EM5', 'QA,DEMO'],
                    [102, 'RA6', 'PA,QA,GF'],
                    [103, 'DC1', 'AB,LK'],
                    [103, 'RA6', 'OO'],
                    [103, 'PA4', 'AB,OO']
                    ]
                   , columns=['Call_id', 'Agent_id', 'Task_code'])
df2 = pd.DataFrame([['QA', 'Enter phone number'],
                    ['DEMO', 'ASK EMAIL ID'],
                    ['PA', 'make notes'],
                    ['GF', 'take call back'],
                    ['AB', 'Apply bell mark'],
                    ['LK', 'call Low markets'],
                    ['OO', 'Out of order']
                    ]
                   , columns=['Task_code', 'Task_Desc'])
Expected output:
df = pd.DataFrame([[101, 'DC1', 'QA,DEMO','Enter phone number and ASK EMAIL ID'],
[101, 'EM5', 'QA,DEMO','Enter phone number and ASK EMAIL ID'],
[102, 'RA6', 'PA,QA,GF','make notes and Enter phone number and take call back'],
[103, 'DC1', 'AB,LK','Apply bell mark and call Low markets'],
[103, 'RA6', 'OO','Out of order'],
[103, 'PA4', 'AB,OO','Apply bell mark and Out of order']
]
, columns=['Call_id', 'Agent_id', 'Task_code','Task_desc'])
In Python 3.6, I want to add a Task_desc column to df1 that, for each Call_id and Agent_id, joins the descriptions of that row's task codes with "and".
task_mapper = dict(zip(df2["Task_code"], df2["Task_Desc"]))

def get_description(values):
    # Split the comma-separated codes and drop empty entries.
    values = values.strip().split(",")
    values = list(filter(None, values))
    # Look up each code; an unknown code maps to an empty string.
    descriptions = [task_mapper.get(value, "") for value in values]
    return " and ".join(descriptions)

df1["Task_desc"] = df1["Task_code"].apply(get_description)
print(df1)  # your output
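One thing to watch with the function above: because unknown codes map to an empty string, a row such as 'QA,ZZ' would produce a dangling "Enter phone number and ". The following is a minimal sketch of a variant that skips unknown codes and tolerates spaces around the commas. The name get_description_skip_unknown and the code 'ZZ' are made up for illustration, and the lookup table is a two-row subset of df2.

```python
import pandas as pd

# Two-row subset of df2 from the question, just for the demo.
df2 = pd.DataFrame([['QA', 'Enter phone number'],
                    ['DEMO', 'ASK EMAIL ID']],
                   columns=['Task_code', 'Task_Desc'])
task_mapper = dict(zip(df2["Task_code"], df2["Task_Desc"]))

def get_description_skip_unknown(values):
    # Split on commas, trim whitespace, and drop empty pieces.
    codes = [code.strip() for code in values.split(",") if code.strip()]
    # Keep only the codes that exist in the mapping.
    return " and ".join(task_mapper[code] for code in codes if code in task_mapper)

# 'ZZ' is a hypothetical code that does not appear in df2.
sample = pd.Series(['QA,DEMO', 'QA, ZZ', 'DEMO,'])
result = sample.apply(get_description_skip_unknown)
print(result.tolist())
```

Whether silently skipping an unknown code is the right behavior depends on the data; raising an error may be preferable if every code is expected to be present in df2.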
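You can create a dictionary-like lookup (here a Series indexed by Task_code), use it to map the values obtained by splitting on ",", and then join the results with the " and " separator: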
d = df2.set_index('Task_code')['Task_Desc']
f = lambda x: ' and '.join(d.get(y) for y in x.split(',') if y in d)
df1['Task_Desc'] = df1['Task_code'].apply(f)
print (df1)
Call_id Agent_id Task_code
0 101 DC1 QA,DEMO
1 101 EM5 QA,DEMO
2 102 RA6 PA,QA,GF
3 103 DC1 AB,LK
4 103 RA6 OO
5 103 PA4 AB,OO
Task_Desc
0 Enter phone number and ASK EMAIL ID
1 Enter phone number and ASK EMAIL ID
2 make notes and Enter phone number and take cal...
3 Apply bell mark and call Low markets
4 Out of order
5 Apply bell mark and Out of order
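Or use pandas functions: explode the split column, merge with df2, and aggregate with join, using a different separator for each column. DataFrame.explode requires pandas 0.25 or later. Note that groupby sorts the rows by Call_id and Agent_id, and the order of the codes within a row can change, as the output for call 102 shows: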
f1 = lambda x: ' and '.join(x.unique())
f2 = lambda x: ','.join(x.unique())
df = (df1.assign(Task_code = df1['Task_code'].str.split(','))
         .explode('Task_code')
         .merge(df2, on='Task_code')
         .groupby(['Call_id','Agent_id'])
         .agg({'Task_code':f2, 'Task_Desc':f1})
         .reset_index())
print (df)
Call_id Agent_id Task_code
0 101 DC1 QA,DEMO
1 101 EM5 QA,DEMO
2 102 RA6 QA,PA,GF
3 103 DC1 AB,LK
4 103 PA4 AB,OO
5 103 RA6 OO
Task_Desc
0 Enter phone number and ASK EMAIL ID
1 Enter phone number and ASK EMAIL ID
2 Enter phone number and make notes and take cal...
3 Apply bell mark and call Low markets
4 Apply bell mark and Out of order
5 Out of order
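If the original row order and code order need to be kept, one possible alternative is to explode only the Task_code column, map it through a lookup Series, and group on the original row label instead of merging. This is a sketch under the assumption that the codes in df2 are unique; the names lookup, codes and descs are introduced here for illustration, and the frames are small subsets of the question's data.

```python
import pandas as pd

# Small subsets of df1 and df2 from the question.
df1 = pd.DataFrame([[102, 'RA6', 'PA,QA,GF'],
                    [103, 'RA6', 'OO']],
                   columns=['Call_id', 'Agent_id', 'Task_code'])
df2 = pd.DataFrame([['QA', 'Enter phone number'],
                    ['PA', 'make notes'],
                    ['GF', 'take call back'],
                    ['OO', 'Out of order']],
                   columns=['Task_code', 'Task_Desc'])

# Lookup Series: index is the code, value is the description.
lookup = df2.set_index('Task_code')['Task_Desc']

# One element per code; explode repeats the original row label in the index.
codes = df1['Task_code'].str.split(',').explode()

# Map each code to its description; unknown codes become NaN and are dropped.
descs = codes.map(lookup).dropna()

# Group by the original row label and join in the original code order.
df1['Task_Desc'] = descs.groupby(level=0).agg(' and '.join)
print(df1['Task_Desc'].tolist())
```

Because the result is assigned back by index, df1 keeps its original rows and Task_code values. A row whose codes are all unknown would end up with NaN in Task_Desc.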
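Here is my approach. Split into lists:
df1['desc'] = df1.Task_code.str.split(',')
Build a dictionary for the lookup:
dictionary = dict(zip(df2.Task_code, df2.Task_Desc))
def descdic(codes):
    # Raises KeyError if a code is missing from the dictionary.
    return ' and '.join([dictionary[i] for i in codes])
Apply it to the data:
df1['desc'] = df1['desc'].apply(descdic)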
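The direct dictionary[i] lookup in this approach is strict: a code with stray whitespace, such as ' QA ', raises a KeyError. The sketch below shows that failure on a made-up input and one way to normalize the pieces after splitting. The Series raw and the two-entry dictionary are illustrative assumptions, not data from the question.

```python
import pandas as pd

# Two-entry subset of the lookup built from df2.
dictionary = {'QA': 'Enter phone number', 'DEMO': 'ASK EMAIL ID'}

def descdic(codes):
    # Strict lookup: raises KeyError for any code not in the dictionary.
    return ' and '.join(dictionary[code] for code in codes)

# Hypothetical input: the second row has spaces around the codes.
raw = pd.Series(['QA,DEMO', ' QA , DEMO '])

# A plain split keeps the spaces, so the strict lookup fails.
try:
    raw.str.split(',').apply(descdic)
    failed = False
except KeyError:
    failed = True

# Strip each piece after splitting, then apply the same function.
cleaned = raw.str.split(',').apply(lambda parts: [p.strip() for p in parts])
result = cleaned.apply(descdic)
print(failed, result.tolist())
```

Keeping the lookup strict can be useful, since a KeyError points at bad input instead of hiding it; the cleanup step only removes whitespace and does not mask genuinely unknown codes.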