How to split a conversation transcript into customer and customer service agent text based on a rule



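I have a paragraph that records a conversation between a customer and a customer service agent. How can I split the conversation and create two lists (or any other format, such as a dictionary), one containing only the customer's text and the other containing only the agent's text?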

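Example paragraph:
Agent Name: Hello! My name is X. How can I help you today? ( 4m 46s ) Customer: My name is Y. Here is my issue ( 4m 57s ) Agent Name: Here's the solution ( 5m 40s ) Agent Name: Are you there? ( 6m 30s ) Customer: Yes I'm still here. I still don't understand... ( 6m 40s ) Agent Name: Ok. Let's try another way. ( 6m 50s ) Agent Name: Does that solve the problem? (7m 40s) Agent Name: Thank you for contacting the customer service.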

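Expected output:
A list containing only the agent's text: ['Agent Name: Hello! My name is X. How can I help you today? ( 4m 46s )', 'Agent Name: Are you there? ( 6m 30s )', ...]

A list containing only the customer's text: ["Customer: My name is Y. Here is my issue ( 4m 57s )", "Customer: Yes I'm still here. I still don't understand... ( 6m 40s )"]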

Thanks!

Given:

txt='''
Agent Name: Hello! My name is X. How can I help you today? ( 4m 46s ) Customer: My name is Y. Here is my issue ( 4m 57s ) Agent Name: Here's the solution ( 5m 40s ) Agent Name: Are you there? ( 6m 30s ) Customer: Yes I'm still here. I still don't understand... ( 6m 40s ) Agent Name: Ok. Let's try another way. ( 6m 50s ) Agent Name: Does that solve the problem? (7m 40s) Agent Name: Thank you for contacting the customer service.'''

You can use re.findall with a lazy match that stops at the other speaker's label or at the end of the string (\Z):

>>> import re
>>> s1 = 'Agent Name:'
>>> s2 = 'Customer:'
>>> re.findall(rf'({s1}.*?(?={s2}|\Z))', txt)
['Agent Name: Hello! My name is X. How can I help you today? ( 4m 46s ) ', "Agent Name: Here's the solution ( 5m 40s ) Agent Name: Are you there? ( 6m 30s ) ", "Agent Name: Ok. Let's try another way. ( 6m 50s ) Agent Name: Does that solve the problem? (7m 40s) Agent Name: Thank you for contacting the customer service."]
>>> re.findall(rf'({s2}.*?(?={s1}|\Z))', txt)
['Customer: My name is Y. Here is my issue ( 4m 57s ) ', "Customer: Yes I'm still here. I still don't understand... ( 6m 40s ) "]
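Note that the agent result above merges consecutive agent turns into a single string, because each match only stops at the other speaker's label. If one list entry per message is wanted, as in the expected output, a possible variation is to stop at either label. The following is only a sketch under that assumption; the names SPEAKERS, pattern, and split_by_speaker are hypothetical and not part of the original answer.

```python
import re

txt = """
Agent Name: Hello! My name is X. How can I help you today? ( 4m 46s ) Customer: My name is Y. Here is my issue ( 4m 57s ) Agent Name: Here's the solution ( 5m 40s ) Agent Name: Are you there? ( 6m 30s ) Customer: Yes I'm still here. I still don't understand... ( 6m 40s ) Agent Name: Ok. Let's try another way. ( 6m 50s ) Agent Name: Does that solve the problem? (7m 40s) Agent Name: Thank you for contacting the customer service."""

# Hypothetical helper names; the speaker labels are taken from the sample text.
SPEAKERS = ("Agent Name:", "Customer:")
label = "|".join(re.escape(s) for s in SPEAKERS)

# Capture the label, then lazily consume text until the next label
# (from either speaker) or the end of the string.
pattern = re.compile(rf"({label})(.*?)(?={label}|\Z)", re.S)

def split_by_speaker(text):
    """Return a dict mapping each speaker label to a list of its messages."""
    by_speaker = {s: [] for s in SPEAKERS}
    for m in pattern.finditer(text):
        # m.group(0) is the label plus the message; strip surrounding whitespace.
        by_speaker[m.group(1)].append(m.group(0).strip())
    return by_speaker

messages = split_by_speaker(txt)
agent_msgs = messages["Agent Name:"]
customer_msgs = messages["Customer:"]
```

This assumes the labels never appear inside a message body; a message that happened to contain the literal text "Customer:" would be split at that point.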
