Extract relevant text from a chat conversation

I have the following chat conversation:

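Chat Started: Monday, April 02, 2018, 10:23:30 (+0100) Chat Origin: GB - My Account (Signed In) Agent Navin P ( 1s ) Navin: Thanks for contacting XYZ, you are talking to Navin. How can I help? ( 34s ) Visitor: Hello , I?ve just currently switched from brillian broadband to fab fibre ( 39s ) Navin: Hi ( 42s ) Navin: Good morning. ( 47s ) Visitor: I find the brilliant broadband so slow ( 52s ) Navin: How can i help you today? ( 1m 1s ) Visitor: And we got told would be a day to get switched over ( 1m 5s ) Navin: I'll help you with it. ( 1m 11s ) Visitor: Can you tell me when this will happen by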

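I want to use R to extract only the relevant text from the above. Basically, I only want the visitor's comments to appear in the result.

The result I want looks like this: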

I find the brilliant broadband so slow And we got told would be a day to get switched over Can you tell me when this will happen by

I tried to do this with gsub and strsplit, but to no avail. Any input would be appreciated.

mytext <- paste(c("Agent Navin P ( 1s ) Navin: Thanks for contacting XYZ, you are talking to Navin. How can I help? ( 34s ) Visitor:", 
"Hello , I?ve just currently switched from brillian broadband to fab fibre ( 39s ) Navin: Hi ( 42s ) Navin: Good morning. ( 47s )", 
"Visitor: I find the brilliant broadband so slow ( 52s ) Navin: How can i help you today? ( 1m 1s ) Visitor: And we got told would be", 
"a day to get switched over ( 1m 5s ) Navin: I'll help you with it. ( 1m 11s ) Visitor: Can you tell me when this will happen by"
), collapse = ' ')

One possible solution that retains more of the information:

mytext <- paste(c("Agent Navin P ( 1s ) Navin: Thanks for contacting XYZ, you are talking to Navin. How can I help? ( 34s ) Visitor:", 
"Hello , I?ve just currently switched from brillian broadband to fab fibre ( 39s ) Navin: Hi ( 42s ) Navin: Good morning. ( 47s )", 
"Visitor: I find the brilliant broadband so slow ( 52s ) Navin: How can i help you today? ( 1m 1s ) Visitor: And we got told would be", 
"a day to get switched over ( 1m 5s ) Navin: I'll help you with it. ( 1m 11s ) Visitor: Can you tell me when this will happen by"
), collapse = ' ')

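# extract() comes from tidyr, so it is loaded alongside the other packages
library(dplyr); library(tidyr); library(textshape); library(stringi)

mytext %>%
    # insert a marker after each "( 34s )"-style timestamp so the text can be split into turns
    stri_replace_all_regex('(\\( [0-9ms ]+ \\))(\\s+)', '$1<<splithere>>') %>%
    stri_split_fixed('<<splithere>>') %>%
    lapply(function(x) {
        x %>%
            split_transcript() %>%
            # the last turn has no timestamp, so give it a "( - )" placeholder
            mutate(dialogue = ifelse(!grepl('\\(\\s*([0-9ms ]+)\\s\\)', dialogue), paste(dialogue, '( - )'), dialogue)) %>%
            # separate the spoken text from its trailing timestamp
            extract(dialogue, c('dialogue', 'timestamp'), '(^.+)\\s\\(\\s*([0-9ms -]+)\\s\\)')
    })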
## [[1]]
##                  person                                                                  dialogue timestamp
## 1  Agent Navin P ( 1s )                                                             Agent Navin P        1s
## 2                 Navin      Thanks for contacting XYZ, you are talking to Navin. How can I help?       34s
## 3               Visitor Hello , I?ve just currently switched from brillian broadband to fab fibre       39s
## 4                 Navin                                                                        Hi       42s
## 5                 Navin                                                             Good morning.       47s
## 6               Visitor                                    I find the brilliant broadband so slow       52s
## 7                 Navin                                                 How can i help you today?     1m 1s
## 8               Visitor                       And we got told would be a day to get switched over     1m 5s
## 9                 Navin                                                    I'll help you with it.    1m 11s
## 10              Visitor                                  Can you tell me when this will happen by         -

You can then filter by person, etc.
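
As a minimal sketch of that filtering step, assuming the result has the person, dialogue and timestamp columns shown in the output above. The small data frame below is typed in by hand from rows 6 to 10 of that output (it is not produced by split_transcript() here), and base R subset() stands in for dplyr::filter() so the snippet runs without extra packages. The names transcript, visitor_only and visitor_text are made up for illustration.

```r
# Hand-typed excerpt of the output above (rows 6-10), used only for illustration.
transcript <- data.frame(
    person    = c("Visitor", "Navin", "Visitor", "Navin", "Visitor"),
    dialogue  = c("I find the brilliant broadband so slow",
                  "How can i help you today?",
                  "And we got told would be a day to get switched over",
                  "I'll help you with it.",
                  "Can you tell me when this will happen by"),
    timestamp = c("52s", "1m 1s", "1m 5s", "1m 11s", "-"),
    stringsAsFactors = FALSE
)

# Keep only the visitor's turns.
# dplyr::filter(transcript, person == "Visitor") would select the same rows.
visitor_only <- subset(transcript, person == "Visitor")

# Collapse the remaining dialogue into a single string.
visitor_text <- paste(visitor_only$dialogue, collapse = " ")
visitor_text
```

On this excerpt the collapsed string is the one-line result asked for in the question.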

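I'm sure there are many ways to do this. I split the string on "Visitor" and used sub to remove Navin's replies. After the substitution we need [-1] at the end, because we don't want anything that comes before the first "Visitor" in the split.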

str <- "Chat Started: Monday, April 02, 2018, 10:23:30 (+0100) Chat Origin: GB - My Account (Signed In) Agent Navin P ( 1s ) Navin: Thanks for contacting XYZ, you are talking to Navin. How can I help? ( 34s ) Visitor: Hello , I?ve just currently switched from brillian broadband to fab fibre ( 39s ) Navin: Hi ( 42s ) Navin: Good morning. ( 47s ) Visitor: I find the brilliant broadband so slow ( 52s ) Navin: How can i help you today? ( 1m 1s ) Visitor: And we got told would be a day to get switched over ( 1m 5s ) Navin: I'll help you with it. ( 1m 11s ) Visitor: Can you tell me when this will happen by"
str <- strsplit(str," Visitor: ")[[1]]
sub(" \\((.*?)\\) Navin:.*","",str)[-1]
# [1] "Hello , I?ve just currently switched from brillian broadband to fab fibre"
# [2] "I find the brilliant broadband so slow"                                   
# [3] "And we got told would be a day to get switched over"                      
# [4] "Can you tell me when this will happen by"

If you want a single line like in your example, you can use paste:

paste(sub(" \\((.*?)\\) Navin:.*","",str)[-1],collapse = " ")
# [1] "Hello , I?ve just currently switched from brillian broadband to fab fibre I find the brilliant broadband so slow And we got told would be a day to get switched over Can you tell me when this will happen by"

If the other person's name is not always "Navin", you can use \w+ in the sub pattern to match any person's name and remove it.

sub(" \\((.*?)\\) \\w+:.*","",str)[-1]
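
To see the \w+ variant at work, here is a small sketch on a made-up transcript with two different agents. The chat string and the second agent name "Priya" are hypothetical and not from the question. It assumes every speaker label is a single word followed by a colon, so a name with a space in it would not be matched by \w+.

```r
# Hypothetical transcript with two different agents (Navin and Priya).
chat <- "Agent Navin P ( 1s ) Navin: How can I help? ( 34s ) Visitor: My broadband is slow ( 39s ) Priya: Let me check. ( 47s ) Visitor: Thanks ( 52s ) Navin: You're welcome."

# Split at every visitor turn; the first piece is everything before the first one.
parts <- strsplit(chat, " Visitor: ", fixed = TRUE)[[1]]

# In each piece, drop everything from the next "( time ) Name:" onwards,
# then discard the first piece with [-1].
visitor <- sub(" \\((.*?)\\) \\w+:.*", "", parts)[-1]
visitor
```

Note that a visitor message that itself contains text shaped like "( ... ) word:" would be cut short by this pattern, so it is worth checking the result on real data.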
