How to extract the "Reason for the Call" and "Resolution" key-value pairs from a column into two separate columns in R
> dput(head(asummy, 10))
structure(list(`arrange_df[, 12]` = c("Customer Name: JOE\r\nReason for the Call: BP Set Up\r\nResolution: \r\nvisual audit on the account\r\nset expectations in BP\r\noffered call back\r\nCenter Location: Clark\r\nCTN (Number Calling About): ************************************\r\nAVAYA (Number Calling From): ************************************\r\nAgent UID: cm***************************u\r\n",
"Name: Kim Putok\r\nCtn: ************************************\r\nReason for calling: lost phone/ ************************************/ follow up on insurance claim/ routed to asurion\r\nResolution:\r\nLocation: Clark\r\nattuid: gb***************************r\r\n",
"Customer Name: Heather \r\nReason for the Call: got suspended / card issue / supposedly paid / complains she just updated online her card then got susp\r\nResolution: \r\nCenter Location: Clark\r\nCTN: ************************************\r\nAlt No: ************************************\r\nCredit Reason: ********* courtesy \r\nAgent UID: rc***************************w/I-EQR******************G\r\n",
"Customer Name: GLORIA ; CTN: ************************************ Affected CTN: ************************************ Alternate Number: none Reason for the Call: change rate plan Resolution: set exp on changing rate plan Agent UID: gt***************************g",
"Customer Name: BRANDY \r\nReason for the Call: payment\r\nRecommendations/Troubleshooting Steps:\r\nResolution: explained card errors\r\nasked for alt # but no good\r\noffered to use different card or refill card\r\nsent qp link\r\nshe will go to bank to check as per cx\r\nCenter Location:\r\nCTN: ************************************\r\nCredit Reason (If credit was applied to account):\r\nAgent UID: jq***************************n\r\n",
"Customer Name: ; ;Leahi ; Reason for the Call: ; ; ;billing issue / ******************.****************** dollars only orig. bill / due date shld be *********th / why ****************** now? / ctn change questions ; Resolution: ; ;explained bill / ctn change info provided ; Center Location: ;Clark CTN: ; ;************************************ Alt No: Credit Reason: Agent UID: rc***************************w/I-K*********GMDR",
"Customer Name: MICHAEL\r\nReason for the Call: ACCOUNT BAL INQ\r\n\r\nRecommendations/Troubleshooting Steps:\r\n*Account verified and provided information\r\n\r\nResolution: \r\n*calling about the account status\r\n*suspended due to BBP\r\n*adv about BP policy\r\n*Provided payment options\r\n\r\nCenter Location:Clark\r\nCTN************************************:\r\nCredit Reason (If credit was applied to account):\r\nAgent UID:rc*********\r\n",
"Customer Name: Kimberly\r\nCTN: ************************************\r\nAffected CTN:************************************\r\nAlternate Number: none\r\nReason for the Call: add mhs otc $******************\r\nResolution: added mhs otc $******************\r\nAgent UID: gt***************************g\r\n",
"auto pop / suspended line\r\n-cx speaking spanish\r\n-transfered call\r\n",
"Customer Name: TERRELL ; Reason for the Call: payment Recommendations/Troubleshooting Steps: Resolution: payment success test and validated CTN: ************************************ Credit Reason (If credit was applied to account): Agent UID: jq***************************n"
)), row.names = c(NA, 10L), class = "data.frame")
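I would label this as seemingly malicious data: there are several inconsistencies that reduce the confidence with which it can be parsed correctly. Having spent some time on it, I am doing the best I can. Ultimately, you should find the source of the problem and either (a) ask them why they hate you, or (b) fix the scraping script so that it is less naive about these things. (Perhaps neither is possible, but I thought I would give you some ideas.) Some examples of the problems: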
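- Many rows appear to be delimited by \r\n:
  Customer Name: JOE\r\nReason for the Call: BP Set Up\r\n...
- Some are delimited by ; instead:
  Customer Name: GLORIA ; CTN: ...
  but some of those same ;-delimited rows have no further ; between the remaining fields:
  Customer Name: GLORIA ; CTN: ************************************ Affected CTN: ...
- Some seem to have multiple semicolons for no apparent reason:
  Customer Name: ; ;Leahi ; Reason for the Call: ; ; ;billing issue ...
- One looks orphaned (no key at all); perhaps it was concatenated from a field in the previous row?
  auto pop / suspended line\r\n-cx speaking spanish\r\n-transfered call\r\n
- One row appears to be missing the colon between CTN and its redacted value:
  ...\r\nCenter Location:Clark\r\nCTN************************************:\r\n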
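fun <- function(z) {
  z <- trimws(z)
  # candidate keys: text starting at an uppercase letter and running to the next colon
  gre <- gregexpr("\\b([A-Z][^:]+:)", z)
  lhs <- lapply(regmatches(z, gre), trimws)
  # invert = TRUE returns the text between the matches (the values)
  rhs <- lapply(regmatches(z, gre, invert = TRUE), trimws)
  # drop the empty piece that precedes the first key, if there is one
  rhs <- lapply(rhs, function(R) R[if (!nzchar(R[1])) -1 else TRUE])
  z <- sapply(Map(paste, lhs, rhs), paste, collapse = "\r\n")
  # rows without any key get a placeholder key
  if (!grepl(":", z)) z <- paste("UNK:", z)
  # split on line breaks and semicolons, then glue each colon-less piece
  # back onto the most recent piece that contains a colon
  with(list(r = strsplit(z, "[\r\n;]+")[[1]]),
       sapply(split(r, cumsum(grepl(":", r))), paste, collapse = "\r\n"))
}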
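To see why the function needs the final split-and-regroup step, it may help to run the key regex on a short string. The following is a minimal sketch on a made-up two-field input (the string `z` is an illustration, not one of the real rows); it shows that the pattern over-captures, pulling the previous value into the next key.

```r
# Hypothetical two-field input, shaped like the \r\n-delimited rows
z <- "Customer Name: JOE\r\nReason for the Call: BP Set Up"

gre  <- gregexpr("\\b([A-Z][^:]+:)", z)
keys <- regmatches(z, gre)[[1]]                 # matched "keys"
vals <- regmatches(z, gre, invert = TRUE)[[1]]  # text between the matches

# The second "key" starts at JOE, because [^:]+ happily crosses the line break
print(keys)
# invert = TRUE yields one more piece than there are matches,
# and the first piece is empty when the string starts with a key
print(vals)
```

Because "JOE" ends up glued to the following key, the function re-splits on line breaks and semicolons afterwards and regroups the pieces by whether they contain a colon.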
library(dplyr)
library(tidyr) # unnest, separate
# dat is assumed to be the data frame from the question (called asummy there)
dat %>%
  transmute(row = row_number(), L = lapply(`arrange_df[, 12]`, fun)) %>%
  unnest(L) %>%
  separate(L, sep = ":", into = c("lhs", "rhs"), fill = "right") %>%
  mutate(across(c(lhs, rhs), trimws)) %>%
  as.data.frame() # purely so you can see all of the data here, not required
# row lhs rhs
# 1 1 Customer Name JOE
# 2 1 Reason for the Call BP Set Up
# 3 1 Resolution visual audit on the account\r\nset expectations in\r\nBP\r\noffered call back
# 4 1 Center Location Clark
# 5 1 CTN (Number Calling About) ************************************
# 6 1 AVAYA (Number Calling From) ************************************
# 7 1 Agent UID cm***************************u
# 8 2 Name Kim Putok
# 9 2 Ctn ************************************
# 10 2 Reason for calling lost phone/ ************************************/ follow up on insurance claim/ routed to asurion
# 11 2 Resolution
# 12 2 Location Clark
# 13 2 attuid gb***************************r
# 14 3 Customer Name Heather
# 15 3 Reason for the Call got suspended / card issue / supposedly paid / complains she just updated online her card then got susp
# 16 3 Resolution
# 17 3 Center Location Clark
# 18 3 CTN ************************************
# 19 3 Alt No ************************************
# 20 3 Credit Reason ********* courtesy
# 21 3 Agent UID rc***************************w/I-EQR******************G
# 22 4 Customer Name GLORIA
# 23 4 CTN ************************************
# 24 4 Affected CTN ************************************
# 25 4 Alternate Number none
# 26 4 Reason for the Call change rate plan
# 27 4 Resolution set exp on changing rate plan
# 28 4 Agent UID gt***************************g
# 29 5 Customer Name BRANDY
# 30 5 Reason for the Call payment
# 31 5 Recommendations/Troubleshooting Steps
# 32 5 Resolution explained card errors\r\nasked for alt # but no good\r\noffered to use different card or refill card\r\nsent qp link\r\nshe will go to bank to check as per cx
# 33 5 Center Location
# 34 5 CTN ************************************
# 35 5 Credit Reason (If credit was applied to account)
# 36 5 Agent UID jq***************************n
# 37 6 Customer Name Leahi
# 38 6 Reason for the Call billing issue / ******************.****************** dollars only orig. bill / due date shld be *********th / why ****************** now? / ctn change questions
# 39 6 Resolution explained bill / ctn change info provided
# 40 6 Center Location
# 41 6 Clark CTN ************************************
# 42 6 Alt No
# 43 6 Credit Reason
# 44 6 Agent UID rc***************************w/I-K*********GMDR
# 45 7 Customer Name MICHAEL
# 46 7 Reason for the Call ACCOUNT BAL INQ
# 47 7 Recommendations/Troubleshooting Steps *\r\nAccount verified and provided information
# 48 7 Resolution *calling about the account status\r\n*suspended due to\r\nBBP\r\n*adv about BP policy\r\n*Provided payment options
# 49 7 Center Location Clark
# 50 7 CTN************************************
# 51 7 Credit Reason (If credit was applied to account)
# 52 7 Agent UID rc*********
# 53 8 Customer Name Kimberly
# 54 8 CTN ************************************
# 55 8 Affected CTN ************************************
# 56 8 Alternate Number none
# 57 8 Reason for the Call add mhs otc $******************
# 58 8 Resolution added mhs otc $******************
# 59 8 Agent UID gt***************************g
# 60 9 UNK auto pop / suspended line\r\n-cx speaking spanish\r\n-transfered call
# 61 10 Customer Name TERRELL
# 62 10 Reason for the Call payment
# 63 10 Recommendations/Troubleshooting Steps
# 64 10 Resolution payment success test and validated
# 65 10 CTN ************************************
# 66 10 Credit Reason (If credit was applied to account)
# 67 10 Agent UID jq***************************n
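The result above is in long format, whereas the question asks for the reason and the resolution as two separate columns. One possible way to get there is sketched below, assuming the long result has been stored in a data frame. The name `long`, the few sample rows in it, and the output column names `reason` and `resolution` are all hypothetical stand-ins for illustration. Keys are matched by prefix because the data uses both "Reason for the Call" and "Reason for calling".

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the long-format result (only a few rows)
long <- data.frame(
  row = c(1, 1, 1, 2, 2, 2, 9),
  lhs = c("Customer Name", "Reason for the Call", "Resolution",
          "Name", "Reason for calling", "Resolution", "UNK"),
  rhs = c("JOE", "BP Set Up", "visual audit on the account",
          "Kim Putok", "lost phone", "", "auto pop / suspended line")
)

wide <- long %>%
  # normalise the two keys of interest; everything else becomes NA
  mutate(key = case_when(
    grepl("^Reason", lhs) ~ "reason",
    lhs == "Resolution"   ~ "resolution",
    TRUE                  ~ NA_character_
  )) %>%
  filter(!is.na(key)) %>%
  select(row, key, rhs) %>%
  pivot_wider(names_from = key, values_from = rhs) %>%
  # bring back rows that had neither key (such as the orphaned "UNK" row)
  right_join(distinct(long, row), by = "row") %>%
  arrange(row)

print(wide)
```

If a row ever contains the same key twice, pivot_wider() will warn and produce list-columns, so it would be worth checking for duplicates before relying on this.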