How to decode one column using the text in another column in R



I have a data frame with coded survey answers in the answer column and, in the key column, a string that maps each code to its label:

df <- data.frame(answer = c(1, 2, 1, 3, 1),
key = c("1 = Answer One 2 = Answer Two 3 = Answer Three", "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI", 
"1 = Answer abc 2 = Answer def 3 = Answer ghi", "1 = Answer One 2 = Answer Two 3 = Answer Three",
"1 = Answer ABC 2 = Answer DEF 3 = Answer GHI"))
print(df)
answer                                            key
1      1 "1 = Answer One 2 = Answer Two 3 = Answer Three"
2      2   "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI"
3      1   "1 = Answer abc 2 = Answer def 3 = Answer ghi"
4      3 "1 = Answer One 2 = Answer Two 3 = Answer Three"
5      1   "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI"

How can I use the data in the key column to decode the answer column, so that I get this result?

df_result <- data.frame(answer = c(1, 2, 1, 3, 1),
key = c("1 = Answer One 2 = Answer Two 3 = Answer Three", "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI", 
"1 = Answer abc 2 = Answer def 3 = Answer ghi", "1 = Answer One 2 = Answer Two 3 = Answer Three",
"1 = Answer ABC 2 = Answer DEF 3 = Answer GHI"),
answer_decoded = c("Answer One", "Answer DEF", "Answer abc", "Answer Three","Answer ABC"))
print(df_result)
answer                                            key answer_decoded
1      1 "1 = Answer One 2 = Answer Two 3 = Answer Three"     "Answer One"
2      2   "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI"     "Answer DEF"
3      1   "1 = Answer abc 2 = Answer def 3 = Answer ghi"     "Answer abc"
4      3 "1 = Answer One 2 = Answer Two 3 = Answer Three"   "Answer Three"
5      1   "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI"     "Answer ABC"

I can't use factor labels, because I have too many different items to create them by hand.

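We can extract the substring that corresponds to the "answer" value. Use str_c to build the pattern to extract: the "answer" value followed by a space, =, a space, and one or more non-digit characters (\D+, written as \\D+ inside an R string). Then use trimws to remove the prefix, up to and including the = and the whitespace after it, along with any trailing whitespace.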

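library(stringr)
library(dplyr)

df %>%
  # build a per-row pattern such as "1 = \\D+" and extract the matching piece
  mutate(answer_decoded = trimws(
    str_extract(key, str_c(answer, ' = \\D+')),
    # strip the leading "N = " prefix and any trailing whitespace
    whitespace = ".*=\\s+|\\s+"))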

Output:

answer                                            key answer_decoded
1      1 1 = Answer One 2 = Answer Two 3 = Answer Three     Answer One
2      2   1 = Answer ABC 2 = Answer DEF 3 = Answer GHI     Answer DEF
3      1   1 = Answer abc 2 = Answer def 3 = Answer ghi     Answer abc
4      3 1 = Answer One 2 = Answer Two 3 = Answer Three   Answer Three
5      1   1 = Answer ABC 2 = Answer DEF 3 = Answer GHI     Answer ABC

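Split each string on the "N = " markers with strsplit, then pick the n-th piece with `[`. The index is answer + 1 because each key starts with a marker, so the split yields an empty string as its first element: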

mapply(`[`, strsplit(df$key, "(\\s*)\\d = "), df$answer + 1)
#[1] "Answer One"   "Answer DEF"   "Answer abc"   "Answer Three" "Answer ABC"  
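
Both answers above rely on single-digit codes, and the strsplit version also relies on the codes being consecutive and starting at 1. As a rough sketch of a variation, the key could be parsed into code/label pairs and the label looked up by code rather than by position. The helper name decode_one is hypothetical, and the sketch assumes every entry has the form "<digits> = <label>" and that no label itself contains a "<digits> = " sequence.

```r
# Sample data from the question (stringsAsFactors = FALSE keeps key as character on R < 4.0)
df <- data.frame(
  answer = c(1, 2, 1, 3, 1),
  key = c("1 = Answer One 2 = Answer Two 3 = Answer Three",
          "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI",
          "1 = Answer abc 2 = Answer def 3 = Answer ghi",
          "1 = Answer One 2 = Answer Two 3 = Answer Three",
          "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up the label for one code in one key string
decode_one <- function(code, key) {
  # Split on the whitespace just before each "<digits> = " marker,
  # so every piece has the form "<code> = <label>"
  pieces <- strsplit(key, "\\s+(?=\\d+ = )", perl = TRUE)[[1]]
  codes  <- sub(" = .*$", "", pieces)      # text before " = "
  labels <- sub("^\\d+ = ", "", pieces)    # text after the marker
  # match() returns NA when the code is not present in the key
  labels[match(as.character(code), codes)]
}

df$answer_decoded <- mapply(decode_one, df$answer, df$key, USE.NAMES = FALSE)
print(df$answer_decoded)
```

Because the lookup goes through match(), a code that does not appear in its key gives NA instead of a wrong label, which may be easier to spot in a large survey file.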
