Replacing multiple numbers with strings in a data frame in R without using regular expressions



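I have some columns in a data frame in which I want to replace integers with their corresponding string values. A cell often contains several integers (separated by spaces, commas, /, -, and so on).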

> df = data.frame(c1=c(1,2,3,23,c('11,21'),c('13-23')))
> df
c1
1     1
2     2
3     3
4    23
5 11,21
6 13-23

I tried both str_replace_all() and str_replace(), but neither gave the desired result.

> df[,1] %>% str_replace_all(c("1"="a","2"="b","3"="c","11"="d","13"="e","21"="f","23"="g"))
[1] "a"     "b"     "c"     "bc"    "aa,ba" "ac-bc"
> df[,1] %>% str_replace(c("1"="a","2"="b","3"="c","11"="d","13"="e","21"="f","23"="g"))
Error in fix_replacement(replacement) : argument "replacement" is missing, with no default

The desired result is:

[1] "a"     "b"     "c"     "g"    "d,f" "e-g"

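Because there are multiple values to replace, my first choice was str_replace_all(), since it accepts a named vector that maps the original values to the desired replacements. However, that approach fails because of the regex matching. Am I doing something wrong, or is there a better way to solve my problem?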

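Just put the longest multi-character patterns first. str_replace_all() applies the named replacements one after another in the order given, so "1" would otherwise replace the digits of "11" before "11" is ever tried: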

library(stringr)
str_replace_all(df[,1],
  c("11"="d","13"="e","21"="f","23"="g","1"="a","2"="b","3"="c"))
#[1] "a"   "b"   "c"   "g"   "d,f" "e-g"
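To make the effect of the ordering visible, here is a minimal base R sketch that applies the same kind of fixed-string replacements one at a time and keeps every intermediate result. The helper name trace_replacements and the two shortened dictionaries are made up for illustration; it mimics sequential replacement with gsub() and is not how stringr is implemented internally.

```r
# Hypothetical helper: apply each replacement in turn (literal matching,
# no regex) and keep all intermediate strings so the order effect shows.
trace_replacements <- function(s, dict) {
  steps <- Reduce(
    function(acc, key) gsub(key, dict[[key]], acc, fixed = TRUE),
    names(dict),
    init = s,
    accumulate = TRUE
  )
  unlist(steps)
}

# Shortened dictionaries (illustrative subsets of the one in the question)
short_first <- c("1" = "a", "2" = "b", "11" = "d", "21" = "f")
long_first  <- c("11" = "d", "21" = "f", "1" = "a", "2" = "b")

trace_replacements("11,21", short_first)
# [1] "11,21" "aa,2a" "aa,ba" "aa,ba" "aa,ba"
trace_replacements("11,21", long_first)
# [1] "11,21" "d,21"  "d,f"   "d,f"   "d,f"
```

With the short keys first, every "1" is already gone by the time "11" and "21" are tried, which reproduces the "aa,ba" from the question.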

For more complex cases, sort the patterns by length so the longest come first:

x <- c("1"="a","2"="b","3"="c","11"="d","13"="e","21"="f","23"="g")
x <- x[order(nchar(names(x)), decreasing = TRUE)]
str_replace_all(df[,1], x)
#[1] "a"   "b"   "c"   "g"   "d,f" "e-g"

Using the sorting approach from @GKi's answer, here is a base R version that uses Reduce/gsub instead of stringr::str_replace_all.

Starting vector

x <- as.character(df$c1)

Sort as in @GKi's answer

repl_dict <- c("11"="d","13"="e","21"="f","23"="g","1"="a","2"="b","3"="c")
repl_dict <- repl_dict[order(nchar(names(repl_dict)), decreasing = TRUE)]

Replace

Reduce(
  function(x, n) gsub(n, repl_dict[n], x, fixed = TRUE),
  names(repl_dict),
  init = x)
#  [1] "a"   "b"   "c"   "g"   "d,f" "e-g"
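Since the question mentions several columns, the same idea can be wrapped in a small function and applied column by column. This is only a sketch: the helper name replace_codes and the second column c2 are invented for illustration, and it assumes that no replacement value itself contains a key (otherwise a later step would replace it again).

```r
# Hypothetical helper: replace every key of `dict` found in `x`,
# trying longer keys first so "23" is handled before "2" and "3".
replace_codes <- function(x, dict) {
  dict <- dict[order(nchar(names(dict)), decreasing = TRUE)]
  Reduce(
    function(acc, key) gsub(key, dict[[key]], acc, fixed = TRUE),
    names(dict),
    init = as.character(x)
  )
}

dict <- c("1" = "a", "2" = "b", "3" = "c", "11" = "d",
          "13" = "e", "21" = "f", "23" = "g")

df <- data.frame(
  c1 = c("1", "2", "3", "23", "11,21", "13-23"),
  c2 = c("3", "13", "2/1", "21 11", "23", "1"),  # made-up second column
  stringsAsFactors = FALSE
)

# Apply the replacement to each selected column
cols <- c("c1", "c2")
df[cols] <- lapply(df[cols], replace_codes, dict = dict)

df$c1
# [1] "a"   "b"   "c"   "g"   "d,f" "e-g"
df$c2
# [1] "c"   "e"   "b/a" "f d" "g"   "a"
```

Because fixed = TRUE treats every key as a literal string, separators such as "/" or "-" need no escaping.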
