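I have some columns in a data frame in which I want to replace integers with their corresponding string values. The integers are often repeated within a cell (separated by spaces, commas, /, or -, etc.).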
> df = data.frame(c1=c(1,2,3,23,c('11,21'),c('13-23')))
> df
c1
1 1
2 2
3 3
4 23
5 11,21
6 13-23
I tried both str_replace_all() and str_replace(), but neither gave the expected result.
> df[,1] %>% str_replace_all(c("1"="a","2"="b","3"="c","11"="d","13"="e","21"="f","23"="g"))
[1] "a" "b" "c" "bc" "aa,ba" "ac-bc"
> df[,1] %>% str_replace(c("1"="a","2"="b","3"="c","11"="d","13"="e","21"="f","23"="g"))
Error in fix_replacement(replacement) : argument "replacement" is missing, with no default
The expected result is:
[1] "a" "b" "c" "g" "d,f" "e-g"
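Because there are multiple values to replace, my first choice was str_replace_all(), since it accepts a named vector that maps the original column values to the desired replacements. However, the approach fails because of the regex matching. Am I doing something wrong? Or is there a better option for my problem?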
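Simply put the longest multi-character patterns first. str_replace_all() applies the named replacements one after another in the order given, so a short pattern such as "1" would otherwise rewrite both digits of "11" before "11" is ever tried. For example: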
library(stringr)
str_replace_all(df[,1],
  c("11"="d","13"="e","21"="f","23"="g","1"="a","2"="b","3"="c"))
#[1] "a" "b" "c" "g" "d,f" "e-g"
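To make the effect of the ordering concrete, here is a minimal base R sketch. The helper `apply_in_order()` is a hypothetical name introduced only for illustration; it mimics sequential replacement with `gsub()` and is not how stringr is implemented internally.

```r
# Hypothetical helper: apply each pattern of a named vector in turn,
# treating the names as literal strings (fixed = TRUE).
apply_in_order <- function(x, dict) {
  for (p in names(dict)) {
    x <- gsub(p, dict[[p]], x, fixed = TRUE)
  }
  x
}

short_first <- c("1" = "a", "11" = "d")  # "1" is tried before "11"
long_first  <- c("11" = "d", "1" = "a")  # "11" is tried before "1"

apply_in_order("11,1", short_first)
# [1] "aa,a"
apply_in_order("11,1", long_first)
# [1] "d,a"
```

With the short pattern first, both digits of "11" are consumed by "1", so the "11" entry never gets a chance to match.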
For more complex cases, sort the patterns by length so that the longest come first:
x <- c("1"="a","2"="b","3"="c","11"="d","13"="e","21"="f","23"="g")
x <- x[order(nchar(names(x)), decreasing = TRUE)]
str_replace_all(df[,1], x)
#[1] "a" "b" "c" "g" "d,f" "e-g"
Using the sorting approach from @GKi's answer, here is a base R version that uses Reduce/gsub instead of stringr::str_replace_all.
Starting vector:
x <- as.character(df$c1)
Sort as in @GKi's answer:
repl_dict <- c("11"="d","13"="e","21"="f","23"="g","1"="a","2"="b","3"="c")
repl_dict <- repl_dict[order(nchar(names(repl_dict)), decreasing = TRUE)]
Replace:
Reduce(
  function(x, n) gsub(n, repl_dict[n], x, fixed = TRUE),
  names(repl_dict),
  init = x)
# [1] "a" "b" "c" "g" "d,f" "e-g"
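As a possible alternative that does not depend on ordering, each run of digits can be looked up as a whole token in a single pass. This is only a sketch, assuming that every cell consists of digit runs separated by non-digit characters (as in the question). `replace_tokens()` is a hypothetical helper name, and tokens missing from the dictionary are left unchanged.

```r
# Hypothetical helper: replace every whole run of digits by its
# dictionary entry, leaving separators and unknown tokens untouched.
replace_tokens <- function(x, dict) {
  x <- as.character(x)
  m <- gregexpr("[0-9]+", x)            # locate each run of digits
  regmatches(x, m) <- lapply(regmatches(x, m), function(tok) {
    out <- unname(dict[tok])            # look up each token by name
    out[is.na(out)] <- tok[is.na(out)]  # keep tokens without an entry
    out
  })
  x
}

dict <- c("1"="a","2"="b","3"="c","11"="d","13"="e","21"="f","23"="g")
replace_tokens(c("1", "2", "3", "23", "11,21", "13-23"), dict)
# [1] "a"   "b"   "c"   "g"   "d,f" "e-g"
```

Because "23" is matched as one token, it can never be split into "2" and "3", so the order of the dictionary no longer matters.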