How can I get the same functionality as grepl in a vectorized conditional statement?
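I want to transform raw data on geographic divisions (mixed with other categories) by prepending the city name to district names that lack it.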
#Build index dataframe
(index <- data.frame(div_raw=c("Brussels", "Paris", "Paris I", "II", "total"),
city=c("Brussels", "Paris", "Paris", "Paris", NA)))
# div_raw city
#1 Brussels Brussels
#2 Paris Paris
#3 Paris I Paris
#4 II Paris
#5 total <NA>
#Prepend city name to district names, where available
index$div <- with(index, paste(ifelse(div_raw != city & !is.na(city), city, ""), div_raw))
index
# div_raw city div
#1 Brussels Brussels Brussels
#2 Paris Paris Paris
#3 Paris I Paris Paris Paris I
#4 II Paris Paris II
#5 total <NA> total
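As you can see, we should also test whether the city is already contained in the district name. However, grepl does not match each pattern against its corresponding element: given a vector of patterns, it uses only the first one: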
index$div <- with(index, paste(ifelse(div_raw != city & !is.na(city) & !grepl(city, div_raw), city, ""), div_raw))
#Warning message:
#In grepl(city, div_raw) :
# argument 'pattern' has length > 1 and only the first element will be used
index
# div_raw city div
#1 Brussels Brussels Brussels
#2 Paris Paris Paris
#3 Paris I Paris Paris Paris I
#4 II Paris Paris II
#5 total <NA> total
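To make the behaviour behind that warning concrete, here is a minimal sketch on two toy vectors (the names patterns, x, recycled and first_only are made up for illustration). It assumes grepl still issues a warning, not an error, for a pattern of length > 1, as in the output above.

```r
# Toy vectors for illustration only (not the data frame from the question)
patterns <- c("Brussels", "Paris")
x <- c("Brussels", "Paris I")

# Only patterns[1] is used; suppressWarnings() hides the length > 1 warning
recycled <- suppressWarnings(grepl(patterns, x))

# Matching the first pattern against every element of x gives the same result
first_only <- grepl(patterns[1], x)
```

Because "Paris" is never tried as a pattern, "Paris I" is not recognised as already containing its city, which is why the div column is unchanged from the first attempt.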
Expected result:
index
# div_raw city div
#1 Brussels Brussels Brussels
#2 Paris Paris Paris
#3 Paris I Paris Paris I
#4 II Paris Paris II
#5 total <NA> total
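Using Vectorize
Change the code as shown below, using vgrepl in place of grepl, and it should work. Vectorize creates a wrapper that vectorizes a function over its arguments, and vectorize.args lets you choose which arguments to vectorize. By default, grepl is not vectorized over pattern, which is why passing a vector of patterns gives the warning shown in the question: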
vgrepl <- Vectorize(grepl)
# you can write this also: vgrepl <- Vectorize(grepl, vectorize.args = c('x', 'pattern'))
index$div <- with(index, paste(ifelse(div_raw != city & !is.na(city) & !vgrepl(city, div_raw), city, ""), div_raw))
Output:
> index
div_raw city div
1 Brussels Brussels Brussels
2 Paris Paris Paris
3 Paris I Paris Paris I
4 II Paris Paris II
5 total <NA> total
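Vectorize is built on mapply, so the same element-wise match can also be spelled out directly. The following is only a sketch: prepend_city is a hypothetical helper, and fixed = TRUE is an added assumption so that city names are matched literally and not as regular expressions. It also handles NA cities explicitly and avoids the leading space that paste("", div_raw) produces.

```r
# Hypothetical helper: prepend the city unless it is NA or already present
prepend_city <- function(div_raw, city) {
  # Pair each city with its own div_raw; an NA city counts as "not contained"
  has_city <- mapply(
    function(p, x) !is.na(p) && grepl(p, x, fixed = TRUE),
    city, div_raw,
    USE.NAMES = FALSE
  )
  # Leave div_raw untouched when there is no city or it is already included
  ifelse(is.na(city) | has_city, div_raw, paste(city, div_raw))
}

# Same values as the index data frame in the question
div_raw <- c("Brussels", "Paris", "Paris I", "II", "total")
city <- c("Brussels", "Paris", "Paris", "Paris", NA)

div <- prepend_city(div_raw, city)
```

With the question's data this could be used as index$div <- with(index, prepend_city(div_raw, city)), assuming both columns are character vectors and not factors.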
You can use a vectorized counterpart of grepl, namely stringr::str_detect. Note that it takes the string first and the pattern second:
index$div <- with(index, paste(ifelse(div_raw != city & !is.na(city) &
!stringr::str_detect(div_raw, city), city, ""), div_raw))
index
# div_raw city div
#1 Brussels Brussels Brussels
#2 Paris Paris Paris
#3 Paris I Paris Paris I
#4 II Paris Paris II
#5 total <NA> total