grepl functionality in a vectorized conditional statement



How can I get the equivalent of grepl inside a vectorized conditional statement?

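I want to transform raw data on geographical divisions (mixed with other categories) by prepending the city name to the district name wherever it is missing: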

#Build index dataframe
(index <- data.frame(div_raw=c("Brussels", "Paris", "Paris I", "II", "total"), 
city=c("Brussels", "Paris", "Paris", "Paris", NA)))
#   div_raw     city
#1 Brussels Brussels
#2    Paris    Paris
#3  Paris I    Paris
#4       II    Paris
#5    total     <NA>
#Prepend city name to district names, where available
index$div <- with(index, paste(ifelse(div_raw != city & !is.na(city), city, ""), div_raw))
index
#   div_raw     city           div
#1 Brussels Brussels      Brussels
#2    Paris    Paris         Paris
#3  Paris I    Paris Paris Paris I
#4       II    Paris      Paris II
#5    total     <NA>         total

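As the output shows, we should also test whether the city is already contained in the district name. However, grepl is not vectorized over its pattern argument: given the whole city vector, it uses only the first element instead of the pattern value that belongs to each row: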

index$div <- with(index, paste(ifelse(div_raw != city & !is.na(city) & !grepl(city, div_raw), city, ""), div_raw))
#Warning message:
#In grepl(city, div_raw) :
#  argument 'pattern' has length > 1 and only the first element will be used
index
#   div_raw     city           div
#1 Brussels Brussels      Brussels
#2    Paris    Paris         Paris
#3  Paris I    Paris Paris Paris I
#4       II    Paris      Paris II
#5    total     <NA>         total
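To make the warning concrete, here is a minimal sketch of what grepl does with a pattern vector compared with the row-wise test the question needs. The names patterns, x, first_only and row_wise are made up for this illustration. The first call passes patterns[1] explicitly, which is what the warning says happens when the full vector is supplied.

```r
# grepl() is vectorized over x, but not over pattern.
patterns <- c("Brussels", "Paris")
x <- c("Brussels", "Paris I")

# What grepl(patterns, x) effectively computes: only "Brussels" is tested,
# so "Paris I" is reported as not containing its city.
first_only <- grepl(patterns[1], x)

# One grepl() call per (pattern, x) pair gives the row-wise answer.
row_wise <- unname(mapply(grepl, patterns, x))

print(first_only)
print(row_wise)
```

This is why row 3 still comes out as "Paris Paris I" above: the test applied to that row is grepl("Brussels", "Paris I"), not grepl("Paris", "Paris I").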

Expected result:

index
#   div_raw     city           div
#1 Brussels Brussels      Brussels
#2    Paris    Paris         Paris
#3  Paris I    Paris       Paris I
#4       II    Paris      Paris II
#5    total     <NA>         total

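Change your code to use Vectorize as shown below, calling vgrepl in place of grepl, and it should work. Vectorize builds a wrapper that vectorizes a function over its arguments; with vectorize.args you can choose which arguments to vectorize. By default grepl does not vectorize over pattern when it is given a vector, which is why you get that warning.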

vgrepl <- Vectorize(grepl)
# you can write this also: vgrepl <- Vectorize(grepl, vectorize.args = c('x', 'pattern'))
index$div <- with(index, paste(ifelse(div_raw != city & !is.na(city) & !vgrepl(city, div_raw), city, ""), div_raw))

Output:

> index
   div_raw     city       div
1 Brussels Brussels  Brussels
2    Paris    Paris     Paris
3  Paris I    Paris   Paris I
4       II    Paris  Paris II
5    total     <NA>     total
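Vectorize is a wrapper around mapply, so the same row-wise test can also be written with mapply directly. The sketch below is one possible variant, not the answerer's code: it assumes the city names should be matched literally (fixed = TRUE) rather than as regular expressions, and the helper names contains_city and needs_city are introduced here for illustration. It drops the div_raw != city comparison, because a district name equal to the city also contains it, and it builds the result with ifelse instead of paste("", ...), so no leading space is added to rows that are left unchanged.

```r
index <- data.frame(
  div_raw = c("Brussels", "Paris", "Paris I", "II", "total"),
  city    = c("Brussels", "Paris", "Paris", "Paris", NA),
  stringsAsFactors = FALSE
)

# Row-wise literal containment test: one grepl() call per row.
# An NA city yields NA here, which the is.na() check below neutralizes.
contains_city <- unname(mapply(grepl, index$city, index$div_raw,
                               MoreArgs = list(fixed = TRUE)))

# Prepend the city only when it is known and not already in the district name.
needs_city <- !is.na(index$city) & !contains_city
index$div <- ifelse(needs_city,
                    paste(index$city, index$div_raw),
                    index$div_raw)

print(index)
```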

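You can use a vectorized counterpart of grepl, namely stringr::str_detect. Note that its argument order is reversed relative to grepl: the string comes first, then the pattern.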

index$div <- with(index, paste(ifelse(div_raw != city & !is.na(city) & 
!stringr::str_detect(div_raw, city), city, ""), div_raw))
index
#   div_raw     city       div
#1 Brussels Brussels  Brussels
#2    Paris    Paris     Paris
#3  Paris I    Paris   Paris I
#4       II    Paris  Paris II
#5    total     <NA>     total
