r - Unclear behavior of the any_vars and all_vars predicates when filtering rows across multiple columns with a negated str_detect



I am building a network:

from <- c("America, port unspecified", "Boston", "Chicago", "America, port unspecified")
to <-  c("Europe, port unspecified", "Nantes", "Le Havre", "Lisbonn")
dataset <- data.frame(from, to)
library(dplyr)
library(stringr)  # provides str_detect()

I want to subset my dataset to the rows that do not contain an unspecified port:

from       to
Boston     Nantes
Chicago    Le Havre

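I tried this: in the code below, I search all columns for the string "port unspecified". I want to keep the rows where the string "port unspecified" does not appear in any variable.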

dataset2 <- dataset %>%
  filter_all(any_vars(!str_detect(., "port unspecified")))

Result:

from   to
Boston  Nantes
Chicago Le Havre
America, port unspecified   Lisbonn

The following code, on the other hand, worked:

dataset3 <- dataset %>%
  filter_all(all_vars(!str_detect(., "port unspecified")))

Result:

from  to
Boston  Nantes
Chicago Le Havre

Why does all_vars give me the expected result while any_vars does not?

library(dplyr)
library(stringr)  # provides str_detect()
dataset %>% filter_all(any_vars(!str_detect(., "port unspecified")))

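This reads as: keep the rows where at least one column does not contain "port unspecified". That matches rows 2, 3 and 4; row 4 is kept because its to value ("Lisbonn") does not contain the string, even though its from value does.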

Whereas this

dataset %>% filter_all(all_vars(!str_detect(., "port unspecified")))

means: keep the rows where every column is free of "port unspecified", which leaves only rows 2 and 3.

I hope this is clear enough.
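The difference between the two predicates can be made visible by building the per-cell logical values by hand. The following is a minimal base R sketch, not what dplyr does internally; the names clean, keep_any and keep_all are made up for illustration.

```r
# Same data as in the question
from <- c("America, port unspecified", "Boston", "Chicago", "America, port unspecified")
to <- c("Europe, port unspecified", "Nantes", "Le Havre", "Lisbonn")
dataset <- data.frame(from, to)

# One logical per cell: TRUE where the cell does NOT contain the string
clean <- sapply(dataset, function(col) !grepl("port unspecified", col, fixed = TRUE))

# any_vars() logic: keep a row if at least one cell is clean
keep_any <- apply(clean, 1, any)
# all_vars() logic: keep a row only if every cell is clean
keep_all <- apply(clean, 1, all)

which(keep_any)  # rows 2, 3, 4
which(keep_all)  # rows 2, 3
```

Row 4 is the one that separates the two: its from cell is not clean but its to cell is, so any() is TRUE while all() is FALSE. Put differently, "no column contains the string" is all(!x), which equals !any(x), not any(!x).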

An equivalent result can also be obtained with base R:

from <- c("America, port unspecified", "Boston", "Chicago", "America, port unspecified")
to <-  c("Europe, port unspecified", "Nantes", "Le Havre", "Lisbonn")
dataset <- data.frame(from, to)
# Loop through each column and check for any port unspecified
semi <- lapply(dataset, grepl, pattern = "port unspecified")
# check which rows have a port unspecified (`pmax`) and exclude them with `!`.
dataset[!do.call(pmax, semi), ]
#>      from       to
#> 2  Boston   Nantes
#> 3 Chicago Le Havre
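For reference, the same filter can also be written without filter_all(). The sketch below assumes dplyr 1.0.4 or later, where if_all() and if_any() are available, plus stringr for str_detect(); the names res_all and res_any are only for illustration.

```r
library(dplyr)
library(stringr)

from <- c("America, port unspecified", "Boston", "Chicago", "America, port unspecified")
to <- c("Europe, port unspecified", "Nantes", "Le Havre", "Lisbonn")
dataset <- data.frame(from, to)

# Keep rows where every column lacks the string (same idea as all_vars(!...))
res_all <- dataset %>%
  filter(if_all(everything(), ~ !str_detect(.x, "port unspecified")))

# Same rows, phrased the other way round: drop rows where any column contains it
res_any <- dataset %>%
  filter(!if_any(everything(), ~ str_detect(.x, "port unspecified")))

res_all
res_any
```

The second form places the negation outside the predicate, which may read closer to the original intent of "drop every row that mentions an unspecified port anywhere".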
