I am building a network:
from <- c("America, port unspecified", "Boston", "Chicago", "America, port unspecified")
to <- c("Europe, port unspecified", "Nantes", "Le Havre", "Lisbonn")
dataset <- data.frame(from, to)
library(dplyr)
library(stringr)  # provides str_detect(), used below
I want to subset my dataset to the rows that do not contain an unspecified port:
from to
Boston Nantes
Chicago Le Havre
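I tried this: in the code below, I search all columns for the string "port unspecified". I want to keep the rows where the string "port unspecified" is not present in any variable.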
dataset2 <- dataset %>%
filter_all(any_vars(!str_detect(., "port unspecified")))
Result:
from to
Boston Nantes
Chicago Le Havre
America, port unspecified Lisbonn
I then tried the code below, which worked:
dataset3 <- dataset %>%
filter_all(all_vars(!str_detect(., "port unspecified")))
Result:
from to
Boston Nantes
Chicago Le Havre
Why does all_vars give me the expected result and any_vars does not?
library(dplyr)
library(stringr)  # provides str_detect()
dataset %>% filter_all(any_vars(!str_detect(., "port unspecified")))
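This reads as: select the rows where any column does not contain "port unspecified". That matches rows 2, 3 and 4; row 4 is kept because its to value, "Lisbonn", does not contain the string.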
Whereas this:
dataset %>% filter_all(all_vars(!str_detect(., "port unspecified")))
means: select the rows where all columns do not contain "port unspecified". That matches only rows 2 and 3.
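To make the difference concrete, here is a minimal base R sketch that builds the per-cell "no match" table and then collapses it row by row with any() and all(). The names no_hit, any_col_clean and all_col_clean are only illustrative and are not part of the original code.

```r
from <- c("America, port unspecified", "Boston", "Chicago", "America, port unspecified")
to <- c("Europe, port unspecified", "Nantes", "Le Havre", "Lisbonn")
dataset <- data.frame(from, to)

# One logical column per variable: TRUE where the cell does NOT contain the string.
no_hit <- !sapply(dataset, grepl, pattern = "port unspecified", fixed = TRUE)

# any_vars(): at least one column in the row is free of the string.
any_col_clean <- apply(no_hit, 1, any)
# all_vars(): every column in the row is free of the string.
all_col_clean <- apply(no_hit, 1, all)

which(any_col_clean)  # 2 3 4
which(all_col_clean)  # 2 3
```

In other words, "not present in any variable" translates to "absent from all variables", which is why all_vars() is the condition that fits here.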
I hope this is clear enough.
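As a side note, the scoped verbs such as filter_all() have been superseded in recent dplyr releases. Assuming dplyr 1.0.4 or later is installed, the same two filters could be sketched with if_all() and if_any(); kept_all and kept_any are illustrative names only.

```r
library(dplyr)
library(stringr)

from <- c("America, port unspecified", "Boston", "Chicago", "America, port unspecified")
to <- c("Europe, port unspecified", "Nantes", "Le Havre", "Lisbonn")
dataset <- data.frame(from, to)

# Counterpart of all_vars(): every column must be free of the string.
kept_all <- dataset %>%
  filter(if_all(everything(), ~ !str_detect(.x, "port unspecified")))

# Counterpart of any_vars(): one clean column is enough to keep the row.
kept_any <- dataset %>%
  filter(if_any(everything(), ~ !str_detect(.x, "port unspecified")))

kept_all  # Boston/Nantes and Chicago/Le Havre
kept_any  # additionally keeps the "America, port unspecified"/Lisbonn row
```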
An equivalent result can also be obtained with base R:
from <- c("America, port unspecified", "Boston", "Chicago", "America, port unspecified")
to <- c("Europe, port unspecified", "Nantes", "Le Havre", "Lisbonn")
dataset <- data.frame(from, to)
# Loop through each column and check for "port unspecified"
semi <- lapply(dataset, grepl, pattern = "port unspecified")
# check which rows have a port unspecified (`pmax`) and exclude them with `!`.
dataset[!do.call(pmax, semi), ]
#> from to
#> 2 Boston Nantes
#> 3 Chicago Le Havre
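A possible variation on the same idea, shown only as a sketch: combining the per-column results with an element-wise OR via Reduce() keeps the row mask logical, instead of relying on pmax() over logical vectors. The names hits, has_unspecified and clean are illustrative.

```r
from <- c("America, port unspecified", "Boston", "Chicago", "America, port unspecified")
to <- c("Europe, port unspecified", "Nantes", "Le Havre", "Lisbonn")
dataset <- data.frame(from, to)

# One logical vector per column: TRUE where the string occurs.
hits <- lapply(dataset, grepl, pattern = "port unspecified", fixed = TRUE)

# Element-wise OR across columns: TRUE if any column in the row matches.
has_unspecified <- Reduce(`|`, hits)

# Drop the rows that have a match in at least one column.
clean <- dataset[!has_unspecified, ]
clean
```

Here fixed = TRUE treats the pattern as a literal string rather than a regular expression, which is all this search needs.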