R: filter rows that contain a string from a vector

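I'm looking for a function that takes a data frame column, checks whether it contains text from a vector of strings, and filters the rows based on a match (including partial text matches).

For example, take the following data frame: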

animal     |count
aardvark   |8
cat        |2
catfish    |6
dog        |12
dolphin    |3
penguin    |38
prairie dog|59
zebra      |17

And the following vector:

c("cat", "dog")

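I would like to go through the "animal" column, check whether each value fully or partially matches one of the strings in the vector, and filter out the rows that don't match. The resulting data frame would be: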

animal     |count
cat        |2
catfish    |6
dog        |12
prairie dog|59

Thanks!

Sean

Using dplyr, you can try the following, assuming your table is df:

library(dplyr)
library(stringr)
animalList <- c("cat", "dog")
filter(df, str_detect(animal, paste(animalList, collapse="|")))

Personally, I find that code using dplyr and stringr is easier to read when I come back to it a few months later.
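
To see why this works, here is a minimal self-contained sketch in base R, so it runs without extra packages. It rebuilds the example table from the question; the names `keep`, `pattern`, and `result` are only illustrative. `paste(..., collapse = "|")` joins the search strings into the single regular expression `cat|dog`, which matches any value containing either string.

```r
# Example data from the question
df <- data.frame(
  animal = c("aardvark", "cat", "catfish", "dog", "dolphin",
             "penguin", "prairie dog", "zebra"),
  count  = c(8, 2, 6, 12, 3, 38, 59, 17),
  stringsAsFactors = FALSE
)

animalList <- c("cat", "dog")

# Join the search strings into one regex alternation: "cat|dog"
pattern <- paste(animalList, collapse = "|")

# grepl() returns TRUE for every value that contains a match anywhere
keep <- grepl(pattern, df$animal)

# Keep only the matching rows
result <- df[keep, ]
print(result)
```

For a simple pattern like this one, `str_detect(animal, pattern)` in the dplyr version should produce the same logical vector as `grepl()` here.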

We can use grep (here v1 is the vector of search strings and df1 is the data frame):

df1[grep(paste(v1, collapse="|"), df1$animal),]

Or with dplyr:

df1 %>%
    filter(grepl(paste(v1, collapse="|"), animal))
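
One caveat with the `collapse="|"` approach: the search strings are interpreted as regular expressions, so a string such as `"c.t"` would not be matched literally. Below is a hedged sketch of one way around that, assuming literal substring matching is what you want: test each string with `fixed = TRUE` and OR the results together. The helper name `contains_any` and the sample data are hypothetical.

```r
# Hypothetical helper: TRUE where x contains at least one of the strings
# in `needles` as a literal substring (no regex interpretation).
contains_any <- function(x, needles) {
  hits <- lapply(needles, function(s) grepl(s, x, fixed = TRUE))
  # Start from all-FALSE so an empty `needles` matches nothing
  Reduce(`|`, hits, rep(FALSE, length(x)))
}

# Made-up sample data, including a value with a regex metacharacter
df1 <- data.frame(
  animal = c("cat", "catfish", "dog", "dolphin", "c.t"),
  count  = c(2, 6, 12, 3, 1),
  stringsAsFactors = FALSE
)

# "c.t" is matched literally here; as a regex it would also match "cat"
literal <- df1[contains_any(df1$animal, c("c.t", "dog")), ]
print(literal)
```

If the search strings are plain words, as in the question, the regex version above is simpler and behaves the same.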

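For large datasets, the following base R approach can be up to 15x faster than the accepted answer. At least, that has been my experience.

The code builds a new data frame that holds the subset of rows matching the given values (animals).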

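#Create placeholder data frame
new_df <- df[0, ]
#Create vector of unique values
animals <- unique(df$animal)
#Keep only the unique values that fully or partially match a search string
animals <- animals[grepl(paste(c("cat", "dog"), collapse="|"), animals)]
#Run the loop (seq_along is safe when nothing matches)
for (i in seq_along(animals)){
    #Rows whose value equals the current matched animal
    temp <- df[df$animal==animals[i], ]
    new_df <- rbind(new_df,temp)
}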
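
A loop-free sketch of a related idea, assuming the speedup comes from matching against the distinct values rather than every row: run the pattern once per unique value, then keep rows with `%in%`. The function name `filter_by_unique` is hypothetical, and the sample data is made up. Whether this is actually faster depends on how many repeated values the column has, so it is worth timing on your own data.

```r
# Hypothetical helper: match the pattern against the unique values only,
# then select the rows whose value is among the matched ones.
filter_by_unique <- function(data, column, strings) {
  pattern <- paste(strings, collapse = "|")
  values  <- unique(data[[column]])
  matched <- values[grepl(pattern, values)]
  data[data[[column]] %in% matched, , drop = FALSE]
}

# Sample data with many repeated values (made up for illustration)
df <- data.frame(
  animal = rep(c("aardvark", "cat", "catfish", "dog", "dolphin",
                 "penguin", "prairie dog", "zebra"), times = 1000),
  count  = seq_len(8000),
  stringsAsFactors = FALSE
)

new_df <- filter_by_unique(df, "animal", c("cat", "dog"))

# Compare against matching every row directly
direct <- df[grepl("cat|dog", df$animal), ]
print(isTRUE(all.equal(new_df, direct)))
```

Unlike the `rbind` loop, this keeps the rows in their original order.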
