r - How to check whether any string in one column matches any string in a column of another data table



I have two data tables:

library(data.table)
fruit <- c("apple", "banana", "pear", "pineapple")
no <- sample(4L)
fruitDT <- data.table(fruit,no)
fruit2 <- c("apple is a fruit", "orange is a color", "pear is pear", "pine is also a tree")
takeThisOne <- sample(4L)
fruitDT2 <- data.table(fruit2,takeThisOne)
fruitDT
       fruit no
1:     apple  3
2:    banana  2
3:      pear  1
4: pineapple  4
fruitDT2
                fruit2 takeThisOne
1:    apple is a fruit           3
2:   orange is a color           4
3:        pear is pear           2
4: pine is also a tree           1

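If any value in fruit2 (partially) matches any value in the fruit column of fruitDT, I want to extract the corresponding value from the takeThisOne column.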

Expected result:

apple 3
banana NULL
pear 2
pineapple NULL

I was planning to use a combination of lapply and a for loop with str_detect, but is there a better way?

For each fruit, we can use grep and return the first matching entry from fruitDT2.

This is a base R approach, written with data.table syntax since you already have a data.table.

library(data.table)
fruitDT[, TakeThisOne := sapply(fruit, function(x) 
fruitDT2$takeThisOne[grep(x, fruitDT2$fruit2)[1]])]
fruitDT
#       fruit no TakeThisOne
#1:     apple  3           3
#2:    banana  2          NA
#3:      pear  1           2
#4: pineapple  4          NA
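Note that grep treats each fruit name as a regular expression and that `[1]` keeps only the first hit. A minimal sketch of the same idea with literal (fixed) matching follows. It uses hard-coded values instead of sample() so the result is reproducible, and `first_match` is a hypothetical helper name introduced only for this illustration.

```r
library(data.table)

# Fixed (non-random) values so the result is reproducible.
fruitDT <- data.table(
  fruit = c("apple", "banana", "pear", "pineapple"),
  no = 1:4
)
fruitDT2 <- data.table(
  fruit2 = c("apple is a fruit", "orange is a color",
             "pear is pear", "pine is also a tree"),
  takeThisOne = c(3L, 4L, 2L, 1L)
)

# first_match() is a hypothetical helper, not part of data.table.
# fixed = TRUE makes grep() treat the pattern as a literal string,
# so regex metacharacters in a fruit name cannot change the match.
first_match <- function(pattern, text, values) {
  idx <- grep(pattern, text, fixed = TRUE)
  if (length(idx) == 0L) NA_integer_ else values[idx[1L]]
}

# vapply() enforces that every lookup returns exactly one integer.
fruitDT[, TakeThisOne := vapply(fruit, first_match, integer(1),
                                text = fruitDT2$fruit2,
                                values = fruitDT2$takeThisOne,
                                USE.NAMES = FALSE)]
fruitDT
```

With these values, apple and pear pick up a takeThisOne value, while banana and pineapple get NA ("pine is also a tree" does not contain "pineapple").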

Using my own sample data (since the original is random):

library(data.table)
set.seed(42)
fruit <- c("apple", "banana", "pear", "pineapple")
no <- sample(4L)
fruitDT <- data.table(fruit,no)
fruit2 <- c("apple is a fruit", "orange is a color", "pear is pear", "pine is also a tree")
takeThisOne <- sample(4L)
fruitDT2 <- data.table(fruit2,takeThisOne)
fruitDT
#        fruit no
# 1:     apple  1
# 2:    banana  4
# 3:      pear  3
# 4: pineapple  2
fruitDT2
#                 fruit2 takeThisOne
# 1:    apple is a fruit           2
# 2:   orange is a color           4
# 3:        pear is pear           3
# 4: pine is also a tree           1

I believe this is correct:

fuzzyjoin::regex_right_join(fruitDT2, fruitDT, by = c("fruit2" = "fruit"))[,c("fruit", "takeThisOne")]
#       fruit takeThisOne
# 1     apple           2
# 2    banana          NA
# 3      pear           3
# 4 pineapple          NA
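If adding the fuzzyjoin dependency is not desirable, a similar result can probably be had with data.table alone by building every (fruit, fruit2) pair and keeping the pairs where the fruit occurs in the sentence. The sketch below assumes fixed example values rather than sample(), assumes the fruit names are unique, and uses literal matching. The names `pairs`, `hits` and `result` are illustrative only.

```r
library(data.table)

# Fixed (non-random) values so the result is reproducible.
fruitDT <- data.table(
  fruit = c("apple", "banana", "pear", "pineapple"),
  no = 1:4
)
fruitDT2 <- data.table(
  fruit2 = c("apple is a fruit", "orange is a color",
             "pear is pear", "pine is also a tree"),
  takeThisOne = c(3L, 4L, 2L, 1L)
)

# Cross join: one row per (fruit, fruit2) combination.
# Assumes fruit values are unique, since they are used as the grouping key.
pairs <- fruitDT[, .(fruit2 = fruitDT2$fruit2,
                     takeThisOne = fruitDT2$takeThisOne),
                 by = fruit]

# Flag the pairs where the fruit name appears literally inside fruit2.
pairs[, hit := mapply(grepl, fruit, fruit2,
                      MoreArgs = list(fixed = TRUE),
                      USE.NAMES = FALSE)]

# Keep the matching pairs, then join back so unmatched fruits get NA.
hits <- pairs[hit == TRUE, .(fruit, takeThisOne)]
result <- hits[fruitDT[, .(fruit)], on = "fruit"]
result
```

Unlike the grep version with `[1]`, this keeps every match, so a fruit that occurs in several sentences would appear on several rows. The cross join grows with the product of the two row counts, so it is best suited to small tables.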
