I have two data.tables:
library(data.table)
fruit <- c("apple", "banana", "pear", "pineapple")
no <- sample(4L)
fruitDT <- data.table(fruit,no)
fruit2 <- c("apple is a fruit", "orange is a color", "pear is pear", "pine is also a tree")
takeThisOne <- sample(4L)
fruitDT2 <- data.table(fruit2,takeThisOne)
fruitDT
       fruit no
1:     apple  3
2:    banana  2
3:      pear  1
4: pineapple  4
fruitDT2
                fruit2 takeThisOne
1:    apple is a fruit           3
2:   orange is a color           4
3:        pear is pear           2
4: pine is also a tree           1
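If any value in fruit2 (partially) matches a value in the fruit column of fruitDT, I want to extract the corresponding value of the takeThisOne column.
Expected result: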
apple 3
banana NULL
pear 2
pineapple NULL
I was going to combine lapply and a for loop around str_detect, but is there a better way?
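The direction of the match matters here: each fruit name is the pattern, and each fruit2 sentence is the text being searched. That is why pineapple has no match in the expected result even though "pine" appears in "pine is also a tree". A minimal base-R check, using the strings from the question; fixed = TRUE is an assumption that the fruit names should be matched as literal substrings rather than as regular expressions:

```r
fruit  <- c("apple", "banana", "pear", "pineapple")
fruit2 <- c("apple is a fruit", "orange is a color",
            "pear is pear", "pine is also a tree")

# hits[i, j] is TRUE when fruit[j] occurs literally inside fruit2[i]
hits <- vapply(fruit, function(p) grepl(p, fruit2, fixed = TRUE),
               logical(length(fruit2)))
rownames(hits) <- fruit2
print(hits)

# Number of sentences each fruit matches: apple 1, banana 0, pear 1, pineapple 0
colSums(hits)

# The reverse direction behaves differently: "pine" is contained in "pineapple"
grepl("pine", "pineapple", fixed = TRUE)
```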
For each fruit, we can use grep and return the first matching entry in fruitDT2.
This is a base R approach, but written in data.table syntax since you already have one.
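library(data.table)
# For each fruit, grep() returns the positions in fruit2 that contain it; take the first.
# When nothing matches, grep() returns integer(0), and integer(0)[1] is NA, so the result is NA.
fruitDT[, TakeThisOne := sapply(fruit, function(x)
  fruitDT2$takeThisOne[grep(x, fruitDT2$fruit2)[1]])]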
fruitDT
#        fruit no TakeThisOne
# 1:     apple  3           3
# 2:    banana  2          NA
# 3:      pear  1           2
# 4: pineapple  4          NA
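The same first-match lookup can be sketched in plain base R, without data.table. This is a variant rather than the answer's exact code: it adds fixed = TRUE (assuming fruit names should be matched literally, so characters such as "." or "(" are not treated as regex syntax) and uses vapply so the result is guaranteed to be one integer per fruit. The helper name first_match is hypothetical, and the takeThisOne values are typed in by hand from the question's printout instead of coming from sample().

```r
fruit       <- c("apple", "banana", "pear", "pineapple")
fruit2      <- c("apple is a fruit", "orange is a color",
                 "pear is pear", "pine is also a tree")
takeThisOne <- c(3L, 4L, 2L, 1L)  # copied from the question's fruitDT2 printout

# Hypothetical helper: value of the first text element that contains pattern.
# grep(...)[1] is NA when there is no match, and value[NA] is NA.
first_match <- function(pattern, text, value) {
  idx <- grep(pattern, text, fixed = TRUE)[1]
  value[idx]
}

result <- vapply(fruit, first_match, integer(1),
                 text = fruit2, value = takeThisOne)
result
# apple 3, banana NA, pear 2, pineapple NA
```

Only the first matching sentence is used per fruit; if a fruit could appear in several sentences, this silently ignores the later ones.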
Using my sample data (since yours is random):
library(data.table)
set.seed(42)
fruit <- c("apple", "banana", "pear", "pineapple")
no <- sample(4L)
fruitDT <- data.table(fruit,no)
fruit2 <- c("apple is a fruit", "orange is a color", "pear is pear", "pine is also a tree")
takeThisOne <- sample(4L)
fruitDT2 <- data.table(fruit2,takeThisOne)
fruitDT
#        fruit no
# 1:     apple  1
# 2:    banana  4
# 3:      pear  3
# 4: pineapple  2
fruitDT2
#                 fruit2 takeThisOne
# 1:    apple is a fruit           2
# 2:   orange is a color           4
# 3:        pear is pear           3
# 4: pine is also a tree           1
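I believe this is correct:
# regex_right_join() uses each fruit (right table) as a regex against fruit2 (left table)
# and keeps every row of fruitDT, filling NA where nothing matches
fuzzyjoin::regex_right_join(fruitDT2, fruitDT, by = c("fruit2" = "fruit"))[,c("fruit", "takeThisOne")]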
#       fruit takeThisOne
# 1     apple           2
# 2    banana          NA
# 3      pear           3
# 4 pineapple          NA
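One difference from the grep(...)[1] answer: a join keeps every match, so a fruit contained in two sentences would come back as two rows. To show those semantics without the fuzzyjoin dependency, here is a base-R sketch of a right "contains" join. It is not fuzzyjoin's implementation; it assumes literal matching (fixed = TRUE) where fuzzyjoin uses regular expressions, and the takeThisOne values are copied by hand from the set.seed(42) printout above.

```r
fruit       <- c("apple", "banana", "pear", "pineapple")
fruit2      <- c("apple is a fruit", "orange is a color",
                 "pear is pear", "pine is also a tree")
takeThisOne <- c(2L, 4L, 3L, 1L)  # copied from the set.seed(42) printout

# Every (sentence, fruit) pair; keep pairs where the fruit occurs in the sentence
pairs <- expand.grid(i = seq_along(fruit2), j = seq_along(fruit))
keep  <- mapply(function(i, j) grepl(fruit[j], fruit2[i], fixed = TRUE),
                pairs$i, pairs$j)
matched <- data.frame(fruit       = fruit[pairs$j[keep]],
                      takeThisOne = takeThisOne[pairs$i[keep]],
                      stringsAsFactors = FALSE)

# Right join on the fruit list: all fruits kept, unmatched ones get NA.
# merge() sorts by the key column by default.
joined <- merge(data.frame(fruit = fruit, stringsAsFactors = FALSE),
                matched, by = "fruit", all.x = TRUE)
joined
```

Because merge() sorts by fruit, the rows come back alphabetically, which here happens to match the original order.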