有人可以帮助我满足以下要求吗?
DF1:
Item Day Rank
A 2020-01-12
A 2020-01-13
B 2020-01-13
DF2:
Item Day Rank
A 2020-01-10 1
A 2020-01-11 2
A 2020-01-14 3
B 2020-01-14 4
B 2020-01-17 5
如果我们在DF1中有Day,而DF2中不存在,那么我想从与DF2中的日期对应的下一个日期获取DF1日期的排名。 在这里,DF1中的2020-01-12没有任何排名,因此它应该从DF2搜索下一个日期,即2020-01-13,相应的排名应该作为输出。
最终输出(DF1(:
Item Day Rank
A 2020-01-12 3
A 2020-01-13 3
B 2020-01-13 4
您可以使用如下findInterval
:
DF1$Rank <- DF2$Rank[findInterval(DF1$Day, c(0,DF2$Day), left.open = TRUE)]
DF1
# Day Rank
#1 2020-01-12 3
#2 2020-01-15 5
如果DF2未排序:
i <- order(DF2$Day)
j <- findInterval(DF1$Day, c(0,DF2$Day[i]), left.open = TRUE)
DF1$Rank <- DF2$Rank[i[j]]
或者,如果DF2$Rank1:nrow(DF2)
并且DF2
排序。
DF1$Rank <- findInterval(DF1$Day, c(0,DF2$Day), left.open = TRUE)
数据:
DF1 <- data.frame(Day=as.Date(c("2020-01-12", "2020-01-15")), Rank=NA)
DF2 <- data.frame(Day=as.Date(c("2020-01-10", "2020-01-11", "2020-01-13"
, "2020-01-14", "2020-01-17")), Rank=1:5)
对于更新的问题:
DF1 <- data.frame(Item=c("A","A","B"), Day=as.Date(c("2020-01-12", "2020-01-13", "2020-01-13")), Rank=NA)
DF2 <- data.frame(Item=c(rep("A",3),rep("B",2)), Day=as.Date(c("2020-01-10", "2020-01-11", "2020-01-14", "2020-01-14", "2020-01-17")), Rank=1:5)
DF1$Rank <- unlist(sapply(unique(DF1$Item), function(i) {DF2$Rank[DF2$Item==i][findInterval(DF1$Day[DF1$Item==i], c(0,DF2$Day[DF2$Item==i]), left.open = TRUE)]}))
DF1
# Item Day Rank
#1 A 2020-01-12 3
#2 A 2020-01-13 3
#3 B 2020-01-13 4
另请查看按区域中的成员身份将一个数据框连接到另一个数据框
我们可以使用fuzzy_join
library(dplyr)
fuzzyjoin::fuzzy_left_join(df1, df2, by = c("Item" = "Item", "Day" = "Day"),
match_fun = list(`==`, `<=`)) %>%
group_by(Item.x, Day.x) %>%
slice(1L) %>%
mutate(Rank.x = coalesce(as.integer(Rank.x), Rank.y)) %>%
select(-ends_with("y")) %>%
rename_all(~names(df1))
# Item Day Rank
# <fct> <date> <int>
#1 A 2020-01-12 3
#2 A 2020-01-13 3
#3 B 2020-01-13 4
这是一个基本的R解决方案(假设DF1
和DF2
中的列Day
已经属于Date
类(
DF1 <- do.call(rbind,
c(make.row.names = F,
lapply(split(DF1,DF1$Item), function(v) {
z <- subset(DF2,Item == unique(v$Item))
v <- within(v, Rank <- findInterval(Day, z$Day)+ min(z$Rank))
}
)
)
)
这样
> DF1
Item Day Rank
1 A 2020-01-12 3
2 A 2020-01-13 3
3 B 2020-01-13 4
数据
DF1 <- structure(list(Item = c("A", "A", "B"), Day = structure(c(18273,
18274, 18274), class = "Date"), Rank = c(NA, NA, NA)), row.names = c(NA,
-3L), class = "data.frame")
DF2 <- structure(list(Item = c("A", "A", "A", "B", "B"), Day = structure(c(18271,
18272, 18275, 18275, 18278), class = "Date"), Rank = 1:5), row.names = c(NA,
-5L), class = "data.frame")