对应于另一个数据帧的下一个可用日期的排名



有人可以帮助我满足以下要求吗?

DF1:

Item       Day            Rank
A     2020-01-12
A     2020-01-13
B     2020-01-13

DF2:

Item        Day            Rank
A       2020-01-10         1
A       2020-01-11         2
A       2020-01-14         3
B       2020-01-14         4
B       2020-01-17         5

如果我们在DF1中有Day,而DF2中不存在,那么我想从与DF2中的日期对应的下一个日期获取DF1日期的排名。 在这里,DF1中的2020-01-12没有任何排名,因此它应该从DF2搜索下一个日期,即2020-01-13,相应的排名应该作为输出。

最终输出(DF1(:

Item       Day            Rank
A     2020-01-12          3
A     2020-01-13          3
B     2020-01-13          4

您可以使用如下findInterval

DF1$Rank <- DF2$Rank[findInterval(DF1$Day, c(0,DF2$Day), left.open = TRUE)]
DF1
#         Day Rank
#1 2020-01-12    3
#2 2020-01-15    5

如果DF2未排序:

i <- order(DF2$Day)
j <- findInterval(DF1$Day, c(0,DF2$Day[i]), left.open = TRUE)
DF1$Rank <- DF2$Rank[i[j]]

或者,如果DF2$Rank1:nrow(DF2)并且DF2排序。

DF1$Rank <- findInterval(DF1$Day, c(0,DF2$Day), left.open = TRUE)

数据:

DF1 <- data.frame(Day=as.Date(c("2020-01-12", "2020-01-15")), Rank=NA)
DF2 <- data.frame(Day=as.Date(c("2020-01-10", "2020-01-11", "2020-01-13"
, "2020-01-14", "2020-01-17")), Rank=1:5)

对于更新的问题

DF1 <- data.frame(Item=c("A","A","B"), Day=as.Date(c("2020-01-12", "2020-01-13", "2020-01-13")), Rank=NA)
DF2 <- data.frame(Item=c(rep("A",3),rep("B",2)), Day=as.Date(c("2020-01-10", "2020-01-11", "2020-01-14", "2020-01-14", "2020-01-17")), Rank=1:5)
DF1$Rank <- unlist(sapply(unique(DF1$Item), function(i) {DF2$Rank[DF2$Item==i][findInterval(DF1$Day[DF1$Item==i], c(0,DF2$Day[DF2$Item==i]), left.open = TRUE)]}))
DF1
#  Item        Day Rank
#1    A 2020-01-12    3
#2    A 2020-01-13    3
#3    B 2020-01-13    4

另请查看按区域中的成员身份将一个数据框连接到另一个数据框

我们可以使用fuzzy_join

library(dplyr)
fuzzyjoin::fuzzy_left_join(df1, df2, by = c("Item" = "Item", "Day" = "Day"), 
match_fun = list(`==`, `<=`)) %>%
group_by(Item.x, Day.x) %>%
slice(1L) %>%
mutate(Rank.x = coalesce(as.integer(Rank.x), Rank.y)) %>%
select(-ends_with("y")) %>%
rename_all(~names(df1))
#  Item       Day         Rank
#  <fct>     <date>     <int>
#1   A     2020-01-12     3
#2   A     2020-01-13     3
#3   B     2020-01-13     4

这是一个基本的R解决方案(假设DF1DF2中的列Day已经属于Date类(

DF1 <- do.call(rbind,
c(make.row.names = F,
lapply(split(DF1,DF1$Item), function(v) {
z <- subset(DF2,Item == unique(v$Item))
v <- within(v, Rank <- findInterval(Day, z$Day)+ min(z$Rank))
}
)
)
)

这样

> DF1
Item        Day Rank
1    A 2020-01-12    3
2    A 2020-01-13    3
3    B 2020-01-13    4

数据

DF1 <- structure(list(Item = c("A", "A", "B"), Day = structure(c(18273, 
18274, 18274), class = "Date"), Rank = c(NA, NA, NA)), row.names = c(NA, 
-3L), class = "data.frame")
DF2 <- structure(list(Item = c("A", "A", "A", "B", "B"), Day = structure(c(18271, 
18272, 18275, 18275, 18278), class = "Date"), Rank = 1:5), row.names = c(NA, 
-5L), class = "data.frame")

最新更新