These are subsets of two data frames.
I have df1, which contains the degree days for each plot for every day of the year:
date     plot  degree_days
5-13-19  1     3.5
5-13-19  2     5.35
5-13-19  3     4.8
5-14-19  1     4.5
5-14-19  2     4.4
5-14-19  3     5.8
5-15-19  1     3.5
5-15-19  2     5.35
5-15-19  3     4.8
5-16-19  1     4.5
5-16-19  2     4.4
5-16-19  3     5.8
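Here is a base R option that defines a user function f
(thanks to @akrun for the data; the date columns are assumed to be of class Date, as in the data section at the end)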
f <- function(d, p) with(subset(df1, date <= d & plot == p), sum(degree_days))
dfout <- within(
df2,
new_var <- Vectorize(f)(mean_first_flower_date, plot)
)
which gives
> dfout
mean_first_flower_date plot new_var
1 2019-05-16 1 16.0
2 2019-08-05 2 19.5
3 2019-06-12 3 21.2
4 2019-05-16 1 16.0
5 2019-08-05 2 19.5
6 2019-06-12 3 21.2
7 2019-05-16 1 16.0
8 2019-08-05 2 19.5
9 2019-06-12 3 21.2
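In the example data every date in df1 falls on or before every flowering date, so each sum covers the whole plot. The following is a minimal sketch, on made-up stand-in data (the values below are illustrative assumptions, not the asker's data), of how f behaves when the cutoff date actually excludes rows. It calls mapply directly, which is what Vectorize(f) uses internally.

```r
# Stand-in data (hypothetical values), already of class Date
df1 <- data.frame(
  date = as.Date(c("2019-05-13", "2019-05-14", "2019-05-15")),
  plot = c(1L, 1L, 1L),
  degree_days = c(3.5, 4.5, 3.5)
)
df2 <- data.frame(
  mean_first_flower_date = as.Date(c("2019-05-14", "2019-05-15")),
  plot = c(1L, 1L)
)

# Sum degree_days for plot p over all rows dated on or before d
f <- function(d, p) with(subset(df1, date <= d & plot == p), sum(degree_days))

# mapply calls f once per (date, plot) pair taken row-wise from df2
df2$new_var <- mapply(f, df2$mean_first_flower_date, df2$plot)
df2$new_var
# 8.0 11.5
```

The first row only sums the two days up to 2019-05-14 (3.5 + 4.5), while the second includes the third day as well.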
After converting the 'date' columns to Date class,
a non-equi join is useful here
library(data.table)
new_var <- setDT(df1)[df2, sum(degree_days), on =
.(plot, date <= mean_first_flower_date), by = .EACHI]$V1
setDT(df2)[, new_var := new_var][]
Output:
df2
# mean_first_flower_date plot new_var
#1: 2019-05-16 1 16.0
#2: 2019-08-05 2 19.5
#3: 2019-06-12 3 21.2
#4: 2019-05-16 1 16.0
#5: 2019-08-05 2 19.5
#6: 2019-06-12 3 21.2
#7: 2019-05-16 1 16.0
#8: 2019-08-05 2 19.5
#9: 2019-06-12 3 21.2
Data
df1 <- structure(list(date = c("5-13-19", "5-13-19", "5-13-19", "5-14-19",
"5-14-19", "5-14-19", "5-15-19", "5-15-19", "5-15-19", "5-16-19",
"5-16-19", "5-16-19"), plot = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L), degree_days = c(3.5, 5.35, 4.8, 4.5, 4.4, 5.8,
3.5, 5.35, 4.8, 4.5, 4.4, 5.8)), class = "data.frame", row.names = c(NA,
-12L))
df2 <- structure(list(mean_first_flower_date = c("5-16-19", "8-5-19",
"6-12-19", "5-16-19", "8-5-19", "6-12-19", "5-16-19", "8-5-19",
"6-12-19"), plot = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L)),
class = "data.frame", row.names = c(NA,
-9L))
df1$date <- as.Date(df1$date, "%m-%d-%y")
df2$mean_first_flower_date <- as.Date(df2$mean_first_flower_date, "%m-%d-%y")
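The conversion to Date matters because both approaches compare dates with <=. A small sketch of why, using a few date strings in the same month-day-year layout (the string "10-1-19" is an illustrative assumption, not part of the data):

```r
# Parse month-day-two-digit-year strings into Date objects
d <- as.Date(c("5-13-19", "8-5-19"), format = "%m-%d-%y")
parsed <- format(d)
parsed
# "2019-05-13" "2019-08-05"

# Raw strings compare character by character, not chronologically
as_strings <- "10-1-19" <= "5-16-19"   # TRUE, although October is after May

# Date objects compare chronologically
as_dates <- as.Date("10-1-19", "%m-%d-%y") <= as.Date("5-16-19", "%m-%d-%y")   # FALSE
```

Without the conversion, the date <= d condition would silently select the wrong rows for some dates.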