R - Sum values in one df based on the values in two columns of another df



These are subsets of two data frames.

I have a df1 that contains the degree days for each plot for every day of the year:

date     plot  degree_days
5-13-19  1     3.5
5-13-19  2     5.35
5-13-19  3     4.8
5-14-19  1     4.5
5-14-19  2     4.4
5-14-19  3     5.8
5-15-19  1     3.5
5-15-19  2     5.35
5-15-19  3     4.8
5-16-19  1     4.5
5-16-19  2     4.4
5-16-19  3     5.8

Here is a base R option that defines a user function f (thanks to @akrun for the data):

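# Sum degree_days in df1 for plot p over all dates up to and including d
f <- function(d, p) with(subset(df1, date <= d & plot == p), sum(degree_days))
# Apply f to every (date, plot) pair in df2 and store the result as new_var
dfout <- within(
  df2,
  new_var <- Vectorize(f)(mean_first_flower_date, plot)
)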

which gives

> dfout
mean_first_flower_date plot new_var
1             2019-05-16    1    16.0
2             2019-08-05    2    19.5
3             2019-06-12    3    21.2
4             2019-05-16    1    16.0
5             2019-08-05    2    19.5
6             2019-06-12    3    21.2
7             2019-05-16    1    16.0
8             2019-08-05    2    19.5
9             2019-06-12    3    21.2
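
For readers unfamiliar with Vectorize, the call above behaves much like mapply: f is called once per row of df2 with that row's date and plot. The following is a minimal sketch of the same computation, not the answerer's code. It uses a reduced copy of the sample data given at the end of this post, and the names d1, d2 and sum_up_to are made up for this illustration.

```r
# Reduced copy of the sample data; d1, d2 and sum_up_to are illustrative names
d1 <- data.frame(
  date = as.Date(rep(c("5-13-19", "5-14-19", "5-15-19", "5-16-19"), each = 3),
                 "%m-%d-%y"),
  plot = rep(1:3, times = 4),
  degree_days = c(3.5, 5.35, 4.8, 4.5, 4.4, 5.8, 3.5, 5.35, 4.8, 4.5, 4.4, 5.8)
)
d2 <- data.frame(
  mean_first_flower_date = as.Date(c("5-16-19", "8-5-19", "6-12-19"), "%m-%d-%y"),
  plot = 1:3
)

# Sum degree_days for plot p over all dates up to and including d
sum_up_to <- function(d, p) sum(d1$degree_days[d1$date <= d & d1$plot == p])

# mapply calls sum_up_to once per row of d2, pairing each date with its plot
d2$new_var <- mapply(sum_up_to, d2$mean_first_flower_date, d2$plot)
d2
```

Like the Vectorize version, this scans d1 once per row of d2, which is fine for small data but can become slow when both data frames are large.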

After converting the date columns to Date class, a non-equi join is a convenient way to do this:

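library(data.table)
# Non-equi join: for each row of df2, sum degree_days over the df1 rows with the
# same plot and a date on or before mean_first_flower_date
new_var <- setDT(df1)[df2, sum(degree_days),
  on = .(plot, date <= mean_first_flower_date), by = .EACHI]$V1
# Add the sums to df2 by reference and print the result
setDT(df2)[, new_var := new_var][]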

Output:

df2
#   mean_first_flower_date plot new_var
#1:             2019-05-16    1    16.0
#2:             2019-08-05    2    19.5
#3:             2019-06-12    3    21.2
#4:             2019-05-16    1    16.0
#5:             2019-08-05    2    19.5
#6:             2019-06-12    3    21.2
#7:             2019-05-16    1    16.0
#8:             2019-08-05    2    19.5
#9:             2019-06-12    3    21.2
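
To make the logic of the non-equi join more concrete, here is a rough base R sketch of the same three steps: pair rows by plot, keep the pairs whose date is on or before the flowering date, and sum per df2 row. It is an illustration under assumptions, not a replacement for the data.table code; d1, d2, row_id, pairs and sums are names invented for this example, and the data is a reduced copy of the sample data below.

```r
# Reduced copy of the sample data; all object names here are illustrative
d1 <- data.frame(
  date = as.Date(rep(c("5-13-19", "5-14-19", "5-15-19", "5-16-19"), each = 3),
                 "%m-%d-%y"),
  plot = rep(1:3, times = 4),
  degree_days = c(3.5, 5.35, 4.8, 4.5, 4.4, 5.8, 3.5, 5.35, 4.8, 4.5, 4.4, 5.8)
)
d2 <- data.frame(
  mean_first_flower_date = as.Date(c("5-16-19", "8-5-19", "6-12-19"), "%m-%d-%y"),
  plot = 1:3
)

# Tag each d2 row so the sums can be mapped back to it afterwards
d2$row_id <- seq_len(nrow(d2))

# Step 1: equi-join on plot, giving every (d2 row, d1 row) pair for the same plot
pairs <- merge(d2, d1, by = "plot")

# Step 2: the non-equi condition, keeping dates on or before the flowering date
pairs <- pairs[pairs$date <= pairs$mean_first_flower_date, ]

# Step 3: one sum per original d2 row
sums <- aggregate(degree_days ~ row_id, data = pairs, FUN = sum)

# Map the sums back; d2 rows without any qualifying d1 row are left as NA
d2$new_var <- sums$degree_days[match(d2$row_id, sums$row_id)]
d2
```

The intermediate pairs table holds every same-plot combination before filtering, so this version uses more memory than the join, which applies the date condition while matching.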

Data
df1 <- structure(list(date = c("5-13-19", "5-13-19", "5-13-19", "5-14-19", 
"5-14-19", "5-14-19", "5-15-19", "5-15-19", "5-15-19", "5-16-19", 
"5-16-19", "5-16-19"), plot = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
3L, 1L, 2L, 3L), degree_days = c(3.5, 5.35, 4.8, 4.5, 4.4, 5.8, 
3.5, 5.35, 4.8, 4.5, 4.4, 5.8)), class = "data.frame", row.names = c(NA, 
-12L))
df2 <- structure(list(mean_first_flower_date = c("5-16-19", "8-5-19", 
"6-12-19", "5-16-19", "8-5-19", "6-12-19", "5-16-19", "8-5-19", 
"6-12-19"), plot = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L)), 
class = "data.frame", row.names = c(NA, 
-9L))
df1$date <- as.Date(df1$date, "%m-%d-%y")
df2$mean_first_flower_date <- as.Date(df2$mean_first_flower_date, "%m-%d-%y")
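
The two as.Date calls matter because <= on character vectors compares the strings character by character rather than chronologically. A small sketch with two made-up date strings (they are not part of the sample data) shows the difference:

```r
# Two hypothetical dates in the same m-d-y format as the sample data
a <- "10-1-19"  # 1 October 2019
b <- "5-16-19"  # 16 May 2019

# As strings, "1" sorts before "5", so October appears to come before May
string_cmp <- a <= b
string_cmp  # TRUE

# As Date objects the comparison is chronological
date_cmp <- as.Date(a, "%m-%d-%y") <= as.Date(b, "%m-%d-%y")
date_cmp    # FALSE
```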
