R - Compare rows consecutively across two data frames and return a value



I have the following two data frames:

df1 <- data.frame(month=c("1","1","1","1","2","2","2","3","3","3","3","3"),
temp=c("10","15","16","25","13","17","20","5","16","25","30","37"))

df2 <-  data.frame(period=c("1","1","1","1","1","1","1","1","2","2","2","2","2","2","3","3","3","3","3","3","3","3","3","3","3","3"),
max_temp=c("9","13","16","18","30","37","38","39","10","15","16","25","30","32","8","10","12","14","16","18","19","25","28","30","35","40"),
group=c("1","1","1","2","2","2","3","3","3","3","4","4","5","5","5","5","5","6","6","6","7","7","7","7","8","8"))

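I would like to:

  1. For each row, consecutively check whether the value in the month column of df1 matches the value in the period column of df2, i.e. df1$month == df2$period

  2. If step 1 is not TRUE (df1$month != df2$period), repeat step 1, comparing the value in df1 with the value in the next row of df2, and so on, until df1$month == df2$period

  3. If df1$month == df2$period, check whether the value in the temp column of df1 is less than or equal to the value in the max_temp column of df2, i.e. df1$temp <= df2$max_temp

  4. If df1$temp <= df2$max_temp, return the value of the group column of df2 in that row and add it to df1 in a new column named "new_group"

  5. If step 3 is not TRUE (df1$temp > df2$max_temp), go back to step 1 and compare the same row of df1 with the next row of df2.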
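The five steps above can be sketched as a literal row-by-row scan. This is only an illustration, not code from the thread: the helper name find_group and its lookup argument are hypothetical, the scan is assumed to start at the first row of df2 for every row of df1, and the columns are assumed to be character vectors that hold numbers.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps every column as character
df1 <- data.frame(month = c("1","1","1","1","2","2","2","3","3","3","3","3"),
                  temp = c("10","15","16","25","13","17","20","5","16","25","30","37"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(period = c("1","1","1","1","1","1","1","1","2","2","2","2","2","2",
                             "3","3","3","3","3","3","3","3","3","3","3","3"),
                  max_temp = c("9","13","16","18","30","37","38","39","10","15","16","25","30","32",
                               "8","10","12","14","16","18","19","25","28","30","35","40"),
                  group = c("1","1","1","2","2","2","3","3","3","3","4","4","5","5",
                            "5","5","5","6","6","6","7","7","7","7","8","8"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: walk down `lookup` (df2) one row at a time (steps 1, 2 and 5)
# and return the group of the first row where the period matches and the
# temperature fits (steps 3 and 4)
find_group <- function(month, temp, lookup) {
  for (j in seq_len(nrow(lookup))) {
    period_matches <- lookup$period[j] == month
    temp_fits <- as.numeric(temp) <= as.numeric(lookup$max_temp[j])
    if (period_matches && temp_fits) return(lookup$group[j])
  }
  NA_character_  # no row of df2 satisfied both conditions
}

# Apply the scan to every row of df1
df1$new_group <- vapply(
  seq_len(nrow(df1)),
  function(i) find_group(df1$month[i], df1$temp[i], df2),
  character(1)
)
```

For this sample data the sketch gives the new_group column shown in the desired output. An explicit loop like this is easy to read but slow on large inputs, where a vectorized or join-based approach is usually preferable.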

An example of my desired output data frame is:

df3 <- data.frame(month=c("1","1","1","1","2","2","2","3","3","3","3","3"),
temp=c("10","15","16","25","13","17","20","5","16","25","30","37"),
new_group=c("1","1","1","2","3","4","4","5","6","7","7","8"))

I have been playing around with the ifelse function and need some help or a pointer in the right direction. Thanks.

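I found the process for computing new_group hard to follow as stated. As I understand it, you are trying to create a variable named new_group in df1. For row i of df1, the new_group value is the group value of the first row in df2 that:

  1. Has index i or higher
  2. Has a period matching df1$month[i]
  3. Has a max_temp value no smaller than df1$temp[i]

I solved this using sapply called on the row indices of df1: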

fxn = function(idx) {
# Potentially matching indices in df2
pm = idx:nrow(df2)
# Matching indices in df2
m = pm[df2$period[pm] == df1$month[idx] &
as.numeric(as.character(df1$temp[idx])) <=
as.numeric(as.character(df2$max_temp[pm]))]
# Return the group associated with the first matching index
return(df2$group[m[1]])
}
df1$new_group = sapply(seq(nrow(df1)), fxn)
df1
#    month temp new_group
# 1      1   10         1
# 2      1   15         1
# 3      1   16         1
# 4      1   25         2
# 5      2   13         3
# 6      2   17         4
# 7      2   20         4
# 8      3    5         5
# 9      3   16         6
# 10     3   25         7
# 11     3   30         7
# 12     3   37         8
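The as.numeric(as.character(...)) wrapping in the function above matters because the sample columns are created from strings. Before R 4.0.0, data.frame() converted strings to factors by default, and calling as.numeric() directly on a factor returns its internal level codes rather than the temperatures. A small sketch of the pitfall (the variable names here are illustrative only):

```r
# A factor built from numeric-looking strings, as pre-4.0.0 data.frame() would create
temp_factor <- factor(c("10", "15", "5"))

# Direct conversion returns the internal level codes, not the temperatures
codes <- as.numeric(temp_factor)                  # 1 2 3

# Going through character first recovers the actual numbers
values <- as.numeric(as.character(temp_factor))   # 10 15 5

# Leaving the values as strings is no better: comparison is then character by character
string_compare <- "10" <= "9"                     # TRUE, because "1" sorts before "9"
```

The same conversion is harmless when the columns are already character vectors, so it is a safe default for data like this.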
Another approach uses data.table:

library(data.table)
dt1 <- data.table(df1, key="month")
dt2 <- data.table(df2, key="period")
## Add a row index so each original df1 row can be identified after the join
dt1[, rn1 := seq(nrow(dt1))]
## Join on month == period, then for each df1 row take the group of the first
## df2 row whose max_temp is at least temp. The temperatures are compared as
## numbers, not as strings or factor codes.
dt3 <-
unique(dt1[dt2, allow.cartesian=TRUE][, new_group := group[min(which(as.numeric(as.character(temp)) <= as.numeric(as.character(max_temp))))], by="rn1"], by="rn1")
## Keep only the columns you want
dt3[, c("month", "temp", "max_temp", "new_group"), with=FALSE]
## new_group for the 12 rows: 1 1 1 2 3 4 4 5 6 7 7 8
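For readers without data.table, the same join-then-pick-first idea can be sketched in base R with merge(). This is an assumption-laden illustration rather than code from the thread: the helper columns rn1 and rn2 and the intermediate names joined, ok and first_hit are introduced here only to preserve the original row order.

```r
# Sample data from the question, kept as character columns
df1 <- data.frame(month = c("1","1","1","1","2","2","2","3","3","3","3","3"),
                  temp = c("10","15","16","25","13","17","20","5","16","25","30","37"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(period = c("1","1","1","1","1","1","1","1","2","2","2","2","2","2",
                             "3","3","3","3","3","3","3","3","3","3","3","3"),
                  max_temp = c("9","13","16","18","30","37","38","39","10","15","16","25","30","32",
                               "8","10","12","14","16","18","19","25","28","30","35","40"),
                  group = c("1","1","1","2","2","2","3","3","3","3","4","4","5","5",
                            "5","5","5","6","6","6","7","7","7","7","8","8"),
                  stringsAsFactors = FALSE)

# Row indices so the original order of both tables survives the merge
df1$rn1 <- seq_len(nrow(df1))
df2$rn2 <- seq_len(nrow(df2))

# All df1/df2 row pairs with month == period
joined <- merge(df1, df2, by.x = "month", by.y = "period")

# Keep pairs where the temperature fits, comparing as numbers
ok <- joined[as.numeric(joined$temp) <= as.numeric(joined$max_temp), ]

# For each df1 row, keep the earliest qualifying df2 row
ok <- ok[order(ok$rn1, ok$rn2), ]
first_hit <- ok[!duplicated(ok$rn1), ]

# Attach the group; df1 rows with no qualifying df2 row get NA
df1$new_group <- first_hit$group[match(df1$rn1, first_hit$rn1)]
```

Comparing df1$new_group with the new_group column of the desired df3 is a quick way to confirm that an approach reproduces the expected result on the sample data.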
