I have the following two data frames:
df1 <- data.frame(month=c("1","1","1","1","2","2","2","3","3","3","3","3"),
temp=c("10","15","16","25","13","17","20","5","16","25","30","37"))
df2 <- data.frame(period=c("1","1","1","1","1","1","1","1","2","2","2","2","2","2","3","3","3","3","3","3","3","3","3","3","3","3"),
max_temp=c("9","13","16","18","30","37","38","39","10","15","16","25","30","32","8","10","12","14","16","18","19","25","28","30","35","40"),
group=c("1","1","1","2","2","2","3","3","3","3","4","4","5","5","5","5","5","6","6","6","7","7","7","7","8","8"))
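I want to:
1. For each row, sequentially check whether the value in the month column of df1 matches the value in the period column of df2, i.e. df1$month == df2$period.
2. If step 1 is not TRUE, i.e. df1$month != df2$period, repeat step 1, comparing the value in df1 with the value in the next row of df2, and so on until df1$month == df2$period.
3. If df1$month == df2$period, check whether the value in the temp column of df1 is less than or equal to the value in the max_temp column of df2, i.e. df1$temp <= df2$max_temp.
4. If df1$temp <= df2$max_temp, return the value of the group column in that row of df2 and add it to df1 in a new column named "new_group".
5. If step 3 is not TRUE, i.e. df1$temp > df2$max_temp, go back to step 1 and compare the same row of df1 with the next row of df2.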
An example of the output data frame I want:
df3 <- data.frame(month=c("1","1","1","1","2","2","2","3","3","3","3","3"),
temp=c("10","15","16","25","13","17","20","5","16","25","30","37"),
new_group=c("1","1","1","2","3","4","4","5","6","7","7","8"))
I have been playing around with the ifelse function and need some help or a nudge in a different direction. Thanks!
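As a sketch, the five steps can be transcribed almost literally into two nested loops. One assumption is not settled by the question: here the scan of df2 restarts at its first row for every row of df1 (the sapply answer below starts at row i of df2 instead; on this data both choices give the same groups). The temperatures are converted with as.numeric so that step 3 compares numbers rather than strings.

```r
# Data from the question (stringsAsFactors = FALSE keeps every column as character)
df1 <- data.frame(month=c("1","1","1","1","2","2","2","3","3","3","3","3"),
                  temp=c("10","15","16","25","13","17","20","5","16","25","30","37"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(period=c("1","1","1","1","1","1","1","1","2","2","2","2","2","2","3","3","3","3","3","3","3","3","3","3","3","3"),
                  max_temp=c("9","13","16","18","30","37","38","39","10","15","16","25","30","32","8","10","12","14","16","18","19","25","28","30","35","40"),
                  group=c("1","1","1","2","2","2","3","3","3","3","4","4","5","5","5","5","5","6","6","6","7","7","7","7","8","8"),
                  stringsAsFactors = FALSE)

new_group <- rep(NA_character_, nrow(df1))   # stays NA if no df2 row qualifies
for (i in seq_len(nrow(df1))) {
  # Assumption: the scan of df2 starts at its first row for every df1 row
  for (j in seq_len(nrow(df2))) {
    # Steps 1-2: skip df2 rows whose period differs from this month
    if (df1$month[i] != df2$period[j]) next
    # Steps 3 and 5: compare numerically; if temp is too high, try the next row
    if (as.numeric(df1$temp[i]) <= as.numeric(df2$max_temp[j])) {
      # Step 4: the first qualifying row supplies the group
      new_group[i] <- df2$group[j]
      break
    }
  }
}
df1$new_group <- new_group
df1
```

The `break` is what makes the first qualifying row win; without it, later rows of the same period would overwrite the value.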
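I found the procedure for computing new_group, as specified, hard to follow. As far as I can tell, you are trying to create a variable named new_group in df1. For row i of df1, the new_group value is the group value of the first row of df2 that:
- has index i or higher
- has a period value matching df1$month[i]
- has a max_temp value not less than df1$temp[i]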
I solved this with sapply, called on the row indices of df1:
fxn = function(idx) {
# Potentially matching indices in df2
pm = idx:nrow(df2)
# Matching indices in df2
m = pm[df2$period[pm] == df1$month[idx] &
as.numeric(as.character(df1$temp[idx])) <=
as.numeric(as.character(df2$max_temp[pm]))]
# Return the group associated with the first matching index
return(df2$group[m[1]])
}
df1$new_group = sapply(seq(nrow(df1)), fxn)
df1
# month temp new_group
# 1 1 10 1
# 2 1 15 1
# 3 1 16 1
# 4 1 25 2
# 5 2 13 3
# 6 2 17 4
# 7 2 20 4
# 8 3 5 5
# 9 3 16 6
# 10 3 25 7
# 11 3 30 7
# 12 3 37 8
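The as.numeric(as.character(...)) conversion in fxn is needed because the question builds temp and max_temp from strings. This short base-R illustration shows what goes wrong without it, both for character columns and for the factor columns that older R versions produce:

```r
# In the question, temp and max_temp are character vectors, and character
# vectors are compared as text (collation order), not as numbers
"5" <= "16"                           # FALSE: "5" sorts after "16"
as.numeric("5") <= as.numeric("16")   # TRUE

# Before R 4.0.0, data.frame() turned strings into factors by default.
# as.numeric() on a factor returns the internal level codes, so convert
# to character first to recover the written values
f <- factor(c("10", "9"))
levels(f)                     # "10" "9"
as.numeric(f)                 # 1 2
as.numeric(as.character(f))   # 10 9
```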
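Alternatively, using data.table:
library(data.table)
dt1 <- data.table(df1, key="month")
dt2 <- data.table(df2, key="period")
## temp and max_temp are stored as strings (factors before R 4.0.0);
## convert them so that <= compares numbers rather than text or level codes
dt1[, temp := as.numeric(as.character(temp))]
dt2[, max_temp := as.numeric(as.character(max_temp))]
## add a row index
dt1[, rn1 := seq_len(.N)]
## join each row of dt1 to every dt2 row with the same period, then, per row of dt1,
## take the group of the first joined row with temp <= max_temp
dt3 <- dt1[dt2, allow.cartesian=TRUE][, new_group := group[min(which(temp <= max_temp))], by="rn1"]
dt3 <- unique(dt3, by="rn1")
## Keep only the columns you want
dt3[, c("month", "temp", "new_group"), with=FALSE]
month temp new_group
1: 1 10 1
2: 1 15 1
3: 1 16 1
4: 1 25 2
5: 2 13 3
6: 2 17 4
7: 2 20 4
8: 3 5 5
9: 3 16 6
10: 3 25 7
11: 3 30 7
12: 3 37 8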