我有一个数据。表:
library(data.table)
dt = structure(list(date = c("01.01.2020", "01.01.2020", "02.01.2020",
"02.01.2020", "03.01.2020", "03.01.2020", "04.01.2020", "04.01.2020"
), name = c("10AFC25D", "FA1A310C", "10AFC25D", "FA1A310C", "10AFC25D",
"FA1A310C", "10AFC25D", "FA1A310C"), value = c(100L, 50L, 80L,
60L, 70L, 60L, 50L, 80L)), row.names = c(NA, -8L), class = c("data.table", "data.frame"))
dt[, date:=as.Date(date, format="%d.%m.%Y")]
看起来像:
> dt
date name value
1: 01.01.2020 10AFC25D 100
2: 01.01.2020 FA1A310C 50
3: 02.01.2020 10AFC25D 80
4: 02.01.2020 FA1A310C 60
5: 03.01.2020 10AFC25D 70
6: 03.01.2020 FA1A310C 60
7: 04.01.2020 10AFC25D 50
8: 04.01.2020 FA1A310C 80
目标:我想计算两个新列,这两个列给出了与最后一天值的差值。一列显示绝对差异,另一列显示相对差异。公式应该是我可以将滞后时间从1天更改为7天的弹性贝尔(如果我想比较相同的工作日(或任何其他值。
预期输出应为:
date name value diff_absolut diff_relative
1: 01.01.2020 10AFC25D 100 NA NA
2: 01.01.2020 FA1A310C 50 NA NA
3: 02.01.2020 10AFC25D 80 -20 -0.2000000
4: 02.01.2020 FA1A310C 60 10 0.2000000
5: 03.01.2020 10AFC25D 70 -10 -0.1250000
6: 03.01.2020 FA1A310C 60 0 0.0000000
7: 04.01.2020 10AFC25D 50 -20 -0.2857143
8: 04.01.2020 FA1A310C 80 20 0.3333333
我可以像这样解决:
dt2 = copy(dt)
dt2[, date:=date+days(1)]
dt_final = merge(dt, dt2, by=c("date", "name"), all.x=TRUE, suffixes=c("", "_2"))
dt_final[, `:=`(diff_absolute=value-value_2, diff_relative=(value-value_2)/value_2, value_2=NULL)]
dt_final
date name value diff_absolute diff_relative
1: 2020-01-01 10AFC25D 100 NA NA
2: 2020-01-01 FA1A310C 50 NA NA
3: 2020-01-02 10AFC25D 80 -20 -0.2000000
4: 2020-01-02 FA1A310C 60 10 0.2000000
5: 2020-01-03 10AFC25D 70 -10 -0.1250000
6: 2020-01-03 FA1A310C 60 0 0.0000000
7: 2020-01-04 10AFC25D 50 -20 -0.2857143
8: 2020-01-04 FA1A310C 80 20 0.3333333
这是正确的,但它看起来并不是很优雅和高效。由于原始数据有1到2400万行,我想我最好问问是否有人有更平滑的解决方案?请仅提供数据。表。非常感谢。
如果你从行的角度来考虑这一点,这应该做到:
lag = 2L
dt[, diff_absolut := shift(value, n = lag) - value]
dt[, diff_relative := diff_absolut / shift(value, n = lag)]
这样?
对于较长的滞后,在shift
-函数中设置n参数
dt[, `:=`(diff_absolute = value - shift(value),
diff_relative = (value - shift(value)) / shift(value)),
by = .(name)][]