I have data like this:
test <- data.frame(id = c(1,2,1,5,5,5,6),
time = c(0,1,4,5,6,7,9),
cond = c("a","a","b","a","b","b","b"),
value = c(5,3,2,4,0,3,1),
stringsAsFactors=F)
library(data.table)
setDT(test)[,order := order(time),id][order(id,order)]
id time cond value order
1 0 a 5 1
2 1 a 3 1
1 4 b 2 2
5 5 a 4 1
5 6 b 0 2
5 7 b 3 3
6 9 b 1 1
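The data.table call creates a column "order", which is the time order within each id group.
I want to create a column that returns the previous value, but only when cond is "b". If cond is "a", return the current value. If cond is "b" and the previous cond is also "b", skip back to the most recent non-"b" row. If the first cond in a group is "b", return NA.
The desired output looks like this: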
id time cond value order prev
1 0 a 5 1 5
2 1 a 3 1 3
1 4 b 2 2 5
5 5 a 4 1 4
5 6 b 0 2 4
5 7 b 3 3 4
6 9 b 1 1 NA
I tried a few things like this, but they only return NAs.
test[, prev := shift(value[cond == 'b']), .(id,order)]
If I understand the question correctly, this could be:
library(data.table)
setDT(test)[, order := order(time), id][order(id, order)]
test[, prev := {
frst <- ifelse(cond[1] == "a", value[1],
ifelse(cond[1] == "b", NA, cond[1]))
prev <- as.integer(ifelse(cond == "b" & shift(cond) == "b",
NA,
c(frst, shift(value)[-1])))
}, by = id][cond == "b",
            # na.rm = FALSE keeps leading NAs, so the result has the same length as the group
            prev := zoo::na.locf(prev, na.rm = FALSE),
            by = id]
Output:
id time cond value order prev
1: 1 0 a 5 1 5
2: 1 4 b 2 2 5
3: 2 1 a 3 1 3
4: 5 5 a 4 1 4
5: 5 6 b 0 2 4
6: 5 7 b 3 3 4
7: 6 9 b 1 1 NA
If you first assign the non-"b" values, zoo::na.locf can do the rest (filling in the NA values on the "b" rows).
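library(zoo)
# Copy value onto the non-"b" rows; the "b" rows are left as NA
test[cond != 'b', prev := value]
# Carry the last non-NA value forward within each id;
# na.rm = FALSE keeps a leading NA (a group that starts with "b") instead of dropping it
test[, prev := na.locf(prev, na.rm = FALSE), id]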
test
# id time cond value order prev
# 1: 1 0 a 5 1 5
# 2: 2 1 a 3 1 3
# 3: 1 4 b 2 2 5
# 4: 5 5 a 4 1 4
# 5: 5 6 b 0 2 4
# 6: 5 7 b 3 3 4
# 7: 6 9 b 1 1 NA
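For anyone who would rather not depend on zoo, the same two-step idea (mark the "b" rows as NA, then carry the last value forward within each id) can probably be done with data.table alone. This is only a sketch, assuming data.table 1.12.4 or later, where fifelse() and nafill() are available; it rebuilds the question's data so it runs on its own, and it sorts by id and time first so that "previous" means the previous row in time.

```r
library(data.table)

# Same data as in the question
test <- data.table(id    = c(1, 2, 1, 5, 5, 5, 6),
                   time  = c(0, 1, 4, 5, 6, 7, 9),
                   cond  = c("a", "a", "b", "a", "b", "b", "b"),
                   value = c(5, 3, 2, 4, 0, 3, 1))

# Sort so that rows within each id are in time order
setorder(test, id, time)

# Step 1: keep value on non-"b" rows, NA on "b" rows
test[, prev := fifelse(cond != "b", value, NA_real_)]

# Step 2: last observation carried forward within each id;
# a group that starts with "b" has nothing to carry, so it stays NA
test[, prev := nafill(prev, type = "locf"), by = id]

test
```

Note that nafill() only handles numeric columns, so this works here because value is numeric; for character or factor columns the zoo approach above is the simpler choice.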