R: conditional subtraction from the next unequal value



Given a larger data frame with roughly 300k+ rows and 14 columns, in the following format:

df <- data.frame(team_id = c(rep(1,10),rep(2,10),rep(3,10),rep(4,10),rep(5,10)), 
year = rep(c(1954:1963), 5), members= c(0,0,0,1,1,1,2,0,0,0,0,0,2,1,1,1,0,0,0,0,    1,1,1,1,1,1,1,1,1,1,0,1,1,1,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0),
size = c(rep(60,8),50,50,rep(40,7),50,50,70,rep(30,10),rep(99,6),110,101,101,101,rep(80,9),66) )

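The goal is to create, for each team, a new vector holding the difference in size after all members have left (that is, when members goes from 2 or 1 to 0). The difference is the next different size minus the size in the year the last member left. The direction of the change should be visible, so absolute values are not wanted. What I have so far is: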

df2 <- df %>% arrange(team_id,year) %>%
group_by(team_id)     %>%
mutate(sizediff = if_else(members == 1  & lead(members) == 0 | members == 2  & lead(members) == 0,1,0, missing =  0) )

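However, instead of the value 1 in the sizediff vector, I would like the difference to the future size. Maybe reshaping from long to wide format, or conditionally rearranging the year vector, would help, but I am stuck. What I want to achieve is: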

aim <- data.frame(team_id = c(rep(1,10),rep(2,10),rep(3,10),rep(4,10),rep(5,10)), 
year = rep(c(1954:1963), 5), members= c(0,0,0,1,1,1,2,0,0,0,    0,0,2,1,1,1,0,0,0,0,    1,1,1,1,1,1,1,1,1,1,    0,1,1,1,0,0,0,0,0,0,    1,1,1,1,1,1,1,1,0,0    ) ,
size = c(57,rep(60,7),50,50,rep(40,7),50,50,70,rep(30,10),rep(99,6),110,101,101,101,88,rep(80,8),66), 
sizediff = c(rep(0,6),-10,rep(0,3),rep(0,5),10,rep(0,4),rep(0,10),rep(0,3),11,rep(0,6),rep(0,7),-14,rep(0,2))    )

Is this what you are looking for?

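library(dplyr)

df %>% 
arrange(team_id, year) %>% 
group_by(team_id) %>%   # group first so lead() does not look into the next team
mutate(diff = if_else((members > 0 & dplyr::lead(members, n = 1) == 0), size, 0)) %>% 
# note: last(size) is the team's final size, which is not always the next different size
mutate(diff = ifelse(diff > 0, dplyr::last(size) - size, NA))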

Try this custom approach:

library(dplyr)
df %>%
group_by(team_id) %>%
mutate(sizediff = {
sizediff = rep(0, n())
inds <- which(members %in% c(1, 2)  & lead(members) == 0)[1]
sizediff[inds] <- size[which(row_number() > inds & size != size[inds])[1]] - size[inds]
sizediff
}) -> result
result
#   team_id  year members  size sizediff
#     <dbl> <int>   <dbl> <dbl>    <dbl>
# 1       1  1954       0    60        0
# 2       1  1955       0    60        0
# 3       1  1956       0    60        0
# 4       1  1957       1    60        0
# 5       1  1958       1    60        0
# 6       1  1959       1    60        0
# 7       1  1960       2    60      -10
# 8       1  1961       0    60        0
# 9       1  1962       0    50        0
#10       1  1963       0    50        0
# … with 40 more rows

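We first initialize sizediff to 0. inds finds the position at which the members leave, i.e. the first row where members is 1 or 2 and the next row is 0. We then take the next size after that position that differs from size[inds], subtract size[inds] from it, and store the result at position inds.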
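To make the steps easier to follow, here is a minimal base-R sketch of the same logic for a single team's vectors. The helper name size_diff_after_departure is hypothetical and not part of the answer above, and the explicit early return for teams where nobody leaves is an added assumption about how that case should be handled.

```r
# Hypothetical helper (not from the original answer): size change after the
# first departure, given one team's vectors already ordered by year.
size_diff_after_departure <- function(members, size) {
  out <- rep(0, length(members))
  nxt <- c(members[-1], NA)               # next year's members, like lead()
  left <- which(members > 0 & nxt == 0)   # years after which no member remains
  if (length(left) == 0) return(out)      # assumption: no departure means all zeros
  i <- left[1]                            # only the first departure, as in the answer
  later <- which(seq_along(size) > i & size != size[i])
  if (length(later) > 0) out[i] <- size[later[1]] - size[i]
  out
}

# Team 1 from the question's df
members <- c(0, 0, 0, 1, 1, 1, 2, 0, 0, 0)
size    <- c(rep(60, 8), 50, 50)
size_diff_after_departure(members, size)
# 0 0 0 0 0 0 -10 0 0 0
```

With dplyr loaded, a function like this could presumably be applied per team via group_by(team_id) followed by mutate(sizediff = size_diff_after_departure(members, size)).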
