  id hire_month sep_month wage_jan wage_feb wage_mar wage_apr
1  1          2         3      740      780      780      780
2  1          4         0      890      890      890      890
3  2          3         5      550      550      550      550
4  2          5        10      890      250      250      400
  wage_may wage_jun wage_jul wage_aug wage_sep wage_oct wage_nov
1      780      780      780      780      780      780      780
2      890      890      890      890      890      790      250
3      550      550      550      550      550      550      550
4      500      890      600      750      890      300      300
  wage_dec
1      780
2      300
3      550
4      300
structure(list(id = c(1L, 1L, 2L, 2L), hire_month = c(2L, 4L,
3L, 5L), sep_month = c(3L, 0L, 5L, 10L), wage_jan = c(740L, 890L,
550L, 890L), wage_feb = c(780L, 890L, 550L, 250L), wage_mar = c(780L,
890L, 550L, 250L), wage_apr = c(780L, 890L, 550L, 400L), wage_may = c(780L,
890L, 550L, 500L), wage_jun = c(780L, 890L, 550L, 890L), wage_jul = c(780L,
890L, 550L, 600L), wage_aug = c(780L, 890L, 550L, 750L), wage_sep = c(780L,
890L, 550L, 890L), wage_oct = c(780L, 790L, 550L, 300L), wage_nov = c(780L,
250L, 550L, 300L), wage_dec = c(780L, 300L, 550L, 300L)), class = "data.frame", row.names = c(NA,
-4L))
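I want to compute wage differences based on hire_month and lag(sep_month), that is, the separation month from the previous row of the same id. For example, if lag(sep_month) is 4 (April) and hire_month is 7 (July), I want the difference between wage_jul and wage_apr, where wage_apr is taken from the previous row. So the two wages I need to subtract actually sit in different rows, because each row in my dataset is one contract.
I have a large dataset, so I would like a way to automate this lookup.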
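Update
The OP clarified that the sep_month value comes from the row above, which means each ID yields n-1 differences, where n is the number of rows for that ID. In the example above each ID has two rows, so the difference is computed only for the second (and last) row of each ID.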
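library(dplyr)  # provides %>%, as_tibble(), mutate(), group_by(), lag()
# Columns 4:15 hold wage_jan..wage_dec, so month m sits in column m + 3.
data$diff <- as_tibble(data) %>%
  mutate(sep_wage = apply(., 1, function(x) x[x[3] + 3])) %>%  # wage in the separation month
  group_by(id) %>%
  mutate(sep_wage = lag(sep_wage)) %>% ungroup() %>%  # value from the previous contract of the same id
  apply(., MARGIN = 1, function(x) x[x[2] + 3] - x[16])  # hire-month wage minus sep_wage (column 16)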
Output
  id hire_month sep_month wage_jan wage_feb wage_mar wage_apr wage_may wage_jun wage_jul wage_aug wage_sep wage_oct wage_nov wage_dec diff
1  1          2         3      740      780      780      780      780      780      780      780      780      780      780      780   NA
2  1          4         0      890      890      890      890      890      890      890      890      890      790      250      300  110
3  2          3         5      550      550      550      550      550      550      550      550      550      550      550      550   NA
4  2          5        10      890      250      250      400      500      890      600      750      890      300      300      300  -50
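Since the question mentions a large dataset, the row-wise apply() calls and the hard-coded column positions (+3, column 16) may become slow or fragile. One possible alternative, offered only as a sketch, is base R matrix indexing, which looks up all rows at once and selects wage columns by name. The names wage_diff and wage_cols are illustrative and not from the original post. The sketch assumes that rows are already sorted by id and contract order, and that sep_month = 0 means "no separation", which it turns into NA.

```r
# Wage columns by name: "wage_jan" ... "wage_dec" (month.abb is built into R)
wage_cols <- paste0("wage_", tolower(month.abb))

# Example data rebuilt from the question
data <- data.frame(id         = c(1L, 1L, 2L, 2L),
                   hire_month = c(2L, 4L, 3L, 5L),
                   sep_month  = c(3L, 0L, 5L, 10L))
wages <- rbind(c(740, rep(780, 11)),
               c(rep(890, 9), 790, 250, 300),
               rep(550, 12),
               c(890, 250, 250, 400, 500, 890, 600, 750, 890, 300, 300, 300))
colnames(wages) <- wage_cols
data <- cbind(data, wages)

# Hypothetical helper: hire-month wage of each row minus the
# separation-month wage of the previous row with the same id.
wage_diff <- function(df, wage_cols) {
  w <- as.matrix(df[, wage_cols])
  n <- nrow(df)
  rows <- seq_len(n)

  # A two-column (row, col) index matrix picks one cell per row
  hire_wage <- w[cbind(rows, df$hire_month)]

  # Index rows containing 0 would be silently dropped, so map
  # anything outside 1..12 to NA (assumption: 0 = no separation)
  sep_idx  <- ifelse(df$sep_month %in% 1:12, df$sep_month, NA)
  sep_wage <- w[cbind(rows, sep_idx)]

  # Shift down by one row and blank out the first row of each id
  prev_sep_wage <- c(NA, sep_wage[-n])
  same_id <- c(FALSE, df$id[-1] == df$id[-n])
  prev_sep_wage[!same_id] <- NA

  hire_wage - prev_sep_wage
}

data$diff <- wage_diff(data, wage_cols)
data$diff  # NA 110 NA -50
```

On the example data this gives the same diff column as the answer above. If the rows are not sorted, ordering them first, for example with data[order(data$id), ] plus whatever column defines contract order, would be needed before calling the helper.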