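I have the following problem and can't seem to find a good way to solve it. Suppose I have a panel data set of subjects who receive a treatment at different points in time.
Reproducible example: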
df <- data.frame(subject = rep(c("A", "B"), each = 6),
period = rep(c(2006:2011), 2),
treatment = c(0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0))
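Now I want to create an indicator variable "post" that equals 1 in the treatment period and in every period after it, so that the data look like this: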
subject period treatment post
1 A 2006 0 0
2 A 2007 1 1
3 A 2008 0 1
4 A 2009 0 1
5 A 2010 0 1
6 A 2011 0 1
7 B 2006 0 0
8 B 2007 0 0
9 B 2008 1 1
10 B 2009 0 1
11 B 2010 0 1
12 B 2011 0 1
I tried to solve it using lags and the like, but the code became very messy. What is an elegant way to solve this?
Thanks
You can use ave().
transform(df, post = ave(treatment == 1, subject, FUN = cumsum))
# subject period treatment post
# 1 A 2006 0 0
# 2 A 2007 1 1
# 3 A 2008 0 1
# 4 A 2009 0 1
# 5 A 2010 0 1
# 6 A 2011 0 1
# 7 B 2006 0 0
# 8 B 2007 0 0
# 9 B 2008 1 1
# 10 B 2009 0 1
# 11 B 2010 0 1
# 12 B 2011 0 1
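One caveat with cumsum(): it counts treatments, so if a subject were treated more than once, post would climb to 2, 3, and so on. It also assumes the rows are already in time order within each subject. The following is a minimal sketch of a variant that guards against both, assuming treatment only ever takes the values 0 and 1; it sorts first and swaps cumsum for cummax.

```r
# Same example data as in the question.
df <- data.frame(subject = rep(c("A", "B"), each = 6),
                 period = rep(2006:2011, 2),
                 treatment = c(0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0))

# Sort so the cumulative function runs in time order within each subject.
df <- df[order(df$subject, df$period), ]

# cummax() stays at 1 once a treatment has occurred, even if the subject
# is treated again later (cumsum() would keep counting upward).
df$post <- ave(df$treatment, df$subject, FUN = cummax)
```

On the example data this gives the same post column as the cumsum version, since each subject is treated exactly once.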
With tidyverse:
library(dplyr)  # provides %>%, group_by(), arrange() and mutate()

df %>%
  group_by(subject) %>%
  arrange(subject, period) %>%
  mutate(post = cumsum(treatment))
# A tibble: 12 x 4
# Groups: subject [2]
#    subject period treatment  post
#    <fct>    <int>     <dbl> <dbl>
#  1 A         2006         0     0
#  2 A         2007         1     1
#  3 A         2008         0     1
#  4 A         2009         0     1
#  5 A         2010         0     1
#  6 A         2011         0     1
#  7 B         2006         0     0
#  8 B         2007         0     0
#  9 B         2008         1     1
# 10 B         2009         0     1
# 11 B         2010         0     1
# 12 B         2011         0     1
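If you would rather not depend on row order at all, another option is to look up each subject's first treated period and compare every row's period against it. This is only a sketch in base R, not part of either answer above; first_treated and treated are names made up for the illustration, and subjects that are never treated are assumed to get post = 0.

```r
# Same example data as in the question.
df <- data.frame(subject = rep(c("A", "B"), each = 6),
                 period = rep(2006:2011, 2),
                 treatment = c(0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0))

# Keep only the treated rows and find the earliest treated period per subject.
treated <- df[df$treatment == 1, ]
first_treated <- tapply(treated$period, treated$subject, min)

# A row is "post" when its period is at or after the subject's first treatment.
# Looking up by name yields NA for subjects with no treated row.
df$post <- as.integer(df$period >= first_treated[as.character(df$subject)])

# Assumption: never-treated subjects should get 0 rather than NA.
df$post[is.na(df$post)] <- 0L
```

Because the comparison uses period directly, the result does not change if the rows are shuffled.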