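I am trying to create a row count based on a condition. When the condition is not met, the value should reset to 0 rather than keep counting, and when the condition is met again, the count should restart at 1. I group by id so that counts do not spill over into other cross-sectional units. Here is a sample of the data: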
# A tibble: 5 × 4
# ccode year id civ_int
# <dbl> <dbl> <dbl> <dbl>
#1 90 1967 1 0
#2 90 1968 1 0
#3 90 1969 1 0
#4 90 1970 1 0
#5 90 1971 1 0
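The problem is that, within an id, the count does not restart at 1. Instead, it picks up where it left off once civ_int returns to 0. For example, the count may have reached 22; when civ_int = 1 it is correctly set to 0, but when civ_int returns to 0 the count resumes at 23. Here is the code I am using, for reference: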
merged <- merged %>%
  mutate(civ_int = if_else(deaths >= 25, 1, 0)) %>%
  group_by(id) %>%
  mutate(low_years = as.numeric(row_number())) %>%
  mutate(low_years = cumsum(if_else(civ_int == 0, 1, 0))) %>%
  mutate(low_years = if_else(civ_int == 1, 0, low_years)) %>%
  ungroup()
Here is an example of the problem I get with this code:
# A tibble: 20 × 5
# id year deaths civ_int low_years
# <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 1983 0 0 17
# 2 1 1984 0 0 18
# 3 1 1985 0 0 19
# 4 1 1986 0 0 20
# 5 1 1987 0 0 21
# 6 1 1988 0 0 22
# 7 1 1989 363 1 0
# 8 1 1990 522 1 0
# 9 1 1991 308 1 0
#10 1 1992 273 1 0
#11 1 1993 132 1 0
#12 1 1994 226 1 0
#13 1 1995 74 1 0
#14 1 1996 2 0 23
#15 1 1997 2 0 24
#16 1 1998 1 0 25
#17 1 1999 0 0 26
#18 1 2000 0 0 27
#19 1 2001 0 0 28
#20 1 2002 2 0 29
low_years should reset to 1 in 1996 and count up from there, but that is not happening. Any ideas?
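Introducing an extra grouping value may work for you. Here grp increases every time civ_int changes, so each run of identical values becomes its own group and the cumulative count restarts within it: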
library(dplyr)
df %>%
  mutate(civ_int = if_else(deaths >= 25, 1, 0)) %>%
  group_by(id, grp = cumsum(civ_int != lag(civ_int, default = 1))) %>%
  mutate(low_years = cumsum(civ_int == 0)) %>%
  ungroup() %>%
  select(-grp)
# A tibble: 20 × 5
id year deaths civ_int low_years
<int> <int> <int> <dbl> <int>
1 1 1983 0 0 1
2 1 1984 0 0 2
3 1 1985 0 0 3
4 1 1986 0 0 4
5 1 1987 0 0 5
6 1 1988 0 0 6
7 1 1989 363 1 0
8 1 1990 522 1 0
9 1 1991 308 1 0
10 1 1992 273 1 0
11 1 1993 132 1 0
12 1 1994 226 1 0
13 1 1995 74 1 0
14 1 1996 2 0 1
15 1 1997 2 0 2
16 1 1998 1 0 3
17 1 1999 0 0 4
18 1 2000 0 0 5
19 1 2001 0 0 6
20 1 2002 2 0 7
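To see why the extra grouping value helps, here is a minimal base R sketch on a made-up indicator vector. The toy values and the use of `rle()` and `ave()` are my own illustration, not part of the dplyr code above. The question's approach takes a single `cumsum()` over the whole id group, so zeroing some rows afterwards never resets the running total. A run id that changes whenever civ_int changes gives `cumsum()` a fresh start.

```r
# Toy indicator: 1 = threshold met, 0 = not met (made-up values)
civ_int <- c(0, 0, 0, 1, 1, 0, 0)

# The question's approach: one running total, then mask the rows where civ_int == 1.
# The total itself is never reset, so counting resumes where it left off.
naive <- cumsum(civ_int == 0)
naive[civ_int == 1] <- 0
naive
# 1 2 3 0 0 4 5

# Run id: goes up by one each time civ_int changes value. This matches
# cumsum(civ_int != lag(civ_int, default = 1)) up to the starting number.
runs <- rle(civ_int)
grp <- rep(seq_along(runs$lengths), runs$lengths)
grp
# 1 1 1 2 2 3 3

# Counting the zeros within each run restarts the counter at every run
low_years <- ave(as.integer(civ_int == 0), grp, FUN = cumsum)
low_years
# 1 2 3 0 0 1 2
```

With several ids, the id column would be passed to `ave()` as an additional grouping variable, which mirrors `group_by(id, grp)`.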
Data:
df <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), year = 1983:2002, deaths = c(0L,
0L, 0L, 0L, 0L, 0L, 363L, 522L, 308L, 273L, 132L, 226L, 74L,
2L, 2L, 1L, 0L, 0L, 0L, 2L)), class = "data.frame", row.names = c(NA,
-20L))
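Using data.table:
library(data.table)
setDT(df)[, civ_int := as.integer(deaths >= 25)]  # add the indicator shown in the output
# rleid() starts a new run each time the condition flips, so cumsum() restarts per run
df[, low_years := cumsum(deaths < 25), by = .(id, rleid(deaths >= 25))][]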
id year deaths civ_int low_years
1: 1 1983 0 0 1
2: 1 1984 0 0 2
3: 1 1985 0 0 3
4: 1 1986 0 0 4
5: 1 1987 0 0 5
6: 1 1988 0 0 6
7: 1 1989 363 1 0
8: 1 1990 522 1 0
9: 1 1991 308 1 0
10: 1 1992 273 1 0
11: 1 1993 132 1 0
12: 1 1994 226 1 0
13: 1 1995 74 1 0
14: 1 1996 2 0 1
15: 1 1997 2 0 2
16: 1 1998 1 0 3
17: 1 1999 0 0 4
18: 1 2000 0 0 5
19: 1 2001 0 0 6
20: 1 2002 2 0 7
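The sample data has only one id, so it does not show why id stays in the grouping alongside the run id. A small base R sketch on a hypothetical two-unit panel (the `toy` data frame and its values are made up) suggests what could go wrong without it. A run of low-death years that spans the boundary between two units would be counted as one run.

```r
# Hypothetical two-unit panel (values made up for illustration)
toy <- data.frame(
  id     = c(1, 1, 1, 2, 2, 2, 2),
  deaths = c(0, 30, 0, 0, 0, 40, 0)
)

# Run id over the whole column; the third run of low years covers
# the last row of id 1 and the first two rows of id 2
below <- toy$deaths < 25
runs <- rle(below)
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Grouping by run id alone lets the count spill from id 1 into id 2
spill <- ave(as.integer(below), run_id, FUN = cumsum)
spill
# 1 0 1 2 3 0 1

# Grouping by id AND run id restarts the count at the unit boundary
toy$low_years <- ave(as.integer(below), toy$id, run_id, FUN = cumsum)
toy$low_years
# 1 0 1 1 2 0 1
```

This is the same idea as putting both `id` and `rleid(...)` in `by`. The base R form is only a sketch for checking the logic without loading a package.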