在下面的数据中,我们观察到一个特定国家在一段时间内的虚拟GDP增长。我的目标是创建一个包含三个类别的变量:0=无危机,1=危机,2=严重危机。这就是识别经济危机是指增长率低于前三年增长趋势平均值至少一个(危机)或两个(严重)标准差的年份。
有人能给点指导吗?
growth year
5 1990
4 1991
0 1992
-4 1993
-3 1994
-1 1995
2 1996
4 1997
7 1998
10 1999
8 2000
-10 2001
-8 2002
2 2003
4 2004
5 2005
8 2006
4 2007
-10 2008
-9 2009
-8 2010
-3 2011
0 2012
-5 2013
-6 2014
-2 2015
4 2016
5 2017
5 2018
8 2019
2 2020
-1 2021
-1 2022
数据如下:
df=structure(list(gdp_growth = c(5, 4, 0, -4, -3, -1, 2, 4, 7, 10,
8, -10, -8, 2, 4, 5, 8, 4, -10, -9, -8, -3, 0, -5, -6, -2, 4,
5, 5, 8, 2, -1, -1), year = c(1990, 1991, 1992, 1993, 1994, 1995,
1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006,
2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017,
2018, 2019, 2020, 2021, 2022)), row.names = c(NA, -33L), class = "data.frame")
从你的描述中,听起来好像你首先需要计算增长的滚动平均值,然后将今年的增长与此进行比较:
library(dplyr)
df %>%
mutate(mn = zoo::rollmean(gdp_growth, 3, na.pad = TRUE, align = 'right'),
crisis = ifelse(gdp_growth < (mn - sd(gdp_growth)),
ifelse(gdp_growth < (mn - 2 * sd(gdp_growth)),
2, 1), 0)) %>%
select(-mn)
#> gdp_growth year crisis
#> 1 5 1990 NA
#> 2 4 1991 NA
#> 3 0 1992 0
#> 4 -4 1993 0
#> 5 -3 1994 0
#> 6 -1 1995 0
#> 7 2 1996 0
#> 8 4 1997 0
#> 9 7 1998 0
#> 10 10 1999 0
#> 11 8 2000 0
#> 12 -10 2001 2
#> 13 -8 2002 0
#> 14 2 2003 0
#> 15 4 2004 0
#> 16 5 2005 0
#> 17 8 2006 0
#> 18 4 2007 0
#> 19 -10 2008 1
#> 20 -9 2009 0
#> 21 -8 2010 0
#> 22 -3 2011 0
#> 23 0 2012 0
#> 24 -5 2013 0
#> 25 -6 2014 0
#> 26 -2 2015 0
#> 27 4 2016 0
#> 28 5 2017 0
#> 29 5 2018 0
#> 30 8 2019 0
#> 31 2 2020 0
#> 32 -1 2021 0
#> 33 -1 2022 0
这里是另一个例子,这次使用RcppRoll
包,它具有与dplyr
兼容的大量快速滚动功能。
library(dplyr)
df %>%
mutate(
std3 = RcppRoll::roll_sd(gdp_growth , 3, fill=0, align = "right"),
crisis = case_when(
std3 < 1 ~ 'no crisis',
std3 < 2 ~ 'crisis',
T ~ 'severe crisis'
)
)
#> gdp_growth year std3 crisis
#> 1 5 1990 0.0000000 no crisis
#> 2 4 1991 0.0000000 no crisis
#> 3 0 1992 2.6457513 severe crisis
#> 4 -4 1993 4.0000000 severe crisis
#> 5 -3 1994 2.0816660 severe crisis
#> 6 -1 1995 1.5275252 crisis
#> 7 2 1996 2.5166115 severe crisis
#> 8 4 1997 2.5166115 severe crisis
#> 9 7 1998 2.5166115 severe crisis
#> 10 10 1999 3.0000000 severe crisis
#> 11 8 2000 1.5275252 crisis
#> 12 -10 2001 11.0151411 severe crisis
#> 13 -8 2002 9.8657657 severe crisis
#> 14 2 2003 6.4291005 severe crisis
#> 15 4 2004 6.4291005 severe crisis
#> 16 5 2005 1.5275252 crisis
#> 17 8 2006 2.0816660 severe crisis
#> 18 4 2007 2.0816660 severe crisis
#> 19 -10 2008 9.4516313 severe crisis
#> 20 -9 2009 7.8102497 severe crisis
#> 21 -8 2010 1.0000000 crisis
#> 22 -3 2011 3.2145503 severe crisis
#> 23 0 2012 4.0414519 severe crisis
#> 24 -5 2013 2.5166115 severe crisis
#> 25 -6 2014 3.2145503 severe crisis
#> 26 -2 2015 2.0816660 severe crisis
#> 27 4 2016 5.0332230 severe crisis
#> 28 5 2017 3.7859389 severe crisis
#> 29 5 2018 0.5773503 no crisis
#> 30 8 2019 1.7320508 crisis
#> 31 2 2020 3.0000000 severe crisis
#> 32 -1 2021 4.5825757 severe crisis
#> 33 -1 2022 1.7320508 crisis
由reprex包(v2.0.1)在2022-07-11创建
您可以在dplyr
中使用lag
,rowwise
*和mutate
:
library(dplyr)
df |>
mutate(gdp3_growth_lag1 = lag(gdp_growth, 1),
gdp3_growth_lag2 = lag(gdp_growth, 2),
gdp3_growth_lag3 = lag(gdp_growth, 3)) |>
rowwise() |>
mutate(
gdp3_growth_mean = mean(c_across(starts_with("gdp3_growth_lag"))),
gdp3_growth_sd = sd(c_across(starts_with("gdp3_growth_lag")))
) |>
ungroup() |>
mutate(crisis = case_when(gdp_growth <= gdp3_growth_mean - 2 * gdp3_growth_sd ~ 2,
gdp_growth <= gdp3_growth_mean - gdp3_growth_sd ~ 1,
is.na(gdp3_growth_mean) ~ NA_real_,
TRUE ~ 0)) |>
select(-starts_with("gdp3"))
输出:
# A tibble: 33 × 3
gdp_growth year crisis
<dbl> <dbl> <dbl>
1 5 1990 NA
2 4 1991 NA
3 0 1992 NA
4 -4 1993 2
5 -3 1994 0
6 -1 1995 0
7 2 1996 0
8 4 1997 0
9 7 1998 0
10 10 1999 0
11 8 2000 0
12 -10 2001 2
13 -8 2002 0
14 2 2003 0
15 4 2004 0
16 5 2005 0
17 8 2006 0
18 4 2007 0
19 -10 2008 2
20 -9 2009 1
21 -8 2010 0
22 -3 2011 0
23 0 2012 0
24 -5 2013 0
25 -6 2014 1
26 -2 2015 0
27 4 2016 0
28 5 2017 0
29 5 2018 0
30 8 2019 0
31 2 2020 2
32 -1 2021 2
33 -1 2022 0
更新完整输出
(*)matrixStats
中也有rowSds