I am trying to download the COVID-19 data provided in The Economist's GitHub repository.
library(readr)
library(knitr)
myfile <- "https://raw.githubusercontent.com/TheEconomist/covid-19-excess-deaths-tracker/master/output-data/excess-deaths/all_weekly_excess_deaths.csv"
test <- read_csv(myfile)
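What I get is a tibble, and I cannot easily access the data stored in it. I would like to take a single column, say test$covid_deaths_per_100k, and reshape it into a matrix or ts object in which rows represent time and columns represent countries.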
I tried doing this by hand but failed. Then I tried the tsibble package, but that failed too:
tsibble(test[c("covid_deaths_per_100k","country")],index=test$start_date)
Error: Must extract column with a single valid subscript.
x Subscript `var` has the wrong type `date`.
ℹ It must be numeric or character.
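So I suppose the problem is that the data are stacked by country, which means the time index is repeated. Do I need some magic pipe function for this? Is there a simple way to do it, perhaps without pipes?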
A valid tsibble must have distinct rows identified by its key and index:
as_tsibble(test,index = start_date,key=c(country,region))
# A tsibble: 11,715 x 17 [1D]
# Key: country, region [176]
country region region_code start_date end_date days year week population total_deaths
<chr> <chr> <chr> <date> <date> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Australia Australia 0 2020-01-01 2020-01-07 7 2020 1 25734100 2497
2 Australia Australia 0 2020-01-08 2020-01-14 7 2020 2 25734100 2510
3 Australia Australia 0 2020-01-15 2020-01-21 7 2020 3 25734100 2501
4 Australia Australia 0 2020-01-22 2020-01-28 7 2020 4 25734100 2597
5 Australia Australia 0 2020-01-29 2020-02-04 7 2020 5 25734100 2510
6 Australia Australia 0 2020-02-05 2020-02-11 7 2020 6 25734100 2530
7 Australia Australia 0 2020-02-12 2020-02-18 7 2020 7 25734100 2613
8 Australia Australia 0 2020-02-19 2020-02-25 7 2020 8 25734100 2608
9 Australia Australia 0 2020-02-26 2020-03-03 7 2020 9 25734100 2678
10 Australia Australia 0 2020-03-04 2020-03-10 7 2020 10 25734100 2602
# ... with 11,705 more rows, and 7 more variables: covid_deaths <dbl>, expected_deaths <dbl>,
# excess_deaths <dbl>, non_covid_deaths <dbl>, covid_deaths_per_100k <dbl>,
# excess_deaths_per_100k <dbl>, excess_deaths_pct_change <dbl>
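For the plain matrix asked about in the question, a pipe-free alternative is base R's tapply(), which cross-classifies one column by date (rows) and country (columns). The sketch below uses a small made-up data frame, long, with hypothetical values in place of test. With the real data, note that the key shown above includes region, so a country may have several rows per date; summing them is only sensible for counts such as covid_deaths, not for rates such as covid_deaths_per_100k.

```r
# Hypothetical long-format data standing in for `test`:
# one row per (start_date, country) pair; values are invented.
long <- data.frame(
  start_date   = as.Date(c("2020-01-01", "2020-01-08",
                           "2020-01-01", "2020-01-08")),
  country      = c("Australia", "Australia", "Brazil", "Brazil"),
  covid_deaths = c(0, 5, 10, 30)
)

# tapply() groups the values by date and country and returns a matrix
# with dates as rows and countries as columns. sum combines any rows that
# share a date and country (e.g. regions); empty cells become NA.
m <- tapply(long$covid_deaths,
            list(long$start_date, long$country),
            sum)
m
```

The row names are the dates as character strings, so the result can be indexed as m["2020-01-08", "Brazil"] or passed to ts() if a regular spacing is assumed.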
ts works best with monthly, quarterly or annual series. Here are a few approaches.
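1) Monthly. This creates a monthly zoo object z from the test columns shown, splitting by country and aggregating to produce monthly series. It then converts z to a ts object.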
library(zoo)
z <- read.zoo(test[c("start_date", "country", "covid_deaths")],
split = "country", FUN = as.yearmon, aggregate = sum)
as.ts(z)
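To see what split, FUN and aggregate do in the monthly call, here is a toy run on a hypothetical three-column data frame shaped like test[c("start_date", "country", "covid_deaths")]; the numbers are invented purely for illustration.

```r
library(zoo)

# Hypothetical weekly rows in long format (date, country, value).
df <- data.frame(
  start_date   = as.Date(c("2020-01-01", "2020-01-08", "2020-02-05",
                           "2020-01-01", "2020-01-08", "2020-02-05")),
  country      = rep(c("Australia", "Brazil"), each = 3),
  covid_deaths = c(1, 2, 4, 10, 20, 40)
)

# split = "country" gives one column per country;
# FUN = as.yearmon maps each date to its month;
# aggregate = sum adds up the weekly values falling in the same month.
z <- read.zoo(df, split = "country", FUN = as.yearmon, aggregate = sum)
z

# A yearmon index is regular with frequency 12, so as.ts() gives a monthly ts.
tt <- as.ts(z)
tt
```

Here the two January weeks are summed (Australia 1 + 2, Brazil 10 + 20), while February keeps its single week.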
2) Weekly. Create a weekly ts object with frequency 53.
to_weekly <- function(x) {
yr <- as.integer(as.yearmon(x))
wk <- as.integer(format(as.Date(x), "%U"))
yr + wk/53
}
z <- read.zoo(test[c("start_date", "country", "covid_deaths")],
split = "country", FUN = to_weekly, aggregate = sum)
as.ts(z)
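A quick check of to_weekly() on a few dates chosen for illustration shows the encoding: the year plus the Sunday-based week number from %U divided by 53. One caveat, following from how %U is defined: it runs from 00 to 53, so a date in week 53 (for example 2023-12-31, the 53rd Sunday of a year that began on a Sunday) gets the same value as week 00 of the next year, and aggregate = sum would merge the two.

```r
library(zoo)

# Same helper as above: year + (Sunday-based week number %U) / 53.
to_weekly <- function(x) {
  yr <- as.integer(as.yearmon(x))
  wk <- as.integer(format(as.Date(x), "%U"))
  yr + wk/53
}

# 2020-01-01 falls before the first Sunday of 2020 (week "00");
# 2020-01-05 is the first Sunday ("01") and 2020-01-12 the second ("02").
d <- as.Date(c("2020-01-01", "2020-01-05", "2020-01-12"))
w <- to_weekly(d)
round((w - 2020) * 53)

# The week-53 caveat: both of these dates map to 2024.
to_weekly(as.Date("2023-12-31")) == to_weekly(as.Date("2024-01-01"))
```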
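3) Daily. If you want a series whose times are dates, omit the FUN argument and keep the result as a zoo object rather than converting it to ts.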
z <- read.zoo(test[c("end_date", "country", "covid_deaths")],
split = "country", aggregate = sum)
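If the goal is the matrix from the question rather than a ts, the daily zoo object can be unpacked with coredata() and index(). As before, the data frame below is a hypothetical stand-in for test[c("end_date", "country", "covid_deaths")], with invented values.

```r
library(zoo)

# Hypothetical long-format rows; Brazil has no row for 2020-01-14.
df <- data.frame(
  end_date     = as.Date(c("2020-01-07", "2020-01-14", "2020-01-07")),
  country      = c("Australia", "Australia", "Brazil"),
  covid_deaths = c(1, 2, 10)
)

# No FUN: the Date column is used directly as the index.
z <- read.zoo(df, split = "country", aggregate = sum)

# coredata() gives the underlying matrix (rows = dates, columns = countries);
# index() gives the matching dates, used here as row names.
# Dates missing for a country are filled with NA by the merge.
m <- coredata(z)
rownames(m) <- format(index(z))
m
```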