我正在使用readxl::read_excel
将Robert Koch Institute的excel文件导入R中。但是,我遇到了一个只包含日期的行的问题。日期格式为DD。MM.YYYY,但在生成的数据帧中,它们看起来像是任意的数值。这是我正在使用的代码:
library(readxl)
df <- read_excel("path/to/Fallzahlen_Kum_Tab_Archiv.xlsx",
sheet = "BL_7-Tage-Fallzahlen (fixiert)")
head(df)
返回以下输出:
# A tibble: 6 x 494
`Fallzahlen in d~ ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11 ...12 ...13 ...14 ...15
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
3 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
4 NA 43957 43958 43959 43960 43961 43962 43963 43964 43965 43966 43967 43968 43969 43970
5 Baden-Württemberg 908 779 665 706 772 760 774 744 765 725 622 551 549 483
6 Bayern 1318 1246 1182 1223 1148 1151 1198 1194 1115 1022 1022 997 982 921
# ... with 479 more variables: ...16 <dbl>, ...
在第4行中,您可以看到明显未正确导入的日期。我已经尝试了col_types="text"
和col_types="list"
,正如其他帖子中所建议的那样,但这似乎没有改变任何事情。
excel文件可以在";Archiv";RKI网页的部分,这应该是直接下载链接。
您可以在将数据帧加载到R 后修复数据帧
# load data, replace ... with path and file
# dat <- read_xlsx(..., sheet="BL_7-Tage-Fallzahlen (fixiert)")
# get the dates
dates <- as.numeric(dat[4,])
# filter top rows with NAs
dat <- dat[5:nrow(dat),]
# put the dates as column names
colnames(dat)[-1] <- as.character(as.Date(dates[-1], origin="1899-12-30"))
head(dat)
# A tibble: 6 × 494
`Fallzahlen in den letzte… `2020-05-06` `2020-05-07` `2020-05-08` `2020-05-09`
<chr> <dbl> <dbl> <dbl> <dbl>
1 Baden-Württemberg 908 779 665 706
2 Bayern 1318 1246 1182 1223
3 Berlin 276 265 272 280
4 Brandenburg 125 116 117 131
5 Bremen 119 126 140 155
6 Hamburg 95 85 86 89
# … with 489 more variables: 2020-05-10 <dbl>, 2020-05-11 <dbl>,
# 2020-05-12 <dbl>, 2020-05-13 <dbl>, 2020-05-14 <dbl>, 2020-05-15 <dbl>,
# 2020-05-16 <dbl>, 2020-05-17 <dbl>, 2020-05-18 <dbl>, 2020-05-19 <dbl>,
# 2020-05-20 <dbl>, 2020-05-21 <dbl>, 2020-05-22 <dbl>, 2020-05-23 <dbl>,
# 2020-05-24 <dbl>, 2020-05-25 <dbl>, 2020-05-26 <dbl>, 2020-05-27 <dbl>,
# 2020-05-28 <dbl>, 2020-05-29 <dbl>, 2020-05-30 <dbl>, 2020-05-31 <dbl>,
# 2020-06-01 <dbl>, 2020-06-02 <dbl>, 2020-06-03 <dbl>, 2020-06-04 <dbl>, …