我有以下数据,其中包含一些作为5位字符值的日期值。当我尝试转换为日期时,正确的日期将更改为NA
值。
dt <- data.frame(id=c(1,1,1,1,1,1,2,2,2,2,2),
Registrationdate=c('2019-01-09','2019-01-09','2019-01-09','2019-01-09','2019-01-09',
'2019-01-09',"44105","44105","44105","44105","44105"))
预期值
id Registrationdate
1 1 2019-01-09
2 1 2019-01-09
3 1 2019-01-09
4 1 2019-01-09
5 1 2019-01-09
6 1 2019-01-09
7 2 2020-10-01
8 2 2020-10-01
9 2 2020-10-01
10 2 2020-10-01
11 2 2020-10-01
我试过使用
library(openxlsx)
dt$Registrationdate <- convertToDate(dt$Registrationdate, origin = "1900-01-01")
但我有
1 1 <NA>
2 1 <NA>
3 1 <NA>
4 1 <NA>
5 1 <NA>
6 1 <NA>
7 2 2020-10-01
8 2 2020-10-01
9 2 2020-10-01
10 2 2020-10-01
11 2 2020-10-01
这里有一种混合使用dplyr
和基本R:的方法
library(dplyr, warn = FALSE)
dt |>
mutate(Registrationdate = if_else(grepl("-", Registrationdate),
as.Date(Registrationdate),
openxlsx::convertToDate(Registrationdate, origin = "1900-01-01")))
#> Warning in openxlsx::convertToDate(Registrationdate, origin = "1900-01-01"): NAs
#> introduced by coercion
#> id Registrationdate
#> 1 1 2019-01-09
#> 2 1 2019-01-09
#> 3 1 2019-01-09
#> 4 1 2019-01-09
#> 5 1 2019-01-09
#> 6 1 2019-01-09
#> 7 2 2020-10-01
#> 8 2 2020-10-01
#> 9 2 2020-10-01
#> 10 2 2020-10-01
#> 11 2 2020-10-01
创建于2022-10-15与reprex v2.0.2
library(janitor)
dt$Registrationdate <- convert_to_date(dt$Registrationdate)
id Registrationdate
1 1 2019-01-09
2 1 2019-01-09
3 1 2019-01-09
4 1 2019-01-09
5 1 2019-01-09
6 1 2019-01-09
7 2 2020-10-01
8 2 2020-10-01
9 2 2020-10-01
10 2 2020-10-01
11 2 2020-10-01
另一个选项是以预期格式导入列。openxlsx2
的示例如下所示。上半部分创建一个文件,该文件会导致您在openxlsx
中看到的行为。这是因为Registrationdate
列中的一些行被格式化为日期,而另一些行则被格式化为字符串,这是由生成xlsx输入的人引起的一个相当常见的错误。
使用openxlsx2
,您可以定义要导入的列的类型。该选项的灵感来自readxl
(iirc(。
library(openxlsx2)
## prepare data
date_as_string <- data.frame(
id = rep(1, 6),
Registrationdate = rep('2019-01-09', 6)
)
date_as_date <- data.frame(
id = rep(2, 5),
Registrationdate = rep(as.Date('2019-01-10'), 5)
)
options(openxlsx2.dateFormat = "yyyy-mm-dd")
wb <- wb_workbook()$
add_worksheet()$
add_data(x = date_as_string)$
add_data(x = date_as_date, colNames = FALSE, startRow = 7)
#wb$open()
## read data as date
dt <- wb_to_df(wb, types = c(id = 1, Registrationdate = 2))
## check that Registrationdate is actually a Date column
str(dt$Registrationdate)
#> Date[1:10], format: "2019-01-09" "2019-01-09" "2019-01-09" "2019-01-09" "2019-01-09" ...