r-将具有5位数字和日期的变量转换为日期值



我有以下数据,其中包含一些作为5位字符值的日期值。当我尝试转换为日期时,正确的日期将更改为NA值。

dt <- data.frame(id=c(1,1,1,1,1,1,2,2,2,2,2),
Registrationdate=c('2019-01-09','2019-01-09','2019-01-09','2019-01-09','2019-01-09',
'2019-01-09',"44105","44105","44105","44105","44105"))

预期值

id Registrationdate
1   1       2019-01-09
2   1       2019-01-09
3   1       2019-01-09
4   1       2019-01-09
5   1       2019-01-09
6   1       2019-01-09
7   2       2020-10-01
8   2       2020-10-01
9   2       2020-10-01
10  2       2020-10-01
11  2       2020-10-01

我试过使用

library(openxlsx)
dt$Registrationdate <- convertToDate(dt$Registrationdate, origin = "1900-01-01")

但我有

1   1             <NA>
2   1             <NA>
3   1             <NA>
4   1             <NA>
5   1             <NA>
6   1             <NA>
7   2       2020-10-01
8   2       2020-10-01
9   2       2020-10-01
10  2       2020-10-01
11  2       2020-10-01

这里有一种混合使用dplyr和基本R:的方法

library(dplyr, warn = FALSE)

dt |> 
mutate(Registrationdate = if_else(grepl("-", Registrationdate), 
as.Date(Registrationdate),
openxlsx::convertToDate(Registrationdate, origin = "1900-01-01")))
#> Warning in openxlsx::convertToDate(Registrationdate, origin = "1900-01-01"): NAs
#> introduced by coercion
#>    id Registrationdate
#> 1   1       2019-01-09
#> 2   1       2019-01-09
#> 3   1       2019-01-09
#> 4   1       2019-01-09
#> 5   1       2019-01-09
#> 6   1       2019-01-09
#> 7   2       2020-10-01
#> 8   2       2020-10-01
#> 9   2       2020-10-01
#> 10  2       2020-10-01
#> 11  2       2020-10-01

创建于2022-10-15与reprex v2.0.2

library(janitor)
dt$Registrationdate <- convert_to_date(dt$Registrationdate)
id Registrationdate
1   1       2019-01-09
2   1       2019-01-09
3   1       2019-01-09
4   1       2019-01-09
5   1       2019-01-09
6   1       2019-01-09
7   2       2020-10-01
8   2       2020-10-01
9   2       2020-10-01
10  2       2020-10-01
11  2       2020-10-01

另一个选项是以预期格式导入列。openxlsx2的示例如下所示。上半部分创建一个文件,该文件会导致您在openxlsx中看到的行为。这是因为Registrationdate列中的一些行被格式化为日期,而另一些行则被格式化为字符串,这是由生成xlsx输入的人引起的一个相当常见的错误。

使用openxlsx2,您可以定义要导入的列的类型。该选项的灵感来自readxl(iirc(。

library(openxlsx2)
## prepare data
date_as_string <- data.frame(
id = rep(1, 6),
Registrationdate = rep('2019-01-09', 6)
)
date_as_date <- data.frame(
id = rep(2, 5),
Registrationdate = rep(as.Date('2019-01-10'), 5)
)
options(openxlsx2.dateFormat = "yyyy-mm-dd")
wb <- wb_workbook()$
add_worksheet()$
add_data(x = date_as_string)$
add_data(x = date_as_date, colNames = FALSE, startRow = 7)
#wb$open()
## read data as date
dt <- wb_to_df(wb, types = c(id = 1,  Registrationdate = 2))
## check that Registrationdate is actually a Date column
str(dt$Registrationdate)
#>  Date[1:10], format: "2019-01-09" "2019-01-09" "2019-01-09" "2019-01-09" "2019-01-09" ...

最新更新