I have a data frame data2 that contains a variable x.date with values in mixed formats:
[1] "42250" "14/03/2015" "14/03/2015" "42311"
[5] "42158" "42158" "19/03/2015" "19/03/2015"
[9] "19/03/2015" "19/03/2015" "19/03/2015" "18/03/2015"
I need to convert it to dd/mm/yyyy format. If I try the following:
as.Date(data2$x.date, format = "%m/%d/%Y")
I get only NA:
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[20] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
How can I solve this?
If these are the only two formats, you can try
library(dplyr)
library(lubridate)
df %>%
  mutate(date = format(coalesce(dmy(x.date),
                                as_date(as.integer(x.date), origin = "1900-01-01")),
                       "%d-%m-%Y")) %>%
  suppressWarnings()
which returns
   col1     x.date       date
1     a      42250 05-09-2015
2     b 14/03/2015 14-03-2015
3     c 14/03/2015 14-03-2015
4     d      42311 05-11-2015
5     e      42158 05-06-2015
6     f      42158 05-06-2015
7     g 19/03/2015 19-03-2015
8     h 19/03/2015 19-03-2015
9     i 19/03/2015 19-03-2015
10    j 19/03/2015 19-03-2015
11    k 19/03/2015 19-03-2015
12    l 18/03/2015 18-03-2015
Note: I am not sure which date is the correct origin for values like 42250. It could be 1899/12/31, 1899/12/30, or 1900/01/01.
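To make the origin question concrete, here is a small base R sketch that converts the serial 42250 with each of the three candidate origins. It assumes the numbers are Excel serial dates, which the question does not confirm; for dates exported from Excel on Windows, R's `?as.Date` help page uses the origin 1899-12-30. The names `serial`, `origins` and `candidates` are only for illustration.

```r
# One of the numeric values from x.date
serial <- 42250

# The three candidate origins mentioned in the note
origins <- as.Date(c("1899-12-30", "1899-12-31", "1900-01-01"))

# Adding a number of days to a Date yields a Date
candidates <- origins + serial

format(candidates, "%d/%m/%Y")
# [1] "03/09/2015" "04/09/2015" "05/09/2015"
```

The three results differ by one day each, so the choice of origin matters and is worth checking against a date you already know in the source spreadsheet.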
Data
structure(list(col1 = c("a", "b", "c", "d", "e", "f", "g", "h",
"i", "j", "k", "l"), x.date = c("42250", "14/03/2015", "14/03/2015",
"42311", "42158", "42158", "19/03/2015", "19/03/2015", "19/03/2015",
"19/03/2015", "19/03/2015", "18/03/2015")), class = "data.frame", row.names = c(NA,
-12L))
First you need to get the data into the Date class; then you can use format
to produce the dates in any format you like.
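A minimal sketch of that two-step idea, using a single value from the question: parse the string into a Date, then call format with whatever output pattern is needed. The variable name `d` is only for illustration.

```r
# Step 1: parse the text into a Date, stating the input layout explicitly
d <- as.Date("14/03/2015", format = "%d/%m/%Y")
class(d)
# [1] "Date"

# Step 2: format() turns the Date back into text in any layout
format(d, "%d/%m/%Y")
# [1] "14/03/2015"
format(d, "%Y-%m-%d")
# [1] "2015-03-14"

# With the pattern from the question, "14" is read as a month, so parsing fails
as.Date("14/03/2015", format = "%m/%d/%Y")
# [1] NA
```

Note that format returns a character vector, so do any date arithmetic or sorting before this last step.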
Here is a base R approach:
# Create a column to hold the parsed dates
data2$dates <- as.Date(NA)
# Find the entries that consist only of digits
inds <- grepl('^\\d+$', data2$x.date)
# Convert Excel serial dates to R dates
data2$dates[inds] <- as.Date(as.numeric(data2$x.date[inds]), origin = "1899-12-30")
# Parse the remaining entries as dd/mm/yyyy
data2$dates[!inds] <- as.Date(data2$x.date[!inds], '%d/%m/%Y')
# Change the format of the dates
data2$dates <- format(data2$dates, '%d/%m/%Y')
#[1] "03/09/2015" "14/03/2015" "14/03/2015" "03/11/2015" "03/06/2015" "03/06/2015"
#[7] "19/03/2015" "19/03/2015" "19/03/2015" "19/03/2015" "19/03/2015" "18/03/2015"
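If this conversion is needed in more than one place, the same steps can be wrapped in a small function. This is only a sketch: `parse_mixed_dates` is a hypothetical helper name, not something from base R or from the answer above, and the default origin assumes Excel-style serial numbers.

```r
# Hypothetical helper: parse a character vector that mixes
# digit-only serial numbers and dd/mm/yyyy strings into Dates.
parse_mixed_dates <- function(x, origin = "1899-12-30") {
  out <- rep(as.Date(NA), length(x))
  # Entries made up only of digits are treated as serial day counts
  is_serial <- grepl("^\\d+$", x)
  out[is_serial] <- as.Date(as.numeric(x[is_serial]), origin = origin)
  # Everything else is parsed as dd/mm/yyyy; unparseable values stay NA
  out[!is_serial] <- as.Date(x[!is_serial], format = "%d/%m/%Y")
  out
}

x <- c("42250", "14/03/2015", "42158", "18/03/2015")
result <- format(parse_mixed_dates(x), "%d/%m/%Y")
result
# [1] "03/09/2015" "14/03/2015" "03/06/2015" "18/03/2015"
```

Keeping the origin as an argument makes it easy to switch if the serial numbers turn out to come from a source with a different epoch.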