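I'm no stranger to R or to formatting dates in R, and normally wouldn't ask this, but I'm seeing seriously odd behavior and have gotten nowhere close to solving it in the past 2 hours.
I have a dataset that I've imported, and I want to format its date/time column using as.POSIXct. The dates are in a nonstandard format, and I've applied what I know to be the correct format string. Below is a small part of the data I'm having trouble with, followed by the code. The problem is that I get 4 NAs starting at "2015-03-08 02:00:00 PST". What on earth is going on? It seems completely random, because it doesn't happen anywhere else in the other 55K observations.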
bad.Dates<-c("3/7/2015 14:15", "3/7/2015 14:30", "3/7/2015 14:45", "3/7/2015 15:00",
"3/7/2015 15:15", "3/7/2015 15:30", "3/7/2015 15:45", "3/7/2015 16:00",
"3/7/2015 16:15", "3/7/2015 16:30", "3/7/2015 16:45", "3/7/2015 17:00",
"3/7/2015 17:15", "3/7/2015 17:30", "3/7/2015 17:45", "3/7/2015 18:00",
"3/7/2015 18:15", "3/7/2015 18:30", "3/7/2015 18:45", "3/7/2015 19:00",
"3/7/2015 19:15", "3/7/2015 19:30", "3/7/2015 19:45", "3/7/2015 20:00",
"3/7/2015 20:15", "3/7/2015 20:30", "3/7/2015 20:45", "3/7/2015 21:00",
"3/7/2015 21:15", "3/7/2015 21:30", "3/7/2015 21:45", "3/7/2015 22:00",
"3/7/2015 22:15", "3/7/2015 22:30", "3/7/2015 22:45", "3/7/2015 23:00",
"3/7/2015 23:15", "3/7/2015 23:30", "3/7/2015 23:45", "3/8/2015 0:00",
"3/8/2015 0:15", "3/8/2015 0:30", "3/8/2015 0:45", "3/8/2015 1:00",
"3/8/2015 1:15", "3/8/2015 1:30", "3/8/2015 1:45", "3/8/2015 2:00",
"3/8/2015 2:15", "3/8/2015 2:30", "3/8/2015 2:45", "3/8/2015 3:00",
"3/8/2015 3:15", "3/8/2015 3:30", "3/8/2015 3:45", "3/8/2015 4:00",
"3/8/2015 4:15", "3/8/2015 4:30", "3/8/2015 4:45", "3/8/2015 5:00",
"3/8/2015 5:15", "3/8/2015 5:30", "3/8/2015 5:45", "3/8/2015 6:00",
"3/8/2015 6:15", "3/8/2015 6:30", "3/8/2015 6:45", "3/8/2015 7:00",
"3/8/2015 7:15", "3/8/2015 7:30", "3/8/2015 7:45", "3/8/2015 8:00",
"3/8/2015 8:15", "3/8/2015 8:30", "3/8/2015 8:45", "3/8/2015 9:00",
"3/8/2015 9:15", "3/8/2015 9:30", "3/8/2015 9:45", "3/8/2015 10:00",
"3/8/2015 10:15", "3/8/2015 10:30", "3/8/2015 10:45", "3/8/2015 11:00",
"3/8/2015 11:15", "3/8/2015 11:30", "3/8/2015 11:45", "3/8/2015 12:00",
"3/8/2015 12:15", "3/8/2015 12:30", "3/8/2015 12:45", "3/8/2015 13:00",
"3/8/2015 13:15", "3/8/2015 13:30", "3/8/2015 13:45", "3/8/2015 14:00",
"3/8/2015 14:15", "3/8/2015 14:30", "3/8/2015 14:45", "3/8/2015 15:00",
"3/8/2015 15:15")
as.POSIXct(strptime(bad.Dates,"%m/%d/%Y %H:%M"))
To make this example reproducible (and solvable) in any locale, specify the time zone explicitly via tz=:
bad.Dates <- c("3/8/2015 1:45", "3/8/2015 2:00", "3/8/2015 2:15",
"3/8/2015 2:30", "3/8/2015 2:45", "3/8/2015 3:00")
as.POSIXct(bad.Dates, format="%m/%d/%Y %H:%M", tz="US/Pacific")
#[1] "2015-03-08 01:45:00 PST"
#[2] NA
#[3] NA
#[4] NA
#[5] NA
#[6] "2015-03-08 03:00:00 PDT"
You get NAs because those times do not exist in modern timekeeping in the US Pacific region: the clocks skipped over them.
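Most of the United States, Canada, and Mexico's northern border cities will start Daylight Saving Time (DST) on Sunday, March 8, 2015. People in areas that observe DST will spring forward one hour from 2:00 a.m. (02:00) to 3:00 a.m. (03:00), local time.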
Source: http://www.timeanddate.com/news/time/usa-canada-start-dst-2015.html
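With 55K observations, it can help to locate every string that falls in such a gap. One possible approach is to parse the same strings twice, once in the DST-observing zone and once in UTC, and flag the entries that fail only in the former. This is a minimal sketch; the helper name find_dst_gaps is hypothetical, and how nonexistent local times are handled can vary with platform and R version.

```r
# Hypothetical helper: flag strings that are well-formed but do not exist
# as wall-clock times in the given time zone (e.g. a spring-forward gap).
find_dst_gaps <- function(x, format, tz) {
  local_parse <- as.POSIXct(x, format = format, tz = tz)
  utc_parse   <- as.POSIXct(x, format = format, tz = "UTC")
  # NA in the local zone but valid in UTC means the format matched,
  # so the failure comes from the time zone rather than the string.
  is.na(local_parse) & !is.na(utc_parse)
}

bad.Dates <- c("3/8/2015 1:45", "3/8/2015 2:00", "3/8/2015 2:15",
               "3/8/2015 2:30", "3/8/2015 2:45", "3/8/2015 3:00")

gaps <- find_dst_gaps(bad.Dates, "%m/%d/%Y %H:%M", "US/Pacific")
bad.Dates[gaps]   # the strings that could not be placed in US/Pacific
```

Comparing against UTC separates time zone problems from genuinely malformed strings, which would be NA in both parses.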
Specifying a time zone that does not observe daylight saving time, such as "UTC", avoids the problem:
as.POSIXct(bad.Dates, format="%m/%d/%Y %H:%M", tz="UTC")
#[1] "2015-03-08 01:45:00 UTC"
#[2] "2015-03-08 02:00:00 UTC"
#[3] "2015-03-08 02:15:00 UTC"
#[4] "2015-03-08 02:30:00 UTC"
#[5] "2015-03-08 02:45:00 UTC"
#[6] "2015-03-08 03:00:00 UTC"
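Note that tz="UTC" labels the wall-clock values as UTC, so the resulting instants differ from Pacific time by the zone offset. If the data were logged in Pacific Standard Time all year round (an assumption about the data source, not something stated in the question), a fixed-offset zone may be a closer fit. A minimal sketch using the tz database name "Etc/GMT+8", which despite the plus sign means 8 hours behind UTC:

```r
bad.Dates <- c("3/8/2015 1:45", "3/8/2015 2:00", "3/8/2015 2:15",
               "3/8/2015 2:30", "3/8/2015 2:45", "3/8/2015 3:00")

# Assumption: timestamps are in standard time (UTC-8) with no DST shift.
# "Etc/GMT+8" follows the POSIX sign convention, i.e. 8 hours behind UTC.
fixed <- as.POSIXct(bad.Dates, format = "%m/%d/%Y %H:%M", tz = "Etc/GMT+8")

anyNA(fixed)                             # no gap in a fixed-offset zone
format(fixed, tz = "UTC", usetz = TRUE)  # the same instants shown in UTC
```

Whether this or plain UTC is appropriate depends on how the recording device kept time.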