as.numeric(as.POSIXct(x)) only works some of the time



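I am currently working with survey data in a regression discontinuity design.

I have separate variables for the year, month, day, hour and minute at which each interview started, and the same five variables for when it was completed.

Using paste(), I have collapsed these into the variables starttime and endtime, both of class character. I then use as.POSIXct() to tell R that the strings are date-times; they are in the format yyyy-mm-dd hh:mm.

Because time is the independent variable in the design, I need the dates as numeric values, so I apply the following code: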

ESSFR$starttime_secs <- as.numeric(as.POSIXct(ESSFR$starttime))
ESSFR$endtime_secs <- as.numeric(as.POSIXct(ESSFR$endtime))

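The problem is that the code only works for ESSFR$starttime, not for ESSFR$endtime. When I apply it to ESSFR$endtime, I get the following message:

character string is not in a standard unambiguous format

Does anyone know why the code only works some of the time for me?

Here is a snippet of the data: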

> dput(head(ESSFR[,582:591]))
structure(list(inwdds = structure(c(3, 22, 17, 21, 6, 4), labels = structure(99, .Names = "Not available"), class = "labelled"), 
inwmms = structure(c(12, 11, 11, 11, 12, 12), labels = structure(99, .Names = "Not available"), class = "labelled"), 
inwyys = structure(c(2014, 2014, 2014, 2014, 2014, 2014), labels = structure(9999, .Names = "Not available"), class = "labelled"), 
inwshh = structure(c(11, 11, 16, 18, 11, 17), labels = structure(99, .Names = "Not available"), class = "labelled"), 
inwsmm = structure(c(5, 49, 21, 36, 54, 21), labels = structure(99, .Names = "Not available"), class = "labelled"), 
inwdde = structure(c(3, 22, 17, 21, 6, 4), labels = structure(99, .Names = "Not available"), class = "labelled"), 
inwmme = structure(c(12, 11, 11, 11, 12, 12), labels = structure(99, .Names = "Not available"), class = "labelled"), 
inwyye = structure(c(2014, 2014, 2014, 2014, 2014, 2014), labels = structure(9999, .Names = "Not available"), class = "labelled"), 
inwehh = structure(c(12, 12, 18, 20, 13, 18), labels = structure(99, .Names = "Not available"), class = "labelled"), 
inwemm = structure(c(13, 59, 5, 0, 7, 45), labels = structure(99, .Names = "Not available"), class = "labelled")), .Names = c("inwdds", 
"inwmms", "inwyys", "inwshh", "inwsmm", "inwdde", "inwmme", "inwyye", 
"inwehh", "inwemm"), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

Here is the code:

library(dplyr)  # provides %>%, filter() and glimpse()

#Creating Dataframe only consisting of French answers
ESSFR <- ESSData %>%
  filter(cntry == "FR")
#Collapsing the separate time variables to one.
#The time variables are: 
#Start year = inwyys
#Start month = inwmms
#Start day = inwdds
#Start hour = inwshh
#Start minute = inwsmm
#End year = inwyye
#End month = inwmme
#End day = inwdde
#End hour = inwehh
#End minute = inwemm
#Collapsing starttime variable
ESSFR$startdate <- paste(ESSFR$inwyys,"-",ESSFR$inwmms,"-",ESSFR$inwdds, sep = "")
ESSFR$startdate
ESSFR$startdaytime <- paste(ESSFR$inwshh,":",ESSFR$inwsmm, sep = "")
ESSFR$startdaytime
ESSFR$starttime <- paste(ESSFR$startdate,ESSFR$startdaytime)
ESSFR$starttime
class(ESSFR$starttime) #string variable generated
#Collapsing endtime variable
ESSFR$enddate <- paste(ESSFR$inwyye,"-",ESSFR$inwmme,"-",ESSFR$inwdde, sep = "")
ESSFR$enddate
ESSFR$enddaytime <- paste(ESSFR$inwehh,":",ESSFR$inwemm, sep = "")
ESSFR$enddaytime
ESSFR$endtime <- paste(ESSFR$enddate,ESSFR$enddaytime)
ESSFR$endtime
class(ESSFR$endtime) #string variable generated
#Looking at the two variables
glimpse(ESSFR$starttime)
glimpse(ESSFR$endtime)
#Looking good
#Transforming the two time variables from string to numerical variables.
ESSFR$starttime_secs <- as.numeric(as.POSIXct(ESSFR$starttime))
ESSFR$starttime_secs
ESSFR$endtime_secs <- as.numeric(as.POSIXct(ESSFR$endtime))
ESSFR$endtime_secs

Here is a link to the data and the current script: https://wetransfer.com/downloads/cb528871a341c1b2118d5db9e03d16ee20180608103455/11ca2d

Thanks in advance.

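Probably some of your end times are NA or blank. If they look fine when printed, then most of them are probably OK, but there are a few bad ones lurking somewhere.

You can use this code to process the entries one at a time, giving NA for the bad ones. Don't use it in production, it is slow: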
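
One way to hunt for such entries, sketched on made-up strings (the vector `endtime` below is hypothetical and not taken from the ESS file; the format string and `tz = "UTC"` are assumptions): parse with an explicit format, which gives NA rather than an error for strings that do not match, then look at the positions that failed. Note that paste() turns a missing component into the literal text "NA", so a missing value does not stay NA in the pasted string.

```r
# Hypothetical example data:
#  - entry 2 is what paste() produces when every component is missing
#  - entry 3 has an out-of-range hour and minute
endtime <- c(
  "2014-12-3 12:13",
  paste(NA, "-", NA, "-", NA, " ", NA, ":", NA, sep = ""),
  "2014-11-17 99:99"
)

# With an explicit format, entries that do not parse become NA
# instead of raising an error for the whole vector.
parsed <- as.POSIXct(endtime, format = "%Y-%m-%d %H:%M", tz = "UTC")

# Positions and contents of the entries that failed to parse
bad <- which(is.na(parsed))
bad
#> [1] 2 3
endtime[bad]
```

Running the same check on ESSFR$endtime should show which rows are responsible for the error.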

# Apply to the character column; endtime_secs is the result that failed to be created
sapply(ESSFR$endtime,
       function(x)
         tryCatch(as.POSIXct(x), error = function(e) NA))

For example:

ESSFR <- list(endtime = c("2018-06-07 11:00 AM", "bad"))
sapply(ESSFR$endtime,
       function(x)
         tryCatch(as.POSIXct(x), error = function(e) NA))
# (the numeric value depends on the local time zone)
#> 2018-06-07 11:00 AM                 bad 
#>          1528383600                  NA

You can also use strptime() and get NA for the bad entries, but then you need to specify the format explicitly.
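
A minimal sketch of the strptime() route, again on made-up strings; the format string and `tz = "UTC"` are assumptions and should be adjusted to the real data and time zone:

```r
# Hypothetical input: one well-formed entry and one bad one
x <- c("2014-12-3 12:13", "bad")

# strptime() is vectorised and returns a POSIXlt vector;
# entries that do not match the format become NA
lt <- strptime(x, format = "%Y-%m-%d %H:%M", tz = "UTC")

# Convert to seconds since 1970-01-01 00:00 UTC
secs <- as.numeric(as.POSIXct(lt))
secs
```

Because this works on the whole vector at once, it avoids the per-element sapply() loop.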
