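I am currently working with survey data in a regression discontinuity design.
I have separate variables for the year, month, day, hour and minute at which the survey started, and the same five variables for when it was completed.
Using paste(), I have collapsed these into starttime and endtime variables, both of type character. I then use as.POSIXct() to tell R that the strings in these variables are datetimes, and I am using the correct format, yyyy-mm-dd hh:mm.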
Since time is the independent variable in the design, I need the dates as numeric values, so I apply the following code:
ESSFR$starttime_secs <- as.numeric(as.POSIXct(ESSFR$starttime))
ESSFR$endtime_secs <- as.numeric(as.POSIXct(ESSFR$endtime))
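The problem is that the code only works for ESSFR$starttime, not for ESSFR$endtime. When I apply it to ESSFR$endtime, I get the following message:
character string is not in a standard unambiguous format
Does anyone know why the code only works some of the time for me?
Here is a snippet of the data: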
> dput(head(ESSFR[,582:591]))
structure(list(inwdds = structure(c(3, 22, 17, 21, 6, 4), labels = structure(99, .Names = "Not available"), class = "labelled"),
inwmms = structure(c(12, 11, 11, 11, 12, 12), labels = structure(99, .Names = "Not available"), class = "labelled"),
inwyys = structure(c(2014, 2014, 2014, 2014, 2014, 2014), labels = structure(9999, .Names = "Not available"), class = "labelled"),
inwshh = structure(c(11, 11, 16, 18, 11, 17), labels = structure(99, .Names = "Not available"), class = "labelled"),
inwsmm = structure(c(5, 49, 21, 36, 54, 21), labels = structure(99, .Names = "Not available"), class = "labelled"),
inwdde = structure(c(3, 22, 17, 21, 6, 4), labels = structure(99, .Names = "Not available"), class = "labelled"),
inwmme = structure(c(12, 11, 11, 11, 12, 12), labels = structure(99, .Names = "Not available"), class = "labelled"),
inwyye = structure(c(2014, 2014, 2014, 2014, 2014, 2014), labels = structure(9999, .Names = "Not available"), class = "labelled"),
inwehh = structure(c(12, 12, 18, 20, 13, 18), labels = structure(99, .Names = "Not available"), class = "labelled"),
inwemm = structure(c(13, 59, 5, 0, 7, 45), labels = structure(99, .Names = "Not available"), class = "labelled")), .Names = c("inwdds",
"inwmms", "inwyys", "inwshh", "inwsmm", "inwdde", "inwmme", "inwyye",
"inwehh", "inwemm"), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
Here is the code:
#Creating Dataframe only consisting of French answers
ESSFR <- ESSData %>%
  filter(cntry == "FR")
#Collapsing the separate time variables to one.
#The time variables are:
#Start year = inwyys
#Start month = inwmms
#Start day = inwdds
#Start hour = inwshh
#Start minute = inwsmm
#End year = inwyye
#End month = inwmme
#End day = inwdde
#End hour = inwehh
#End minute = inwemm
#Collapsing starttime variable
ESSFR$startdate <- paste(ESSFR$inwyys,"-",ESSFR$inwmms,"-",ESSFR$inwdds, sep = "")
ESSFR$startdate
ESSFR$startdaytime <- paste(ESSFR$inwshh,":",ESSFR$inwsmm, sep = "")
ESSFR$startdaytime
ESSFR$starttime <- paste(ESSFR$startdate,ESSFR$startdaytime)
ESSFR$starttime
class(ESSFR$starttime) #string variable generated
#Collapsing endtime variable
ESSFR$enddate <- paste(ESSFR$inwyye,"-",ESSFR$inwmme,"-",ESSFR$inwdde, sep = "")
ESSFR$enddate
ESSFR$enddaytime <- paste(ESSFR$inwehh,":",ESSFR$inwemm, sep = "")
ESSFR$enddaytime
ESSFR$endtime <- paste(ESSFR$enddate,ESSFR$enddaytime)
ESSFR$endtime
class(ESSFR$endtime) #string variable generated
#Looking at the two variables
glimpse(ESSFR$starttime)
glimpse(ESSFR$endtime)
#Looking good
#Transforming the two time variables from string to numerical variables.
ESSFR$starttime_secs <- as.numeric(as.POSIXct(ESSFR$starttime))
ESSFR$starttime_secs
ESSFR$endtime_secs <- as.numeric(as.POSIXct(ESSFR$endtime))
ESSFR$endtime_secs
Here is a link to the data and the current script: https://wetransfer.com/downloads/cb528871a341c1b2118d5db9e03d16ee20180608103455/11ca2d
Thanks in advance.
Probably some of your end times are NA or blank. If they look fine when printed, most of them are probably OK, but a few bad ones are lurking somewhere.
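One way to look for such entries is to check the component columns before pasting them together. The sketch below assumes that the bad values are either NA or the "Not available" code 99 that appears in the labels of the dput output in the question. The vectors are made-up stand-ins for inwehh and inwemm, not the real data.

```r
# Hypothetical stand-ins for the end hour and end minute columns.
# 99 is the "Not available" code listed in the variable labels.
inwehh <- c(12, 12, 99, 20)
inwemm <- c(13, 59, 99, NA)

# Flag rows whose hour or minute is missing or outside the valid range.
suspect <- is.na(inwehh) | is.na(inwemm) | inwehh > 23 | inwemm > 59

# Row numbers that cannot form a valid time of day.
which(suspect)
```

If this turns up rows in the real data, those are the ones to recode to NA or drop before converting.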
You can use this code to process one entry at a time, which gives NA for the bad entries. Don't use it in production, because it is slow:
# Apply as.POSIXct() to the character column, one element at a time
sapply(ESSFR$endtime,
       function(x)
         tryCatch(as.POSIXct(x), error = function(e) NA))
For example:
ESSFR <- list(endtime = c("2018-06-07 11:00 AM", "bad"))
sapply(ESSFR$endtime,
       function(x)
         tryCatch(as.POSIXct(x), error = function(e) NA))
#> 2018-06-07 11:00 AM bad
#> 1528383600 NA
You can also use strptime() and get NA for the bad entries, but then you need to specify the format explicitly.
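A minimal sketch of the strptime() route follows. The sample strings are invented to resemble the pasted endtime values, the format string is inferred from the yyyy-mm-dd hh:mm layout described in the question, and tz = "UTC" is an assumption that you should replace with the time zone of the interviews.

```r
# Hypothetical end times: the third mimics 99 "Not available" codes,
# the fourth is what paste() produces when every component is NA.
endtime <- c("2014-12-3 12:13", "2014-11-22 12:59",
             "2014-12-6 99:99", "NA-NA-NA NA:NA")

# With an explicit format, unparseable entries become NA instead of an error.
parsed <- as.POSIXct(strptime(endtime, format = "%Y-%m-%d %H:%M", tz = "UTC"))

# Positions and contents of the entries that failed to parse.
bad <- which(is.na(parsed))
endtime[bad]

# Seconds since 1970-01-01 UTC, with NA for the bad entries.
endtime_secs <- as.numeric(parsed)
```

This is vectorised, so it avoids the speed problem of the element-by-element tryCatch() approach, and the bad vector shows which rows need a closer look.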