NAs introduced by coercion when converting JSON dates in R



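When I fetch data through an API, the JSON contains dates in the format "/Date(1386201600000)/". I tried rvest, httr, jsonlite and rjson, but they all return this date format. To work around it, I wrote the function convert_JSON_Date to turn these values into readable dates. It works and returns the correct date.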

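When I apply the function to the whole date column, I get the warning "NAs introduced by coercion". I found that this is related to the length of the JSON date strings: some are 20 characters long and some are 21. My full dataset contains even more distinct lengths. When I pass the data for each length to the function separately, everything works fine.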

I don't understand why the coercion warning occurs. Can anyone explain why this happens?

# Example data
t <- c("/Date(1184889600000)/", "/Date(1377648000000)/", "/Date(1386201600000)/",
       "/Date(1099353600000)/", "/Date(1403222400000)/", "/Date(1052092800000)/",
       "/Date(1324425600000)/", "/Date(1115942400000)/", "/Date(1343260800000)/",
       "/Date(940377600000)/", "/Date(1438819200000)/", "/Date(975715200000)/",
       "/Date(1125446400000)/", "/Date(1194566400000)/", "/Date(1331856000000)/",
       "/Date(1396569600000)/", "/Date(1346803200000)/", "/Date(1438560000000)/",
       "/Date(950832000000)/", "/Date(1380326400000)/", "/Date(1432771200000)/",
       "/Date(1436572800000)/", "/Date(1376438400000)/", "/Date(1428537600000)/",
       "/Date(869788800000)/", "/Date(1343001600000)/", "/Date(1382486400000)/",
       "/Date(1259539200000)/", "/Date(1427500800000)/", "/Date(1421971200000)/")
# converter for json dates. 
convert_JSON_Date <- function(Input_String){
      start <- stringi::stri_locate(Input_String, regex = "\\(")[1,1]
      end <- stringi::stri_locate(Input_String, regex = "\\)")[1,1]
      # shift 1 position from the start and end to get the string between the parentheses
      JSON_Date <- stringi::stri_sub(Input_String, start+1, end-1)
      # Not interested in time element. This is the time the data was uploaded to server
      JSON_Date <- as.Date(structure(as.numeric(JSON_Date)/1000, class = c("POSIXct", "POSIXt")))
      return(JSON_Date)
}

# NAs introduced by coercion
convert_JSON_Date(t)
# separately it works
convert_JSON_Date(t[nchar(t) == 20])
# separately it works
convert_JSON_Date(t[nchar(t) == 21])

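You only take the start and end positions of the first element, which has 21 characters. For the elements with 20 characters, the extracted substring therefore includes the closing parenthesis, which makes as.numeric return NA.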
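The effect can be reproduced in base R without the function. This is a minimal sketch that assumes the positions computed from the first (21-character) element, where "(" is at position 6 and ")" at position 20, so the function extracts characters 7 to 19 from every string. The variable names are illustrative only.

```r
long_date  <- "/Date(1184889600000)/"  # 21 characters, ")" at position 20
short_date <- "/Date(940377600000)/"   # 20 characters, ")" at position 19

# Positions taken from the first element: start + 1 = 7, end - 1 = 19
long_piece  <- substr(long_date, 7, 19)   # "1184889600000"
short_piece <- substr(short_date, 7, 19)  # "940377600000)", parenthesis included

long_num  <- as.numeric(long_piece)
# Without suppressWarnings() this emits "NAs introduced by coercion"
short_num <- suppressWarnings(as.numeric(short_piece))
```

The trailing ")" in short_piece is what turns the shorter strings into NA.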

You should change it to take the whole column of positions instead:

  start <- stringi::stri_locate(Input_String, regex = "\\(")[,1]
  end <- stringi::stri_locate(Input_String, regex = "\\)")[,1]

Alternatively, you can use base functions to extract the right values:

start.end <- regexpr("\\d+",t)
as.numeric(substr(t, start.end, start.end + attr(start.end,"match.length")-1))/1000
 [1] 1184889600 1377648000 1386201600 1099353600 1403222400 1052092800
 [7] 1324425600 1115942400 1343260800  940377600 1438819200  975715200
[13] 1125446400 1194566400 1331856000 1396569600 1346803200 1438560000
[19]  950832000 1380326400 1432771200 1436572800 1376438400 1428537600
[25]  869788800 1343001600 1382486400 1259539200 1427500800 1421971200
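To go from those epoch seconds to dates, one option is the sketch below. It uses regmatches() as a shorthand for the substr() call above and assumes the timestamps should be read as UTC; the names json_dates, ms and dates are illustrative only.

```r
json_dates <- c("/Date(1184889600000)/", "/Date(940377600000)/")

# Locate the first run of digits in each string and pull it out
m  <- regexpr("[0-9]+", json_dates)
ms <- as.numeric(regmatches(json_dates, m))

# Milliseconds -> seconds -> POSIXct (UTC assumed) -> Date
dates <- as.Date(as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC"))
format(dates)
# [1] "2007-07-20" "1999-10-20"
```

Note that regmatches() silently drops elements without a match, so this form only suits input where every string contains a timestamp.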

You can run these dates through the date handler R_json_dateStringOp in RJSONIO::fromJSON after removing the three trailing zeros from each timestamp.

library(RJSONIO)
## create the JSON string after removing three zeros at the end of each 't'
make <- toJSON(gsub("0{3}(?=\\))", "", t, perl = TRUE))
## run it through fromJSON() with the date handler and collapse result to an atomic vector
do.call(c, fromJSON(make, stringFun = "R_json_dateStringOp"))
# [1] "2007-07-19 17:00:00 PDT" "2013-08-27 17:00:00 PDT" "2013-12-04 16:00:00 PST"
# [4] "2004-11-01 16:00:00 PST" "2014-06-19 17:00:00 PDT" "2003-05-04 17:00:00 PDT"
# [7] "2011-12-20 16:00:00 PST" "2005-05-12 17:00:00 PDT" "2012-07-25 17:00:00 PDT"
#[10] "1999-10-19 17:00:00 PDT" "2015-08-05 17:00:00 PDT" "2000-12-01 16:00:00 PST"
#[13] "2005-08-30 17:00:00 PDT" "2007-11-08 16:00:00 PST" "2012-03-15 17:00:00 PDT"
#[16] "2014-04-03 17:00:00 PDT" "2012-09-04 17:00:00 PDT" "2015-08-02 17:00:00 PDT"
#[19] "2000-02-17 16:00:00 PST" "2013-09-27 17:00:00 PDT" "2015-05-27 17:00:00 PDT"
#[22] "2015-07-10 17:00:00 PDT" "2013-08-13 17:00:00 PDT" "2015-04-08 17:00:00 PDT"
#[25] "1997-07-24 17:00:00 PDT" "2012-07-22 17:00:00 PDT" "2013-10-22 17:00:00 PDT"
#[28] "2009-11-29 16:00:00 PST" "2015-03-27 17:00:00 PDT" "2015-01-22 16:00:00 PST"

This line:

stringi::stri_locate(Input_String, regex = "\\)")[1,1]

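gets the position of the closing parenthesis, which can be 19 or 20 depending on the length of each date string. Because your first date is one of the longer ones, your end value is 20 for every element. Extracting between start and that fixed end therefore returns the closing ) for the shorter dates. In effect, the shorter dates need an end value of 19, while the longer ones need 20.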

In any case, a more general regular expression is all you need to solve this:

as.numeric(stringi::stri_extract(t, regex = '\\d+'))

This returns the run of digits in each string (the timestamp in milliseconds), which is what you ultimately want.
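Putting the pieces together, a vectorized converter could look like the sketch below. It is written in base R so it runs without extra packages; parse_json_date is a hypothetical name, the strict "/Date(digits)/" pattern is an assumption about the input, and the timestamps are assumed to be UTC. Strings that do not match yield NA instead of a coercion warning.

```r
# Hypothetical helper: convert "/Date(<milliseconds>)/" strings to Date
parse_json_date <- function(x) {
  pattern <- "^/Date\\(([0-9]+)\\)/$"
  ok <- grepl(pattern, x)

  # Start with NA everywhere, then fill in the well-formed entries
  ms <- rep(NA_real_, length(x))
  ms[ok] <- as.numeric(sub(pattern, "\\1", x[ok]))

  # Milliseconds since the epoch -> Date, assuming UTC
  as.Date(as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC"))
}

result <- parse_json_date(c("/Date(1386201600000)/", "/Date(975715200000)/", "not a date"))
format(result)
# [1] "2013-12-05" "2000-12-02" NA
```

Because the capture group is matched per element, strings of any length are handled in one call.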
