R - fromJSON %>% as.data.frame fails with multiple levels on selected data



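I have a long list of JSON strings that I am trying to import and convert to data frames. In general jsonlite::fromJSON works fine, but about 25% of the strings throw an error: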

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 1, 47, 2

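I realize this is because the JSON has different levels, but that doesn't seem to be a problem for the other 75% of my data, which has a similar structure.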

Here is an example showing when it works and when it doesn't.

Not working:

x1 <- '{"productCode":"DP1.00096.001","sites":["ABBY","BARR","BART","BLAN","BONA","CLBJ","CPER","DCFS","DEJU","DELA","DSNY","GRSM","GUAN","HARV","HEAL","JERC","JORN","KONA","KONZ","LAJA","LENO","MLBS","MOAB","NIWO","NOGP","OAES","ONAQ","ORNL","OSBS","PUUM","RMNP","SCBI","SERC","SJER","SOAP","SRER","STEI","STER","TALL","TEAK","TOOL","TREE","UKFS","UNDE","WOOD","WREF","YELL"],"dateRange":["2012-06","2018-07"],"documentation":"include","packageType":"basic"}'
output1 <- jsonlite::fromJSON(x1)
str(output1)
as.data.frame(output1)

Working:

x2 <- '{"productCode":"DP1.00095.001","sites":["ABBY","BARR","BART","BLAN","BONA","CLBJ","CPER","DCFS","DEJU","DELA","DSNY","GRSM","GUAN","HARV","HEAL","JERC","JORN","KONA","KONZ","LAJA","LENO","MLBS","MOAB","NIWO","NOGP","OAES","ONAQ","ORNL","OSBS","RMNP","SCBI","SERC","SJER","SOAP","SRER","STEI","STER","TALL","TEAK","TOOL","TREE","UKFS","UNDE","WOOD","WREF","YELL"],"dateRange":["2019-01","2019-12"],"documentation":"include","packageType":"basic"}'
output2 <- jsonlite::fromJSON(x2)
str(output2)
as.data.frame(output2)

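In both examples the JSON has an unbalanced structure. The difference is that x2 can be converted to a data.frame, while the output of x1 cannot. I can't find any difference in either the str output or the raw JSON strings, so I don't understand where or why it fails. The structures are identical, and I wouldn't expect this operation to fail.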

I'd appreciate any help understanding why x1 fails while x2 works. Is there a way to get x1 into a working data frame similar to the output of x2?

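This has nothing to do with the JSON itself. It happens because you are trying to recycle vectors of incompatible lengths: as.data.frame() takes the longest element as the number of rows and can only recycle a shorter element if that number is an exact multiple of its length. In x1, sites has 47 entries and dateRange has 2, and 47 is not a multiple of 2 (hence "differing number of rows: 1, 47, 2"). In x2, sites has only 46 entries (PUUM is missing), and 46 is a multiple of 2. For example: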

as.data.frame(list(a = 1, b = 1:2, c = 3:5))
#> Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, 
#> : arguments imply differing number of rows: 1, 2, 3
as.data.frame(list(a = 1, b = 1:2, c = 3:6))
#>   a b c
#> 1 1 1 3
#> 2 1 2 4
#> 3 1 1 5
#> 4 1 2 6
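The mismatch is easy to miss in str(), which prints chr [1:47] for output1$sites but chr [1:46] for output2$sites. A minimal sketch for spotting it up front: compare lengths() directly and test the recycling rule before calling as.data.frame(). Here can_recycle() is a hypothetical helper (not part of jsonlite or base R), x_odd and x_even are shortened stand-ins for x1 and x2, and the sketch assumes fromJSON() returns a flat list of atomic vectors, as it does for these strings.

```r
library(jsonlite)

# Hypothetical helper: predicts whether as.data.frame() on a flat list of
# atomic vectors can recycle everything to a common number of rows.
# The row count is the longest length; every shorter, non-empty element
# must divide it exactly, otherwise data.frame() stops with
# "arguments imply differing number of rows".
can_recycle <- function(x) {
  n <- lengths(x)
  all(n > 0 & max(n) %% n == 0)
}

# Shortened stand-ins for x1 and x2 (hypothetical): 3 sites (odd) versus
# 2 sites (even), with the same two-element dateRange.
x_odd  <- '{"productCode":"A","sites":["ABBY","BARR","BART"],"dateRange":["2012-06","2018-07"]}'
x_even <- '{"productCode":"B","sites":["ABBY","BARR"],"dateRange":["2019-01","2019-12"]}'

out_odd  <- fromJSON(x_odd)
out_even <- fromJSON(x_even)

# lengths() makes the mismatch explicit: one entry per list element.
len_odd  <- lengths(out_odd)   # productCode 1, sites 3, dateRange 2
len_even <- lengths(out_even)  # productCode 1, sites 2, dateRange 2

ok_odd  <- can_recycle(out_odd)   # FALSE: 3 is not a multiple of 2
ok_even <- can_recycle(out_even)  # TRUE: 2 is a multiple of 1 and 2
print(len_odd)
print(c(odd = ok_odd, even = ok_even))
```

Applied to the real data, the same check returns FALSE for output1 (47 is not a multiple of 2) and TRUE for output2.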

For output1, one way to solve the problem is to append a dummy string to the end of output1$sites:

output1$sites <- c(output1$sites, "")
head(as.data.frame(output1))
#>     productCode sites dateRange documentation packageType
#> 1 DP1.00096.001  ABBY   2012-06       include       basic
#> 2 DP1.00096.001  BARR   2018-07       include       basic
#> 3 DP1.00096.001  BART   2012-06       include       basic
#> 4 DP1.00096.001  BLAN   2018-07       include       basic
#> 5 DP1.00096.001  BONA   2012-06       include       basic
#> 6 DP1.00096.001  CLBJ   2018-07       include       basic
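Since the question involves a long list of JSON strings, the same padding trick can be applied to each string before stacking the results. A minimal sketch, assuming every string has the same fields and that sites and a two-element dateRange are the only multi-valued fields; json_to_df() and the sample strings are hypothetical:

```r
library(jsonlite)

# Hypothetical helper applying the padding fix to one JSON string:
# if sites has an odd length while dateRange has two entries, append ""
# so that the number of rows becomes a multiple of 2.
json_to_df <- function(json) {
  out <- fromJSON(json)
  if (length(out$sites) %% 2 == 1 && length(out$dateRange) == 2) {
    out$sites <- c(out$sites, "")
  }
  as.data.frame(out, stringsAsFactors = FALSE)
}

# Shortened sample strings (hypothetical): one odd, one even.
jsons <- c(
  '{"productCode":"A","sites":["ABBY","BARR","BART"],"dateRange":["2012-06","2018-07"]}',
  '{"productCode":"B","sites":["ABBY","BARR"],"dateRange":["2019-01","2019-12"]}'
)

# Convert each string, then stack the results into one data frame.
dfs <- lapply(jsons, json_to_df)
combined <- do.call(rbind, dfs)
print(combined)
```

The padded rows carry an empty sites value, so they can be dropped afterwards with combined[combined$sites != "", ] if they are not wanted.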

It is not at all clear from the question whether it actually makes sense to store the data in this format.
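In the padded output above, the start and end dates simply alternate between rows. If the two dateRange values really are a start and an end date, a layout that sidesteps recycling altogether is one row per site with the range split into two columns. This is an alternative to the padding, not something the question asks for. It assumes dateRange always has exactly two entries, and json_to_site_rows(), dateStart and dateEnd are hypothetical names:

```r
library(jsonlite)

# Hypothetical alternative layout, assuming dateRange is always a
# [start, end] pair: one row per site, with the range in two columns.
# Every other field has length 1, so it recycles cleanly for any number
# of sites, odd or even.
json_to_site_rows <- function(json) {
  out <- fromJSON(json)
  data.frame(
    productCode   = out$productCode,
    site          = out$sites,
    dateStart     = out$dateRange[1],
    dateEnd       = out$dateRange[2],
    documentation = out$documentation,
    packageType   = out$packageType,
    stringsAsFactors = FALSE
  )
}

# Shortened stand-in for x1 (hypothetical): an odd number of sites.
x_odd <- '{"productCode":"DP1.00096.001","sites":["ABBY","BARR","BART"],"dateRange":["2012-06","2018-07"],"documentation":"include","packageType":"basic"}'
res <- json_to_site_rows(x_odd)
print(res)
```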

Created on 2020-08-05 by the reprex package (v0.3.0)
