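I have a long list of JSON strings that I am trying to import and convert to data frames. In general, jsonlite::fromJSON works fine, but about 25% of the JSON strings throw an error: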
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 1, 47, 2
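I realize this is because the elements in the JSON have different lengths, but that does not seem to be a problem for the other 75% of my data, which has a similar structure.
Here is an example of when it works and when it does not.
Not working: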
x1 <- '{"productCode":"DP1.00096.001","sites":["ABBY","BARR","BART","BLAN","BONA","CLBJ","CPER","DCFS","DEJU","DELA","DSNY","GRSM","GUAN","HARV","HEAL","JERC","JORN","KONA","KONZ","LAJA","LENO","MLBS","MOAB","NIWO","NOGP","OAES","ONAQ","ORNL","OSBS","PUUM","RMNP","SCBI","SERC","SJER","SOAP","SRER","STEI","STER","TALL","TEAK","TOOL","TREE","UKFS","UNDE","WOOD","WREF","YELL"],"dateRange":["2012-06","2018-07"],"documentation":"include","packageType":"basic"}'
output1 <- jsonlite::fromJSON(x1)
str(output1)
as.data.frame(output1)
Working:
x2 <- '{"productCode":"DP1.00095.001","sites":["ABBY","BARR","BART","BLAN","BONA","CLBJ","CPER","DCFS","DEJU","DELA","DSNY","GRSM","GUAN","HARV","HEAL","JERC","JORN","KONA","KONZ","LAJA","LENO","MLBS","MOAB","NIWO","NOGP","OAES","ONAQ","ORNL","OSBS","RMNP","SCBI","SERC","SJER","SOAP","SRER","STEI","STER","TALL","TEAK","TOOL","TREE","UKFS","UNDE","WOOD","WREF","YELL"],"dateRange":["2019-01","2019-12"],"documentation":"include","packageType":"basic"}'
output2 <- jsonlite::fromJSON(x2)
str(output2)
as.data.frame(output2)
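In both examples the JSON has an unbalanced structure. The difference is that the x2 output can be converted to a data.frame, whereas the x1 output cannot. I can't find a difference in either the str output or the raw JSON strings that explains where and why it fails. The structure looks identical, so I would not expect the conversion to fail.
Any help understanding why x1 does not work while x2 does would be appreciated. Is there a way to get x1 into a working data frame similar to the x2 output?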
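This has nothing to do with JSON itself. as.data.frame recycles shorter vectors only when the longest length is a whole multiple of each shorter length. In x1, sites has 47 elements and dateRange has 2, and 47 is not a multiple of 2. In x2, sites has 46 elements, so dateRange can be recycled 23 times. For example: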
as.data.frame(list(a = 1, b = 1:2, c = 3:5))
#> Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,
#> : arguments imply differing number of rows: 1, 2, 3
as.data.frame(list(a = 1, b = 1:2, c = 3:6))
#> a b c
#> 1 1 1 3
#> 2 1 2 4
#> 3 1 1 5
#> 4 1 2 6
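The recycling condition can be checked before calling as.data.frame, which may help pick out the failing 25% up front. The following is a minimal sketch; `can_recycle` is a hypothetical helper name (not part of base R or jsonlite), and it assumes every element of the list is non-empty.

```r
# Hypothetical helper: TRUE when every element's length divides the longest
# length, which is the condition as.data.frame() needs in order to recycle.
# Assumes all elements have length >= 1.
can_recycle <- function(x) {
  n <- lengths(x)
  all(max(n) %% n == 0)
}

ok  <- list(a = 1, b = 1:2, c = 3:6)  # lengths 1, 2, 4
bad <- list(a = 1, b = 1:2, c = 3:5)  # lengths 1, 2, 3

can_recycle(ok)
can_recycle(bad)
```

Applied to the question's data, output1 has lengths 1, 47, 2, 1, 1 and fails the check, while output2 has lengths 1, 46, 2, 1, 1 and passes.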
In the case of output1, one workaround is to pad output1$sites with an empty string so that its length becomes even (48):
output1$sites <- c(output1$sites, "")
head(as.data.frame(output1))
#> productCode sites dateRange documentation packageType
#> 1 DP1.00096.001 ABBY 2012-06 include basic
#> 2 DP1.00096.001 BARR 2018-07 include basic
#> 3 DP1.00096.001 BART 2012-06 include basic
#> 4 DP1.00096.001 BLAN 2018-07 include basic
#> 5 DP1.00096.001 BONA 2012-06 include basic
#> 6 DP1.00096.001 CLBJ 2018-07 include basic
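Whether it actually makes sense to keep the data in this format is far from clear, since dateRange is simply recycled down the rows and the padded last row has an empty site.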
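If one row per site is the goal, an alternative to padding might be to build the data frame explicitly and split dateRange into two columns. This is only a sketch under assumptions: the column names `dateStart` and `dateEnd` are made up for illustration, the list below mimics what jsonlite::fromJSON(x1) returns but is shortened to three sites, and dateRange is assumed to always hold exactly a start and an end.

```r
# Stand-in for jsonlite::fromJSON(x1), shortened to three sites for brevity.
out <- list(
  productCode   = "DP1.00096.001",
  sites         = c("ABBY", "BARR", "BART"),
  dateRange     = c("2012-06", "2018-07"),
  documentation = "include",
  packageType   = "basic"
)

# One row per site; every other field has length 1 and recycles safely,
# regardless of whether the number of sites is odd or even.
df <- data.frame(
  productCode   = out$productCode,
  site          = out$sites,
  dateStart     = out$dateRange[1],  # assumed: first element is the start
  dateEnd       = out$dateRange[2],  # assumed: second element is the end
  documentation = out$documentation,
  packageType   = out$packageType,
  stringsAsFactors = FALSE
)

df
```

With this shape no padding row is needed, and the start and end dates appear on every row rather than alternating.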
Created on 2020-08-05 by the reprex package (v0.3.0)