I am trying to save a correctly formatted JSON file to AWS S3.
I can save a regular data frame to S3 using, for example:
library(tidyverse)
library(aws.s3)
s3save(mtcars, bucket = "s3://ourco-emr/", object = "tables/adhoc.db/mtcars/mtcars")
But I need to convert mtcars to JSON, specifically NDJSON.
I am able to create a correctly formatted JSON file, for example:
predictions_file <- file("mtcars.json")
jsonlite::stream_out(mtcars, predictions_file)
This saves a file named mtcars.json to my working directory.
However, with the aws.s3 function s3save(), I need to send an in-memory object, not a file.
I tried:
predictions_file <- file("mtcars.json")
s3write_using(mtcars,
FUN = jsonlite::stream_out,
con = predictions_file,
"s3://ourco-emr/",
object = "tables/adhoc.db/mtcars/mtcars")
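This gives:
Error in if (verbose) message("opening ", is(con), " output connection.") : argument is not interpretable as logical
I tried the same code block but omitted the con = predictions_file line, which gives only:
Argument 'con' must be a connection.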
If the function jsonlite::stream_out() creates a correctly formatted JSON file, how can I then write that file to S3?
Edit: The desired JSON output looks like this:
{"mpg":21,"cyl":6,"disp":160,"hp":110,"drat":3,"wt":2,"qsec":16,"vs":0,"am":1,"gear":4,"carb":4,"year":"2020","month":"03","day":"05"}
{"mpg":21,"cyl":6,"disp":160,"hp":110,"drat":3,"wt":2,"qsec":17,"vs":0,"am":1,"gear":4,"carb":4,"year":"2020","month":"03","day":"05"}
{"mpg":22,"cyl":4,"disp":108,"hp":93,"drat":35,"wt":2,"qsec":18,"vs":1,"am":1,"gear":4,"carb":1,"year":"2020","month":"03","day":"05"}
{"mpg":21,"cyl":6,"disp":258,"hp":110,"drat":8,"wt":3,"qsec":19,"vs":1,"am":0,"gear":3,"carb":1,"year":"2020","month":"03","day":"05"}
{"mpg":18,"cyl":8,"disp":360,"hp":175,"drat":3,"wt":3,"qsec":17,"vs":0,"am":0,"gear":3,"carb":2,"year":"2020","month":"03","day":"05"}
When trying with readChar():
mtcars_string <- readChar("mtcars.json", 1e6)
s3save(mtcars_string, bucket = "s3://ourco-emr/", object = "tables/adhoc.db/mtcars/2020/03/06/mtcars")
If I download and open the resulting JSON file, it looks like this:
5244 5833 0a58 0a00 0000 0300 0306 0000
0305 0000 0000 0555 5446 2d38 0000 0402
0000 0001 0004 0009 0000 000d 6d74 6361
7273 5f73 7472 696e 6700 0000 1000 0000
0100 0400 0900 0012 347b 226d 7067 223a
3231 2c22 6379 6c22 3a36 2c22 6469 7370
So it seems that binary R data, rather than JSON, was sent to AWS S3.
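The first four bytes of the dump above (52 44 58 33) spell "RDX3" in ASCII, which resembles the header that R's save() writes. A minimal sketch for checking this locally, assuming (not confirmed by the post) that s3save() serializes its arguments the same way save() does; the string content and temporary file here are only illustrative:

```r
# Illustrative stand-in for the JSON text read with readChar().
mtcars_string <- "{\"mpg\":21,\"cyl\":6}"

# Save it uncompressed with save(), then look at the first bytes of the file.
tmp_rdata <- tempfile()
save(mtcars_string, file = tmp_rdata, compress = FALSE)

# The header is R's binary save-format marker, not JSON text.
header <- rawToChar(readBin(tmp_rdata, what = "raw", n = 4))
print(header)
```

If the header printed here matches the start of the downloaded object, the upload contains a serialized R object that wraps the JSON string, not the JSON text itself.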
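I ran into the same problem. I need to write JSON lines (NDJSON) and upload them to S3, and as far as I know, only stream_out() from the jsonlite package writes JSON lines.
stream_out() only accepts a connection object as its destination, whereas s3write_using() writes to a temporary file tmp and passes the path of that file as a string to FUN. stream_out() then throws the error:
Argument 'con' must be a connection.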
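The mismatch can be reproduced without S3. The following is a small local sketch: it passes a path string to stream_out() the way FUN is described as being called, captures the error, and then repeats the call with the path wrapped in file(). The temporary file is just an illustration.

```r
library(jsonlite)

tmp <- tempfile(fileext = ".json")

# Passing a path string: stream_out() rejects it because it is not a connection.
err <- tryCatch(
  stream_out(mtcars, tmp, verbose = FALSE),
  error = function(e) conditionMessage(e)
)
print(err)

# Passing a connection: stream_out() opens it, writes one JSON object per row,
# and closes it again when it is done.
stream_out(mtcars, file(tmp), verbose = FALSE)
n_lines <- length(readLines(tmp))
print(n_lines == nrow(mtcars))
```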
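A temporary fix is to modify s3write_using() so that it passes a connection to FUN instead of a file path string.
Run:
trace(s3write_using, edit=TRUE)
This opens an editor. Change line 5 from:
value <- FUN(x, tmp, ...)
to:
value <- FUN(x, file(tmp), ...)
You can then upload data using stream_out():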
s3write_using(x = data,
FUN = stream_out,
bucket = 'mybucket',
object = 'my/object.json',
opts = list(acl = "private", multipart = FALSE, verbose = TRUE, show_progress = TRUE))
The edit persists for the whole session, or until untrace(s3write_using) is executed.
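An alternative that may avoid editing the function at all is to hand s3write_using() a small wrapper as FUN, so that the path-to-connection conversion happens on the caller's side. This is only a sketch: write_ndjson and upload_ndjson are hypothetical helper names, and it assumes FUN is called as FUN(x, tmp, ...) as described above. The upload function is defined but not called here, since it needs valid credentials and a real bucket.

```r
library(jsonlite)

# Hypothetical helper: accepts a file path (what FUN receives) and gives
# stream_out() the connection it requires.
write_ndjson <- function(x, path) {
  con <- file(path, open = "wb")
  on.exit(close(con))
  stream_out(x, con, verbose = FALSE)
}

# Local check that the helper writes one JSON object per row.
tmp <- tempfile(fileext = ".json")
write_ndjson(mtcars, tmp)
first_line <- readLines(tmp, n = 1)

# Hypothetical upload wrapper; requires the aws.s3 package and credentials.
upload_ndjson <- function(x, object, bucket) {
  aws.s3::s3write_using(x, FUN = write_ndjson, object = object, bucket = bucket)
}
```

A call such as upload_ndjson(data, object = 'my/object.json', bucket = 'mybucket') would then mirror the example above without needing trace().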
Someone should probably file a request on the cloudyr/aws.s3 GitHub repository, since this is a common use case.