Split a large single JSON array into multiple JSON arrays of 10K records each



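I have a CSV file with about 5 million records, and I am trying to convert its data into a JSON array using the JSON processor jq. However, I need to convert the CSV into multiple JSON arrays (written to separate files), each holding 10K records, rather than the single JSON array file with 5 million records shown in the example below.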

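How can I achieve this with a shell script? Alternatively, how can I use a shell script to split a single JSON array into multiple JSON arrays, one per file, each with 10K records?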

Input CSV file:

identifier,type,locale
91617676848,MSISDN,es_ES
91652560975,MSISDN,es_ES
91636563675,MSISDN,es_ES

CSV to JSON conversion:

jq --slurp --raw-input --raw-output \
  'split("\n") | .[1:] | map(split(",")) |
      map({"identifier": .[0],
           "type": .[1],
           "locale": .[2]})' \
  sample.csv > out_new.json

Single JSON array output:

[
{
"identifier": "91617676848",
"type": "MSISDN",
"locale": "es_ES"
},
{
"identifier": "91652560975",
"type": "MSISDN",
"locale": "es_ES"
},
{
"identifier": "91636563675",
"type": "MSISDN",
"locale": "es_ES"
}
]

Expected JSON output:

1.json (a JSON array with 10K records)
[
{
"identifier": "91617676848",
"type": "MSISDN",
"locale": "es_ES"
},
.
.
.
.
{
"identifier": "91652560975",
"type": "MSISDN",
"locale": "es_ES"
}
]

2.json (a JSON array with 10K records)
[
{
"identifier": "91636563675",
"type": "MSISDN",
"locale": "es_ES"
},
.
.
.
.
{
"identifier": "91636563999",
"type": "MSISDN",
"locale": "es_ES"
}
]
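
For the second variant of the question (splitting an already generated array such as out_new.json), one possible approach is to let jq slice the array into sub-arrays and have awk write each one to its own numbered file. This is only a sketch: the function name split_json_array and the output directory are made up for illustration, each output file holds its array in compact single-line form, and jq still has to load the whole input array into memory, which may be a concern with 5 million records.

```shell
#!/bin/sh
# Sketch: split one JSON array file into numbered files of at most N records.
# split_json_array is a hypothetical helper name, not part of jq.
split_json_array() {
    input=$1      # e.g. out_new.json
    size=$2       # records per output file, e.g. 10000
    outdir=$3     # directory that receives 1.json, 2.json, ...

    mkdir -p "$outdir" || return 1

    # jq emits one compact sub-array per line (slices of $n elements);
    # awk writes line NR to NR.json and closes it to avoid open-file limits.
    jq -c --argjson n "$size" \
        'range(0; length; $n) as $i | .[$i:$i + $n]' "$input" |
    awk -v dir="$outdir" '{ f = dir "/" NR ".json"; print > f; close(f) }'
}

# Example: split_json_array out_new.json 10000 chunks
```

With the sample sizes from the question, calling the function with 10000 as the chunk size would be expected to produce about 500 files; the last file simply holds whatever records remain.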

It is worth installing "csvkit" to get the "csvjson" program (on OS X, install it with Homebrew): https://csvkit.readthedocs.io/en/latest/scripts/csvjson.html

$ csvjson -I sample.csv | jq
[
{
"identifier": "91617676848",
"type": "MSISDN",
"locale": "es_ES"
},
{
"identifier": "91652560975",
"type": "MSISDN",
"locale": "es_ES"
},
{
"identifier": "91636563675",
"type": "MSISDN",
"locale": "es_ES"
}
]
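
The output above is still one array, so the 10K-per-file requirement needs an extra step. A minimal sketch, assuming the CSV has no quoted fields or embedded commas: drop the header, cut the rows into fixed-size pieces with split, and convert each piece with the jq filter from the question. The function name csv_to_json_chunks, the part_ prefix, and the output directory are hypothetical, and the column names are hard-coded to match the sample file.

```shell
#!/bin/sh
# Sketch: convert a CSV file into numbered JSON array files of at most N records.
# csv_to_json_chunks is a hypothetical helper name; the column names are
# hard-coded to match the sample CSV (identifier,type,locale).
csv_to_json_chunks() {
    csv=$1       # e.g. sample.csv
    size=$2      # records per output file, e.g. 10000
    outdir=$3    # directory that receives 1.json, 2.json, ...

    mkdir -p "$outdir" || return 1

    # Drop the header row, then cut the remaining rows into fixed-size pieces.
    tail -n +2 "$csv" | split -a 4 -l "$size" - "$outdir/part_" || return 1

    n=0
    for part in "$outdir"/part_*; do
        [ -e "$part" ] || continue   # no data rows: the glob matched nothing
        n=$((n + 1))
        # Read the piece as one string, split it into lines and fields,
        # and skip empty lines such as the one after a trailing newline.
        jq --slurp --raw-input \
            'split("\n") | map(select(length > 0) | split(",")) |
             map({"identifier": .[0], "type": .[1], "locale": .[2]})' \
            "$part" > "$outdir/$n.json" || return 1
        rm -f "$part"
    done
}

# Example: csv_to_json_chunks sample.csv 10000 chunks
```

Because each jq invocation only sees 10K rows, memory use stays small regardless of the total file size. If the data may contain quoted fields, replacing the jq call with csvjson on header-prefixed pieces would likely be the safer choice.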
