Using jq --stream to filter a large JSON file and save the output as CSV



I have a very large JSON file that I would like to stream (using --stream) and filter with jq, then save as CSV.

Here is sample data with two objects:

[{"_id":"1","time":"2021-07-22","body":["text1"],"region":[{"percentage":"0.1","region":"region1"},{"percentage":"0.9","region":"region2"}],"reach":{"lower_bound":"100","upper_bound":"200"},"languages":["de"]},
{"_id":"2","time":"2021-07-23","body":["text2"],"region":[{"percentage":"0.3","region":"region1"},{"percentage":"0.7","region":"region2"}],"reach":{"lower_bound":"10","upper_bound":"20"},"languages":["en"]}]

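I want to filter on the "languages" field while streaming, keeping only the objects where languages == ["de"], and then save the result to a new CSV file named largefile.csv, so that the new file looks like this: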

_id,time,body,percentage_region1,percentage_region2,reach_lower_bound,reach_upper_bound,languages
"1","2021-07-22","text1","0.1","0.9","100","200","de"

This is the code I have so far, but it does not seem to work:

cat largefile.json -r | jq -cn --stream 'fromstream(1|truncate_stream(inputs | select(.))) | with_entries(select(.value.languages==[“de”])) | @csv

Any help would be greatly appreciated!

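There are several separate tasks involved here, and some of them are not fully specified, but hopefully the following will help you on your way: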

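jq -rn --stream '
  # Rebuild each element of the top-level array, one object at a time
  fromstream(1|truncate_stream(inputs))
  # Keep only the objects whose languages array is exactly ["de"]
  | select(.languages == ["de"])
  # Flatten the selected fields into one array per object (one CSV row)
  | [._id, .time, .body[0], .region[].percentage,
     .reach.lower_bound, .reach.upper_bound, .languages[0]]
  | @csv
' largefile.json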
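
The filter above prints rows only, so the header line and the largefile.csv output file from the question still need to be handled. A minimal sketch of one way to do that follows. The function name `json_to_csv` is hypothetical, the header text is copied from the expected output in the question, and the sketch assumes that every object lists exactly region1 and then region2 in its "region" array; if the order or the number of regions varies, the percentage columns will not line up with the header.

```shell
# Hypothetical helper: json_to_csv INPUT.json OUTPUT.csv
# Assumes the input is a single top-level JSON array of objects shaped like
# the sample data, and that jq 1.5 or later is installed.
json_to_csv() {
  # Write the fixed header row first (overwrites any existing output file).
  printf '%s\n' '_id,time,body,percentage_region1,percentage_region2,reach_lower_bound,reach_upper_bound,languages' > "$2"

  # -n lets `inputs` see every streaming event; -r prints the CSV text raw.
  jq -rn --stream '
    fromstream(1|truncate_stream(inputs))   # one object per array element
    | select(.languages == ["de"])          # keep objects tagged only "de"
    | [._id, .time, .body[0], .region[].percentage,
       .reach.lower_bound, .reach.upper_bound, .languages[0]]
    | @csv
  ' "$1" >> "$2"                            # append rows below the header
}
```

With the file names from the question, this would be called as `json_to_csv largefile.json largefile.csv`. Because the top-level array is never held in memory as a whole, only one object at a time is reconstructed.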
