Split/slice large JSON, make it unique by a few columns without sorting, and add extra elements using jq



Using jq and the approach from "Split/slice large JSON using jq", we successfully sliced a huge input file into smaller chunks based on array size.

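Now I want to add a new JSON element to each array entry: a sequence number running from 1 to the length of the original array. I also want to filter the array so that it is unique by a few columns.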

Input:

{"recDt":"2021-01-05",
"country":"US",
"name":"ABC",
"number":"9828",
"add": [
{"evnCd":"O","rngNum":"1","state":"TX","city":"ANDERSON","postal":"77830"},
{"evnCd":"O","rngNum":"2","state":"TX","city":"ANDERSON","postal":"77830"},
{"evnCd":"O","rngNum":"3","state":"TX","city":"ANDERSON","postal":"77831"},
{"evnCd":"O","rngNum":"4","state":"TX","city":"ANDERSON","postal":"77832"}
]
}

Expected output after adding the extra key:

{"recDt":"2021-01-05",
"country":"US",
"name":"ABC",
"number":"9828",
"add": [
{"rownum":1,"evnCd":"O","rngNum":"1","state":"TX","city":"ANDERSON","postal":"77830"},
{"rownum":2,"evnCd":"O","rngNum":"2","state":"TX","city":"ANDERSON","postal":"77830"},
{"rownum":3,"evnCd":"O","rngNum":"3","state":"TX","city":"ANDERSON","postal":"77831"},
{"rownum":4,"evnCd":"O","rngNum":"4","state":"TX","city":"ANDERSON","postal":"77832"}
]
}

After applying the filter (unique by state, city and postal) and slicing into arrays of size 2:

{"recDt":"2021-01-05",
"country":"US",
"name":"ABC",
"number":"9828",
"add": [
{"rownum":1,"evnCd":"O","rngNum":"1","state":"TX","city":"ANDERSON","postal":"77830"},
{"rownum":3,"evnCd":"O","rngNum":"3","state":"TX","city":"ANDERSON","postal":"77831"}
]
}
{"recDt":"2021-01-05",
"country":"US",
"name":"ABC",
"number":"9828",
"add": [
{"rownum":4,"evnCd":"O","rngNum":"4","state":"TX","city":"ANDERSON","postal":"77832"}
]
}
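To pin down the semantics the expected output implies, here is a minimal Python sketch (Python is used only for illustration; the function name `split_unique` is hypothetical). It assumes the row number is assigned before de-duplication, which is why the surviving rows keep rownum 1, 3 and 4, and that the first occurrence of each (state, city, postal) combination is the one kept.

```python
import json
from typing import Any, Iterator


def split_unique(doc: dict[str, Any], keys: tuple[str, ...], size: int) -> Iterator[dict[str, Any]]:
    """Number the rows of doc["add"], keep the first row per key combination,
    and yield one copy of the header fields per chunk of at most `size` rows."""
    header = {k: v for k, v in doc.items() if k != "add"}  # recDt, country, name, number
    seen: set[str] = set()
    kept: list[dict[str, Any]] = []
    for rownum, row in enumerate(doc["add"], start=1):
        # JSON-encode the selected columns so the key is hashable whatever the value types.
        key = json.dumps([row.get(k) for k in keys])
        if key in seen:
            continue  # later duplicate: dropped, but its rownum is still consumed
        seen.add(key)
        kept.append({"rownum": rownum, **row})  # rownum first, as in the expected output
    # Slice the surviving rows into arrays of at most `size` elements (no sorting anywhere).
    for start in range(0, len(kept), size):
        yield {**header, "add": kept[start:start + size]}
```

For the input above, `list(split_unique(doc, ("state", "city", "postal"), 2))` yields two documents whose `add` arrays hold rownum 1 and 3, then rownum 4.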

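The following sample makes the array unique by a few columns, but it does not perform well (unique_by also sorts the array):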

jq -r --argjson size 2 '.add |= unique_by({city,state,postal}) | del(.add) as $object | (.add|_nwise($size) | ("t", $object + {add:.} ))' input.json | awk '/^t/ {fn++; next} { print >> "part-" fn ".json"}'
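The awk stage relies on jq printing a bare `t` line before every chunk: each line starting with `t` bumps a counter, and every other line is appended to `part-<counter>.json`. A rough Python sketch of that grouping (the names `group_by_markers` and `write_parts` are hypothetical; it assumes, as with this jq output, that the stream starts with a marker and that no JSON line begins with `t`):

```python
from pathlib import Path
from typing import Iterable


def group_by_markers(lines: Iterable[str], marker: str = "t") -> dict[str, str]:
    """Mimic the awk script: a line starting with `marker` opens the next part file."""
    parts: dict[str, list[str]] = {}
    fn = 0
    for line in lines:
        if line.startswith(marker):  # awk: /^t/ {fn++; next}
            fn += 1
            continue
        # awk: { print >> "part-" fn ".json" }
        parts.setdefault(f"part-{fn}.json", []).append(line.rstrip("\n"))
    return {name: "\n".join(body) + "\n" for name, body in parts.items()}


def write_parts(parts: dict[str, str], outdir: Path) -> None:
    """Append each part to its file, as awk's `>>` redirection does."""
    for name, text in parts.items():
        with (outdir / name).open("a", encoding="utf-8") as fh:
            fh.write(text)
```

Because awk appends with `>>`, rerunning the pipeline in the same directory adds to existing part files rather than replacing them; the sketch keeps that behaviour.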

To add the row number, you can use:

.add |= [ range(length) as $i | .[$i] | .rownum = $i+1 ]

Alternatively:

.add |= ( to_entries | map( .value.rownum = .key+1 | .value ) )
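Both jq filters number the array entries by position; they differ only in how the index is obtained (`range(length)` versus `to_entries`, which on an array pairs each index with its value). A small Python sketch of the same two ideas, with hypothetical function names, for readers comparing them:

```python
from typing import Any

Row = dict[str, Any]


def add_rownum_by_index(rows: list[Row]) -> list[Row]:
    # Same idea as: .add |= [ range(length) as $i | .[$i] | .rownum = $i+1 ]
    return [{**rows[i], "rownum": i + 1} for i in range(len(rows))]


def add_rownum_by_entries(rows: list[Row]) -> list[Row]:
    # Same idea as: .add |= ( to_entries | map( .value.rownum = .key+1 | .value ) )
    # enumerate() yields (index, value) pairs, much like to_entries yields {key, value}.
    return [{**row, "rownum": i + 1} for i, row in enumerate(rows)]
```

As in jq, assigning `.rownum` to an existing object appends the key at the end, so these variants place `rownum` last rather than first; JSON consumers normally ignore key order.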


Here is a solution that uses two generic filters: one for enumeration, and a sort-free, stream-oriented variant of unique_by:

# counting from 1
def enumerate(s; $key): foreach s as $x (0; .+1; {($key): .} + $x);
# for each distinct value of ($x|f), emit the first item $x in the stream that produces it;
# no sorting: items are bucketed by the type of ($x|f) so that, e.g., 1 and "1" stay distinct
def uniques_by(stream; f):
  reduce stream as $x ({};
      ($x|f) as $s
      | ($s|type) as $t
      | (if $t == "string" then $s else ($s|tojson) end) as $y
      | if (.[$t] // {}) | has($y) then . else .[$t][$y] = $x end )
  | .[][] ;
# $size must be supplied on the command line, e.g. jq -r --argjson size 2
# number the rows first, so the surviving rows keep their original rownum (1, 3, 4)
.add |= [uniques_by(enumerate(.[]; "rownum"); {city,state,postal})]
| del(.add) as $object
| (.add|_nwise($size) | ("t", $object + {add:.} ))
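The type bucket in uniques_by matters because strings are used as keys directly while every other value goes through tojson: without it, the number 1 (tojson gives "1") and the string "1" would collide. A Python sketch of the same bookkeeping (hypothetical helper names; it assumes jq iterates object values in insertion order, so within each type the first occurrence wins and output order follows the input, while different result types come out grouped by type):

```python
import json
from typing import Any, Callable, Iterable, Iterator


def jq_type(v: Any) -> str:
    """jq's type name for a value decoded by Python's json module."""
    if v is None:
        return "null"
    if isinstance(v, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(v, (int, float)):
        return "number"
    if isinstance(v, str):
        return "string"
    if isinstance(v, list):
        return "array"
    return "object"


def uniques_by(stream: Iterable[Any], f: Callable[[Any], Any]) -> Iterator[Any]:
    """Keep the first item for each distinct f(item), without sorting."""
    seen: dict[str, dict[str, Any]] = {}  # type name -> {key string -> first item}
    for x in stream:
        s = f(x)
        t = jq_type(s)
        y = s if t == "string" else json.dumps(s, separators=(",", ":"))  # like tojson
        bucket = seen.setdefault(t, {})
        if y not in bucket:
            bucket[y] = x
    for bucket in seen.values():  # like .[][]
        yield from bucket.values()
```

For example, `list(uniques_by([1, "1", 2, 1, "1"], lambda v: v))` gives `[1, 2, "1"]`: duplicates are dropped, and the numbers precede the string because the number bucket was created first.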
