尝试将下面的json数据转换为csv,使用jq和/或awk或python或perl或Linux shell中的任何内容。 感谢您在此处的脚本帮助。
{
"inventory": [
{
"profile": "Earth",
"invState": [
{
"count": 6,
"Status": "ONLINE"
},
{
"count": 8,
"Status": "EXIST"
},
{
"count": 1,
"Status": "GIVEN"
},
{
"count": 4,
"Status": "ERROR"
},
{
"count": 49,
"Status": "INSTOCK"
},
{
"count": 389,
"Status": "RELEASED"
},
{
"count": 68,
"Status": "DELETED"
},
{
"count": 280,
"Status": "CONNECTED"
},
{
"count": 1,
"Status": "UNINSTOCK"
}
]
},
{
"profile": "Mars",
"invState": [
{
"count": 7,
"Status": "EXIST"
},
{
"count": 20,
"Status": "INSTOCK"
},
{
"count": 110,
"Status": "RELEASED"
},
{
"count": 16,
"Status": "DELETED"
},
{
"count": 41,
"Status": "CONNECTED"
},
{
"count": 1,
"Status": "UNINSTOCK"
}
]
},
{
"profile": "Mercury",
"invState": [
{
"count": 4,
"Status": "EXIST"
},
{
"count": 1224,
"Status": "INSTOCK"
},
{
"count": 3,
"Status": "CONNECTED"
},
{
"count": 18,
"Status": "RELEASED"
},
{
"count": 5,
"Status": "DELETED"
}
]
}
]
}
csv 输出应如下所示:
Earth,6,ONLINE
Earth,8,EXIST
Earth,1,GIVEN
Earth,4,ERROR
Earth,49,INSTOCK
Earth,389,RELEASED
Earth,68,DELETED
Earth,280,CONNECTED
Earth,1,UNINSTOCK
Mars,7,EXIST
Mars,20,INSTOCK
etc
请参阅随附的图片
CSV 输出 等
将不胜感激这里的任何建议。 我尝试使用jq和awk,但没有得到正确的结果。
下面是一个使用 GNU awk 的 JSON 扩展:
$ gawk '
@load "json"
BEGIN {
OFS=","
}
{
lines=lines $0 # keep appending lines
if(json_fromJSON(lines,data)!=0) { # until you have a valid object
for(inventory in data["inventory"]) # then we iterate the arrays and output
for(invState in data["inventory"][inventory]["invState"])
print data["inventory"][inventory]["profile"],
data["inventory"][inventory]["invState"][invState]["count"],
data["inventory"][inventory]["invState"][invState]["Status"]
lines="" # reset the object array for next round
}
}' file.json
部分输出:
Earth,6,ONLINE
Earth,8,EXIST
...
Mars,7,EXIST
Mars,20,INSTOCK
...
Mercury,4,EXIST
Mercury,1224,INSTOCK
...
如果使用 -r 选项调用 jq,则以下内容将生成如下所示的输出:
.inventory[]
| .profile as $profile
| .invState[]
| [$profile] + [.count, .Status]
| join(",")
但是请注意,如果需要 CSV 输出,则最好将最后一行中的join
替换为@csv
。
较短的版本
以下内容等效于上述内容:
.inventory[]
| [.profile] + (.invState[] | [.count, .Status])
| join(",")
如果"计数"和"状态"键的顺序是固定的,则可以侥幸逃脱:
.inventory[]
| [.profile] + (.invState[] | [.[]])
| join(",")
我用 JSON 文件生成了不少 CSV 文件,jq
.我觉得jq
很适合这个。
脱离阵列
从:
["Earth","Mars","Mercury"]
自:
"Earth"
"Mars"
"Mercury"
使用此过滤器实现:.[]
,它遍历数组。正如文档所说:
使用输入 [1,2,3] 运行 .[] 将生成三个单独的结果,而不是单个数组。
从字符串到文本
从:
["Earth","Mars","Mercury"]
自:
Earth
Mars
Mercury
通过 CLI 上的--raw-output
参数与.[]
过滤器相结合来实现。 例如,
jq --raw-output '.[]' input.json
保存到变量
在处理其余库存时,您需要保留对.profile
的引用。下面是一个人为的例子:
从:
[ {"x": "Earth", "y": ["1", "2", "3"]}
, {"x": "Mars", "y": ["1", "2", "3"]}
]
自:
"Earth1"
"Earth2"
"Earth3"
"Mars1"
"Mars2"
"Mars3"
使用此过滤器实现:.[] | .x as $x | .y[] | $x + .
(您可以将.x
保存到可以在过滤器中引用的 var$x
中。
回答您的问题,鉴于以下input.json
:
{
"inventory": [
{
"profile": "Earth",
"invState": [
{
"count": 6,
"Status": "ONLINE"
},
{
"count": 8,
"Status": "EXIST"
},
{
"count": 1,
"Status": "GIVEN"
},
{
"count": 4,
"Status": "ERROR"
},
{
"count": 49,
"Status": "INSTOCK"
},
{
"count": 389,
"Status": "RELEASED"
},
{
"count": 68,
"Status": "DELETED"
},
{
"count": 280,
"Status": "CONNECTED"
},
{
"count": 1,
"Status": "UNINSTOCK"
}
]
},
{
"profile": "Mars",
"invState": [
{
"count": 7,
"Status": "EXIST"
},
{
"count": 20,
"Status": "INSTOCK"
},
{
"count": 110,
"Status": "RELEASED"
},
{
"count": 16,
"Status": "DELETED"
},
{
"count": 41,
"Status": "CONNECTED"
},
{
"count": 1,
"Status": "UNINSTOCK"
}
]
},
{
"profile": "Mercury",
"invState": [
{
"count": 4,
"Status": "EXIST"
},
{
"count": 1224,
"Status": "INSTOCK"
},
{
"count": 3,
"Status": "CONNECTED"
},
{
"count": 18,
"Status": "RELEASED"
},
{
"count": 5,
"Status": "DELETED"
}
]
}
]
}
以下对jq
的调用应该可以解决问题:
jq --raw-output '.inventory[] | .profile as $p | .invState[] | "($p),(.count),(.Status)"' input.json
Earth,6,ONLINE
Earth,8,EXIST
Earth,1,GIVEN
Earth,4,ERROR
Earth,49,INSTOCK
Earth,389,RELEASED
Earth,68,DELETED
Earth,280,CONNECTED
Earth,1,UNINSTOCK
Mars,7,EXIST
Mars,20,INSTOCK
Mars,110,RELEASED
Mars,16,DELETED
Mars,41,CONNECTED
Mars,1,UNINSTOCK
Mercury,4,EXIST
Mercury,1224,INSTOCK
Mercury,3,CONNECTED
Mercury,18,RELEASED
Mercury,5,DELETED
如果你没有 jq 或 gawks json 扩展(需要 gawkextlib),并且你的输入总是像你的例子一样简单和规则,那么这将使用 GNU awk 为第三个 arg 匹配 () 和 gensub() 做你想要的:
$ cat tst.awk
BEGIN { OFS="," }
match($0,/"([^"]+)": *("[^"]*"|[0-9]+)/,a) {
tag = a[1]
val = gensub(/^"|"$/,"","g",a[2])
f[tag] = val
if ( tag == "Status" ) {
print f["profile"], f["count"], f["Status"]
}
}
$ awk -f tst.awk file
Earth,6,ONLINE
Earth,8,EXIST
Earth,1,GIVEN
Earth,4,ERROR
Earth,49,INSTOCK
Earth,389,RELEASED
Earth,68,DELETED
Earth,280,CONNECTED
Earth,1,UNINSTOCK
Mars,7,EXIST
Mars,20,INSTOCK
Mars,110,RELEASED
Mars,16,DELETED
Mars,41,CONNECTED
Mars,1,UNINSTOCK
Mercury,4,EXIST
Mercury,1224,INSTOCK
Mercury,3,CONNECTED
Mercury,18,RELEASED
Mercury,5,DELETED
这可能对你有用(GNU sed):
sed -nE '/profile/{s/.*"(S+)".*/1/;h};
/count/{s/.* (S+),.*/1/;H};
/Status/{s/.*"(S+)".*/1/;H;g;s/n/,/gp;g;s/n.*n.*//;h}' file
将profile
、count
和Status
信息塞入保留空间,然后在这样做后Status
,检索保留空间,用逗号替换换行符,打印然后删除count
和Status
详细信息,为下一次做好准备。
注:注:由于这是 json,因此最好使用jq
因为这始终是一个更强大的解决方案。
awk -F: 'BEGIN{ OFS=""; p=c=s=""; }
/"profile"/{ p=$2 }
/"count"/{ c=$2 }
/"Status"/{ s=$2 }
{ if(s!="") { print p,c,s; s="" }}' file.json
输出:
"Earth", 6, "ONLINE"
"Earth", 8, "EXIST"
"Earth", 1, "GIVEN"
"Earth", 4, "ERROR"
"Earth", 49, "INSTOCK"
"Earth", 389, "RELEASED"
"Earth", 68, "DELETED"
"Earth", 280, "CONNECTED"
"Earth", 1, "UNINSTOCK"
"Mars", 7, "EXIST"
"Mars", 20, "INSTOCK"
"Mars", 110, "RELEASED"
"Mars", 16, "DELETED"
"Mars", 41, "CONNECTED"
"Mars", 1, "UNINSTOCK"
"Mercury", 4, "EXIST"
"Mercury", 1224, "INSTOCK"
"Mercury", 3, "CONNECTED"
"Mercury", 18, "RELEASED"
"Mercury", 5, "DELETED"
正是csv,为什么文本字段被双引号包围。
如果你的 json 不在"pretty_print"中,你可能必须执行以下操作:
cat file.json | json_pp | awk .....