从 json 文件中提取数据并以 csv 格式设置结果格式



尝试将下面的json数据转换为csv,使用jq和/或awk或python或perl或Linux shell中的任何内容。 感谢您在此处的脚本帮助。

{
"inventory": [
{
"profile": "Earth",
"invState": [
{
"count": 6,
"Status": "ONLINE"
},
{
"count": 8,
"Status": "EXIST"
},
{
"count": 1,
"Status": "GIVEN"
},
{
"count": 4,
"Status": "ERROR"
},
{
"count": 49,
"Status": "INSTOCK"
},
{
"count": 389,
"Status": "RELEASED"
},
{
"count": 68,
"Status": "DELETED"
},
{
"count": 280,
"Status": "CONNECTED"
},
{
"count": 1,
"Status": "UNINSTOCK"
}
]
},
{
"profile": "Mars",
"invState": [
{
"count": 7,
"Status": "EXIST"
},
{
"count": 20,
"Status": "INSTOCK"
},
{
"count": 110,
"Status": "RELEASED"
},
{
"count": 16,
"Status": "DELETED"
},
{
"count": 41,
"Status": "CONNECTED"
},
{
"count": 1,
"Status": "UNINSTOCK"
}
]
},
{
"profile": "Mercury",
"invState": [
{
"count": 4,
"Status": "EXIST"
},
{
"count": 1224,
"Status": "INSTOCK"
},
{
"count": 3,
"Status": "CONNECTED"
},
{
"count": 18,
"Status": "RELEASED"
},
{
"count": 5,
"Status": "DELETED"
}
]
}
]
}

csv 输出应如下所示:

Earth,6,ONLINE
Earth,8,EXIST
Earth,1,GIVEN
Earth,4,ERROR
Earth,49,INSTOCK
Earth,389,RELEASED
Earth,68,DELETED
Earth,280,CONNECTED
Earth,1,UNINSTOCK
Mars,7,EXIST
Mars,20,INSTOCK
etc

请参阅随附的图片

CSV 输出 等

将不胜感激这里的任何建议。 我尝试使用jq和awk,但没有得到正确的结果。

下面是一个使用 GNU awk 的 JSON 扩展:

$ gawk '
@load "json"
BEGIN {
OFS=","
}
{
lines=lines $0                           # keep appending lines
if(json_fromJSON(lines,data)!=0) {       # until you have a valid object
for(inventory in data["inventory"])  # then we iterate the arrays and output
for(invState in data["inventory"][inventory]["invState"])
print data["inventory"][inventory]["profile"],
data["inventory"][inventory]["invState"][invState]["count"],
data["inventory"][inventory]["invState"][invState]["Status"]
lines=""                             # reset the object array for next round
}
}' file.json

部分输出:

Earth,6,ONLINE
Earth,8,EXIST
...
Mars,7,EXIST
Mars,20,INSTOCK
...
Mercury,4,EXIST
Mercury,1224,INSTOCK
...

如果使用 -r 选项调用 jq,则以下内容将生成如下所示的输出:

.inventory[]
| .profile as $profile
| .invState[]
| [$profile] + [.count, .Status]
| join(",")

但是请注意,如果需要 CSV 输出,则最好将最后一行中的join替换为@csv

较短的版本

以下内容等效于上述内容:

.inventory[]
| [.profile] + (.invState[] | [.count, .Status])
| join(",")

如果"计数"和"状态"键的顺序是固定的,则可以侥幸逃脱:

.inventory[] 
| [.profile] + (.invState[] | [.[]])
| join(",")

我用 JSON 文件生成了不少 CSV 文件,jq.我觉得jq很适合这个。

脱离阵列

从:

["Earth","Mars","Mercury"]

自:

"Earth"
"Mars"
"Mercury"

使用此过滤器实现:.[],它遍历数组。正如文档所说:

使用输入 [1,2,3] 运行 .[] 将生成三个单独的结果,而不是单个数组。

从字符串到文本

从:

["Earth","Mars","Mercury"]

自:

Earth
Mars
Mercury

通过 CLI 上的--raw-output参数与.[]过滤器相结合来实现。 例如,

jq --raw-output '.[]' input.json

保存到变量

在处理其余库存时,您需要保留对.profile的引用。下面是一个人为的例子:

从:

[ {"x": "Earth", "y": ["1", "2", "3"]}
, {"x": "Mars", "y": ["1", "2", "3"]}
]

自:

"Earth1"
"Earth2"
"Earth3"
"Mars1"
"Mars2"
"Mars3"

使用此过滤器实现:.[] | .x as $x | .y[] | $x + .

(您可以将.x保存到可以在过滤器中引用的 var$x中。


回答您的问题,鉴于以下input.json

{
"inventory": [
{
"profile": "Earth",
"invState": [
{
"count": 6,
"Status": "ONLINE"
},
{
"count": 8,
"Status": "EXIST"
},
{
"count": 1,
"Status": "GIVEN"
},
{
"count": 4,
"Status": "ERROR"
},
{
"count": 49,
"Status": "INSTOCK"
},
{
"count": 389,
"Status": "RELEASED"
},
{
"count": 68,
"Status": "DELETED"
},
{
"count": 280,
"Status": "CONNECTED"
},
{
"count": 1,
"Status": "UNINSTOCK"
}
]
},
{
"profile": "Mars",
"invState": [
{
"count": 7,
"Status": "EXIST"
},
{
"count": 20,
"Status": "INSTOCK"
},
{
"count": 110,
"Status": "RELEASED"
},
{
"count": 16,
"Status": "DELETED"
},
{
"count": 41,
"Status": "CONNECTED"
},
{
"count": 1,
"Status": "UNINSTOCK"
}
]
},
{
"profile": "Mercury",
"invState": [
{
"count": 4,
"Status": "EXIST"
},
{
"count": 1224,
"Status": "INSTOCK"
},
{
"count": 3,
"Status": "CONNECTED"
},
{
"count": 18,
"Status": "RELEASED"
},
{
"count": 5,
"Status": "DELETED"
}
]
}
]
}

以下对jq的调用应该可以解决问题:

jq --raw-output '.inventory[] | .profile as $p | .invState[] | "($p),(.count),(.Status)"' input.json
Earth,6,ONLINE
Earth,8,EXIST
Earth,1,GIVEN
Earth,4,ERROR
Earth,49,INSTOCK
Earth,389,RELEASED
Earth,68,DELETED
Earth,280,CONNECTED
Earth,1,UNINSTOCK
Mars,7,EXIST
Mars,20,INSTOCK
Mars,110,RELEASED
Mars,16,DELETED
Mars,41,CONNECTED
Mars,1,UNINSTOCK
Mercury,4,EXIST
Mercury,1224,INSTOCK
Mercury,3,CONNECTED
Mercury,18,RELEASED
Mercury,5,DELETED

如果你没有 jq 或 gawks json 扩展(需要 gawkextlib),并且你的输入总是像你的例子一样简单和规则,那么这将使用 GNU awk 为第三个 arg 匹配 () 和 gensub() 做你想要的:

$ cat tst.awk
BEGIN { OFS="," }
match($0,/"([^"]+)": *("[^"]*"|[0-9]+)/,a) {
tag = a[1]
val = gensub(/^"|"$/,"","g",a[2])
f[tag] = val
if ( tag == "Status" ) {
print f["profile"], f["count"], f["Status"]
}
}

$ awk -f tst.awk file
Earth,6,ONLINE
Earth,8,EXIST
Earth,1,GIVEN
Earth,4,ERROR
Earth,49,INSTOCK
Earth,389,RELEASED
Earth,68,DELETED
Earth,280,CONNECTED
Earth,1,UNINSTOCK
Mars,7,EXIST
Mars,20,INSTOCK
Mars,110,RELEASED
Mars,16,DELETED
Mars,41,CONNECTED
Mars,1,UNINSTOCK
Mercury,4,EXIST
Mercury,1224,INSTOCK
Mercury,3,CONNECTED
Mercury,18,RELEASED
Mercury,5,DELETED

这可能对你有用(GNU sed):

sed -nE '/profile/{s/.*"(S+)".*/1/;h};
/count/{s/.* (S+),.*/1/;H};
/Status/{s/.*"(S+)".*/1/;H;g;s/n/,/gp;g;s/n.*n.*//;h}' file

profilecountStatus信息塞入保留空间,然后在这样做后Status,检索保留空间,用逗号替换换行符,打印然后删除countStatus详细信息,为下一次做好准备。

注:注:由于这是 json,因此最好使用jq因为这始终是一个更强大的解决方案。

awk -F: 'BEGIN{ OFS=""; p=c=s=""; }
/"profile"/{ p=$2 }
/"count"/{ c=$2 }
/"Status"/{ s=$2 }
{ if(s!="") { print p,c,s; s="" }}'  file.json

输出:

"Earth", 6, "ONLINE"
"Earth", 8, "EXIST"
"Earth", 1, "GIVEN"
"Earth", 4, "ERROR"
"Earth", 49, "INSTOCK"
"Earth", 389, "RELEASED"
"Earth", 68, "DELETED"
"Earth", 280, "CONNECTED"
"Earth", 1, "UNINSTOCK"
"Mars", 7, "EXIST"
"Mars", 20, "INSTOCK"
"Mars", 110, "RELEASED"
"Mars", 16, "DELETED"
"Mars", 41, "CONNECTED"
"Mars", 1, "UNINSTOCK"
"Mercury", 4, "EXIST"
"Mercury", 1224, "INSTOCK"
"Mercury", 3, "CONNECTED"
"Mercury", 18, "RELEASED"
"Mercury", 5, "DELETED"

正是csv,为什么文本字段被双引号包围。

如果你的 json 不在"pretty_print"中,你可能必须执行以下操作:

cat file.json | json_pp | awk .....

最新更新