从 json 文件中提取数据并以 csv 格式设置结果格式

尝试将下面的json数据转换为csv，使用jq和/或awk或python或perl或Linux shell中的任何内容。感谢您在此处的脚本帮助。

{
"inventory": [
{
"profile": "Earth",
"invState": [
{
"count": 6,
"Status": "ONLINE"
},
{
"count": 8,
"Status": "EXIST"
},
{
"count": 1,
"Status": "GIVEN"
},
{
"count": 4,
"Status": "ERROR"
},
{
"count": 49,
"Status": "INSTOCK"
},
{
"count": 389,
"Status": "RELEASED"
},
{
"count": 68,
"Status": "DELETED"
},
{
"count": 280,
"Status": "CONNECTED"
},
{
"count": 1,
"Status": "UNINSTOCK"
}
]
},
{
"profile": "Mars",
"invState": [
{
"count": 7,
"Status": "EXIST"
},
{
"count": 20,
"Status": "INSTOCK"
},
{
"count": 110,
"Status": "RELEASED"
},
{
"count": 16,
"Status": "DELETED"
},
{
"count": 41,
"Status": "CONNECTED"
},
{
"count": 1,
"Status": "UNINSTOCK"
}
]
},
{
"profile": "Mercury",
"invState": [
{
"count": 4,
"Status": "EXIST"
},
{
"count": 1224,
"Status": "INSTOCK"
},
{
"count": 3,
"Status": "CONNECTED"
},
{
"count": 18,
"Status": "RELEASED"
},
{
"count": 5,
"Status": "DELETED"
}
]
}
]
}

csv 输出应如下所示：

Earth,6,ONLINE
Earth,8,EXIST
Earth,1,GIVEN
Earth,4,ERROR
Earth,49,INSTOCK
Earth,389,RELEASED
Earth,68,DELETED
Earth,280,CONNECTED
Earth,1,UNINSTOCK
Mars,7,EXIST
Mars,20,INSTOCK
etc

请参阅随附的图片

CSV 输出等

将不胜感激这里的任何建议。我尝试使用jq和awk，但没有得到正确的结果。

下面是一个使用 GNU awk 的 JSON 扩展：

$ gawk '
@load "json"
BEGIN {
OFS=","
}
{
lines=lines $0                           # keep appending lines
if(json_fromJSON(lines,data)!=0) {       # until you have a valid object
for(inventory in data["inventory"])  # then we iterate the arrays and output
for(invState in data["inventory"][inventory]["invState"])
print data["inventory"][inventory]["profile"],
data["inventory"][inventory]["invState"][invState]["count"],
data["inventory"][inventory]["invState"][invState]["Status"]
lines=""                             # reset the object array for next round
}
}' file.json

部分输出：

Earth,6,ONLINE
Earth,8,EXIST
...
Mars,7,EXIST
Mars,20,INSTOCK
...
Mercury,4,EXIST
Mercury,1224,INSTOCK
...

如果使用 -r 选项调用 jq，则以下内容将生成如下所示的输出：

.inventory[]
| .profile as $profile
| .invState[]
| [$profile] + [.count, .Status]
| join(",")

但是请注意，如果需要 CSV 输出，则最好将最后一行中的join替换为@csv。

较短的版本

以下内容等效于上述内容：

.inventory[]
| [.profile] + (.invState[] | [.count, .Status])
| join(",")

如果"计数"和"状态"键的顺序是固定的，则可以侥幸逃脱：

.inventory[] 
| [.profile] + (.invState[] | [.[]])
| join(",")

我用 JSON 文件生成了不少 CSV 文件，jq.我觉得jq很适合这个。

脱离阵列

从：

["Earth","Mars","Mercury"]

自：

"Earth"
"Mars"
"Mercury"

使用此过滤器实现：.[]，它遍历数组。正如文档所说：

使用输入 [1,2,3] 运行 .[] 将生成三个单独的结果，而不是单个数组。

从字符串到文本

从：

["Earth","Mars","Mercury"]

自：

Earth
Mars
Mercury

通过 CLI 上的--raw-output参数与.[]过滤器相结合来实现。例如，

jq --raw-output '.[]' input.json

保存到变量

在处理其余库存时，您需要保留对.profile的引用。下面是一个人为的例子：

从：

[ {"x": "Earth", "y": ["1", "2", "3"]}
, {"x": "Mars", "y": ["1", "2", "3"]}
]

自：

"Earth1"
"Earth2"
"Earth3"
"Mars1"
"Mars2"
"Mars3"

使用此过滤器实现：.[] | .x as $x | .y[] | $x + .

(您可以将.x保存到可以在过滤器中引用的 var$x中。

回答您的问题，鉴于以下input.json：

{
"inventory": [
{
"profile": "Earth",
"invState": [
{
"count": 6,
"Status": "ONLINE"
},
{
"count": 8,
"Status": "EXIST"
},
{
"count": 1,
"Status": "GIVEN"
},
{
"count": 4,
"Status": "ERROR"
},
{
"count": 49,
"Status": "INSTOCK"
},
{
"count": 389,
"Status": "RELEASED"
},
{
"count": 68,
"Status": "DELETED"
},
{
"count": 280,
"Status": "CONNECTED"
},
{
"count": 1,
"Status": "UNINSTOCK"
}
]
},
{
"profile": "Mars",
"invState": [
{
"count": 7,
"Status": "EXIST"
},
{
"count": 20,
"Status": "INSTOCK"
},
{
"count": 110,
"Status": "RELEASED"
},
{
"count": 16,
"Status": "DELETED"
},
{
"count": 41,
"Status": "CONNECTED"
},
{
"count": 1,
"Status": "UNINSTOCK"
}
]
},
{
"profile": "Mercury",
"invState": [
{
"count": 4,
"Status": "EXIST"
},
{
"count": 1224,
"Status": "INSTOCK"
},
{
"count": 3,
"Status": "CONNECTED"
},
{
"count": 18,
"Status": "RELEASED"
},
{
"count": 5,
"Status": "DELETED"
}
]
}
]
}

以下对jq的调用应该可以解决问题：

jq --raw-output '.inventory[] | .profile as $p | .invState[] | "($p),(.count),(.Status)"' input.json

Earth,6,ONLINE
Earth,8,EXIST
Earth,1,GIVEN
Earth,4,ERROR
Earth,49,INSTOCK
Earth,389,RELEASED
Earth,68,DELETED
Earth,280,CONNECTED
Earth,1,UNINSTOCK
Mars,7,EXIST
Mars,20,INSTOCK
Mars,110,RELEASED
Mars,16,DELETED
Mars,41,CONNECTED
Mars,1,UNINSTOCK
Mercury,4,EXIST
Mercury,1224,INSTOCK
Mercury,3,CONNECTED
Mercury,18,RELEASED
Mercury,5,DELETED

如果你没有 jq 或 gawks json 扩展(需要 gawkextlib)，并且你的输入总是像你的例子一样简单和规则，那么这将使用 GNU awk 为第三个 arg 匹配 () 和 gensub() 做你想要的：

$ cat tst.awk
BEGIN { OFS="," }
match($0,/"([^"]+)": *("[^"]*"|[0-9]+)/,a) {
tag = a[1]
val = gensub(/^"|"$/,"","g",a[2])
f[tag] = val
if ( tag == "Status" ) {
print f["profile"], f["count"], f["Status"]
}
}

$ awk -f tst.awk file
Earth,6,ONLINE
Earth,8,EXIST
Earth,1,GIVEN
Earth,4,ERROR
Earth,49,INSTOCK
Earth,389,RELEASED
Earth,68,DELETED
Earth,280,CONNECTED
Earth,1,UNINSTOCK
Mars,7,EXIST
Mars,20,INSTOCK
Mars,110,RELEASED
Mars,16,DELETED
Mars,41,CONNECTED
Mars,1,UNINSTOCK
Mercury,4,EXIST
Mercury,1224,INSTOCK
Mercury,3,CONNECTED
Mercury,18,RELEASED
Mercury,5,DELETED

这可能对你有用(GNU sed)：

sed -nE '/profile/{s/.*"(S+)".*/1/;h};
/count/{s/.* (S+),.*/1/;H};
/Status/{s/.*"(S+)".*/1/;H;g;s/n/,/gp;g;s/n.*n.*//;h}' file

将profile、count和Status信息塞入保留空间，然后在这样做后Status，检索保留空间，用逗号替换换行符，打印然后删除count和Status详细信息，为下一次做好准备。

注：注：由于这是 json，因此最好使用jq因为这始终是一个更强大的解决方案。

awk -F: 'BEGIN{ OFS=""; p=c=s=""; }
/"profile"/{ p=$2 }
/"count"/{ c=$2 }
/"Status"/{ s=$2 }
{ if(s!="") { print p,c,s; s="" }}'  file.json

输出：

"Earth", 6, "ONLINE"
"Earth", 8, "EXIST"
"Earth", 1, "GIVEN"
"Earth", 4, "ERROR"
"Earth", 49, "INSTOCK"
"Earth", 389, "RELEASED"
"Earth", 68, "DELETED"
"Earth", 280, "CONNECTED"
"Earth", 1, "UNINSTOCK"
"Mars", 7, "EXIST"
"Mars", 20, "INSTOCK"
"Mars", 110, "RELEASED"
"Mars", 16, "DELETED"
"Mars", 41, "CONNECTED"
"Mars", 1, "UNINSTOCK"
"Mercury", 4, "EXIST"
"Mercury", 1224, "INSTOCK"
"Mercury", 3, "CONNECTED"
"Mercury", 18, "RELEASED"
"Mercury", 5, "DELETED"

正是csv，为什么文本字段被双引号包围。

如果你的 json 不在"pretty_print"中，你可能必须执行以下操作：

cat file.json | json_pp | awk .....

较短的版本

脱离阵列

从字符串到文本

保存到变量

相关内容

最新更新

热门标签：