Extract a subset of a deeply nested JSON document and print only the key/value pairs I'm interested in



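I have a deeply nested JSON file. I only want to extract and parse the subset I'm interested in, which in my case is everything under the 'node' key. How can I:

  1. Extract the subset of the JSON file found at "edges[].node" ('edges' is the parent key of 'node')?

  2. Within 'node', keep only the key/value pairs I'm interested in:

    .url,
    .headline.default, (*this one is a 'grandchild' of the 'node' key*)
    .firstPublished
    

    I want to keep only the three items above inside the 'node' key. How can I print a slimmed-down version of the JSON file that contains just what I need?

  3. Even better: can I still preserve the structure, i.e. the full path from the JSON root key down to the embedded 'node' subset I'm interested in?

Here is jqplay-myjson (the full content of my JSON file).

I'll also try to attach the full content here: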

{
"data": {
"legacyCollection": {
"longDescription": "The latest news, analysis and investigations from Europe.",
"section": {
"name": "world",
"url": "/section/world"
},
"collectionsPage": {
"stream": {
"pageInfo": {
"hasNextPage": true,
"__typename": "PageInfo"
},
"__typename": "AssetsConnection",
"edges": [
{
"node": {
"url": "https://www.nytimes.com/video/world/europe/100000008323381/icc-war-crimes-ukraine.html",
"firstPublished": "2022-04-27T23:28:33.241Z",
"headline": {
"default": "I.C.C. Joins Investigation of War Crimes in Ukraine",
"__typename": "CreativeWorkHeadline"
},
"summary": "Karim Khan, the chief prosecutor of the International Criminal Court, said that his organization would participate in a joint effort — with Ukraine, Poland and Lithuania — to investigate war crimes committed since Russia’s invasion.",
"promotionalMedia": {
"__typename": "Image",
"id": "SW1hZ2U6bnl0Oi8vaW1hZ2UvYTY3MTVhNDUtZDE0NS01OWZjLThkZWItNzYxMWViN2UyODhk"
},
"embedded": false
},
"__typename": "AssetsEdge"
},
{
"node": {
"__typename": "Article",
"url": "https://www.nytimes.com/2022/04/27/sports/soccer/chelsea-sale-roman-abramovich.html",
"firstPublished": "2022-04-27T19:42:17.000Z",
"typeOfMaterials": [
"News"
],
"archiveProperties": {
"lede": "",
"__typename": "ArticleArchiveProperties"
},
"headline": {
"default": "Endgame Nears in Bidding for Chelsea F.C.",
"__typename": "CreativeWorkHeadline"
},
"summary": "The American bank selling the English soccer team on behalf of its Russian owner could name its preferred suitor by the end of the week. But the drama isn’t over.",
"translations": []
},
"__typename": "AssetsEdge"
}
],
"totalCount": 52559
}
},
"sourceId": "100000004047788",
"tagline": "",
"__typename": "LegacyCollection"
}
}
}

Here is my command (jqplay demo):

.data.legacyCollection.collectionsPage.stream.edges[].node |= with_entries(select([.key] | inside(["default","url","firstPublished"])))

This is the output I get:

{
"data": {
"legacyCollection": {
"longDescription": "The latest news, analysis and investigations from Europe.",
"section": {
"name": "world",
"url": "/section/world"
},
"collectionsPage": {
"stream": {
"pageInfo": {
"hasNextPage": true,
"__typename": "PageInfo"
},
"__typename": "AssetsConnection",
"edges": [
{
"node": {
"url": "https://www.nytimes.com/video/world/europe/100000008323381/icc-war-crimes-ukraine.html",
"firstPublished": "2022-04-27T23:28:33.241Z"
},
"__typename": "AssetsEdge"
},
{
"node": {
"url": "https://www.nytimes.com/2022/04/27/sports/soccer/chelsea-sale-roman-abramovich.html",
"firstPublished": "2022-04-27T19:42:17.000Z"
},
"__typename": "AssetsEdge"
}
],
"totalCount": 52559
}
},
"sourceId": "100000004047788",
"tagline": "",
"__typename": "LegacyCollection"
}
}
}

This is the output I expect:

{
"data": {
"legacyCollection": {
"collectionsPage": {
"stream": {
"edges": [
{
"node": {
"url": "https://www.nytimes.com/video/world/europe/100000008323381/icc-war-crimes-ukraine.html",
"firstPublished": "2022-04-27T23:28:33.241Z"
}
},
{
"node": {
"url": "https://www.nytimes.com/2022/04/27/sports/soccer/chelsea-sale-roman-abramovich.html",
"firstPublished": "2022-04-27T19:42:17.000Z"
}
}
]
}
}
}
}
}

Here's a (somewhat) declarative solution:

(.data.legacyCollection.collectionsPage.stream.edges
 | map( {node: (.node
                | {url,
                   firstPublished,
                   headline: {default: .headline.default} })})) as $edges
| {data: {
    legacyCollection: {
      collectionsPage: {
        stream: {
          $edges
        }
      }
    }
  }
}
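For readers who want to check the logic outside jq, the same projection can be sketched in Python. This is only an illustrative rendering of the filter above, not part of the original answer: the helper names `slim_node` and `slim_document` are hypothetical, and the sample `doc` is a trimmed copy of the question's data. Unlike a filter on the top-level keys of `node`, it names the nested `headline.default` explicitly, which is why that value survives.

```python
import json

# Trimmed copy of the question's document, used as sample input.
doc = {
    "data": {
        "legacyCollection": {
            "longDescription": "The latest news, analysis and investigations from Europe.",
            "collectionsPage": {
                "stream": {
                    "__typename": "AssetsConnection",
                    "edges": [
                        {
                            "node": {
                                "url": "https://www.nytimes.com/video/world/europe/100000008323381/icc-war-crimes-ukraine.html",
                                "firstPublished": "2022-04-27T23:28:33.241Z",
                                "headline": {
                                    "default": "I.C.C. Joins Investigation of War Crimes in Ukraine",
                                    "__typename": "CreativeWorkHeadline",
                                },
                                "embedded": False,
                            },
                            "__typename": "AssetsEdge",
                        },
                        {
                            "node": {
                                "__typename": "Article",
                                "url": "https://www.nytimes.com/2022/04/27/sports/soccer/chelsea-sale-roman-abramovich.html",
                                "firstPublished": "2022-04-27T19:42:17.000Z",
                                "headline": {
                                    "default": "Endgame Nears in Bidding for Chelsea F.C.",
                                    "__typename": "CreativeWorkHeadline",
                                },
                                "summary": "The American bank selling the English soccer team could name its preferred suitor.",
                            },
                            "__typename": "AssetsEdge",
                        },
                    ],
                    "totalCount": 52559,
                }
            },
        }
    }
}


def slim_node(node):
    """Keep only url, firstPublished and headline.default from one node.

    Missing keys become None, mirroring how jq's {url} yields null.
    """
    return {
        "url": node.get("url"),
        "firstPublished": node.get("firstPublished"),
        "headline": {"default": (node.get("headline") or {}).get("default")},
    }


def slim_document(document):
    """Rebuild the root-to-edges path by hand around the slimmed nodes."""
    edges = document["data"]["legacyCollection"]["collectionsPage"]["stream"]["edges"]
    slim_edges = [{"node": slim_node(edge["node"])} for edge in edges]
    return {
        "data": {
            "legacyCollection": {
                "collectionsPage": {"stream": {"edges": slim_edges}}
            }
        }
    }


slim = slim_document(doc)
print(json.dumps(slim, indent=2))
```

As in the jq version, the enclosing structure is spelled out literally, so sibling keys such as `totalCount` and `__typename` are dropped simply because they are never copied.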

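Here's one way to make the selection while ensuring the structure is preserved. This solution may be of interest because it can easily be adapted for use with jq's `--stream` option.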

# True if the input array begins with the elements of $head.
def array_startswith($head): .[: $head|length] == $head;

. as $in
| ["data", "legacyCollection", "collectionsPage", "stream", "edges"] as $head
| ($head|length) as $len
# Visit every path of the form $head + [<index>, "node", ...].
| reduce (paths
          | select( array_startswith($head) and .[1+$len] == "node" )) as $p
    (null;
     # Copy only node.url, node.firstPublished and node.headline.default.
     if ((($p|length) == $len + 3) and ($p[-1] | IN("url", "firstPublished")))
        or ((($p|length) == $len + 4) and $p[-2:] == ["headline", "default"])
     then setpath($p; $in | getpath($p))
     else .
     end)
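The path-based idea can also be sketched in Python, for readers who want to step through it. This is an illustrative rendering under stated assumptions, not the answer's own code: `leaf_paths`, `set_path`, `wanted` and `select_subset` are hypothetical helper names, and the sketch walks leaf paths only, whereas jq's `paths` also emits the paths of intermediate objects and arrays (which the filter above rejects anyway).

```python
HEAD = ("data", "legacyCollection", "collectionsPage", "stream", "edges")


def leaf_paths(value, prefix=()):
    """Yield (path, leaf) for every scalar or empty container in a JSON value."""
    if isinstance(value, dict) and value:
        for key, child in value.items():
            yield from leaf_paths(child, prefix + (key,))
    elif isinstance(value, list) and value:
        for index, child in enumerate(value):
            yield from leaf_paths(child, prefix + (index,))
    else:
        yield prefix, value


def set_path(container, path, leaf):
    """Store leaf at path, creating dicts and lists as needed (cf. jq's setpath)."""
    if not path:
        return leaf
    key, rest = path[0], path[1:]
    if isinstance(key, int):
        items = container if isinstance(container, list) else []
        while len(items) <= key:
            items.append(None)  # pad so that the index exists
        items[key] = set_path(items[key], rest, leaf)
        return items
    mapping = container if isinstance(container, dict) else {}
    mapping[key] = set_path(mapping.get(key), rest, leaf)
    return mapping


def wanted(path):
    """Same test as the jq filter: HEAD + [index, "node", ...] with a kept key."""
    n = len(HEAD)
    if path[:n] != HEAD or len(path) <= n + 1 or path[n + 1] != "node":
        return False
    if len(path) == n + 3 and path[-1] in ("url", "firstPublished"):
        return True
    return len(path) == n + 4 and path[-2:] == ("headline", "default")


def select_subset(document):
    """Rebuild a document containing only the wanted leaves, paths intact."""
    result = None
    for path, leaf in leaf_paths(document):
        if wanted(path):
            result = set_path(result, path, leaf)
    return result


# Trimmed sample modeled on the question's data.
doc = {
    "data": {
        "legacyCollection": {
            "section": {"name": "world", "url": "/section/world"},
            "collectionsPage": {
                "stream": {
                    "edges": [
                        {
                            "node": {
                                "url": "https://www.nytimes.com/2022/04/27/sports/soccer/chelsea-sale-roman-abramovich.html",
                                "firstPublished": "2022-04-27T19:42:17.000Z",
                                "typeOfMaterials": ["News"],
                                "headline": {
                                    "default": "Endgame Nears in Bidding for Chelsea F.C.",
                                    "__typename": "CreativeWorkHeadline",
                                },
                                "translations": [],
                            },
                            "__typename": "AssetsEdge",
                        }
                    ],
                    "totalCount": 52559,
                }
            },
        }
    }
}

subset = select_subset(doc)
print(subset)
```

Because the result is assembled one path at a time, the enclosing structure never has to be written out by hand; changing `HEAD` or the kept keys is enough to target a different subset.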
