jq: transform a JSON structure by looking up output values in a nested input array



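First of all, I apologize for the title. English is not my first language, but I would not know what to call the thing I am trying to achieve even in my native language.

What I want to do is take an input (generated automatically by downloading a page with curl and then converting it from HTML to JSON with pup in a very crude way) and turn it into something that is easier to work with later.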

[
  {
    "children": [
      {
        "class": "label label-info",
        "tag": "span",
        "text": "Lesson"
      },
      {
        "tag": "h2",
        "text": "Is That So?"
      },
      {
        "tag": "p",
        "text": "Learn how to provide shortened answers with そうです and stay in the conversation with そうですか."
      },
      {
        "class": "btn btn-primary",
        "href": "https://www.nihongomaster.com/japanese/lessons/view/62/is-that-so",
        "tag": "a",
        "text": "Read Lesson"
      }
    ],
    "class": "row col-sm-12",
    "tag": "div"
  },
  {
    "children": [
      {
        "class": "label label-warning",
        "tag": "span",
        "text": "Drills"
      },
      {
        "tag": "h2",
        "text": "Yes, That Is So."
      },
      {
        "tag": "p",
        "text": "Practice the phrases and vocab from the lesson, Is That So?"
      }
    ],
    "class": "row col-sm-12",
    "tag": "div"
  }
]

The output I want pulls various values out of each object's children array, like this:

[
  {
    "title": "Is That So?", // <-- in other words, find "tag" == "h2" and output "text" value
    "perex": "Learn how to provide shortened answers with そうです and stay in the conversation with そうですか.", // "tag" == "p", "text" value
    "type": "lesson", // "tag" == "span", "text" value (lowercased if possible? Not needed though)
    "link": "https://www.nihongomaster.com/japanese/lessons/view/62/is-that-so" // "tag" == "a", "href" value
  },
  {
    "title": "Yes, That Is So.",
    "perex": "Practice the phrases and vocab from the lesson, Is That So?",
    "type": "drills",
    "link": null // Can be missing!
  }
]

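I tried various experiments with the select function, but none of them produced anything usable, so I am not sure my attempts are worth sharing.

Here is a simple solution to the original question: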

[
  .[]
  | .children
  | { title: [.[] | select(.tag == "h2") | .text][0],
      perex: [.[] | select(.tag == "p") | .text][0],
      type:  [.[] | select(.tag == "span") | .text | ascii_downcase][0],
      link:  [.[] | select(.tag == "a") | .href][0] }
]

The key point here is the idiom [...][0], which handles every possibility for the number of items produced by ... (including 0).
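To see what the idiom does in each case, here is a small sketch run through the jq command line. It assumes jq is installed and on the PATH, and the sample array is made up for illustration. Collecting the matches into an array and taking index 0 gives the first match when there is at least one, and null when there are none:

```shell
# Hypothetical sample: one "h2", two "p" elements, and no "a" element.
children='[{"tag":"h2","text":"T"},{"tag":"p","text":"one"},{"tag":"p","text":"two"}]'

# Exactly one match: that single value is returned.
one=$(printf '%s' "$children" | jq -c '[.[] | select(.tag == "h2") | .text][0]')

# Several matches: only the first one is kept.
many=$(printf '%s' "$children" | jq -c '[.[] | select(.tag == "p") | .text][0]')

# No match: indexing the empty array yields null rather than no output.
none=$(printf '%s' "$children" | jq -c '[.[] | select(.tag == "a") | .href][0]')

echo "$one $many $none"   # prints: "T" "one" null
```

Because every field always evaluates to exactly one value, each input item produces exactly one output object.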

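While writing the question above, I stumbled on the correct answer. Rather than keep that knowledge to myself, I figured I should share the answer here as well. Feel free to delete the whole question and answer if this goes against the site rules (my apologies if it does).

select was indeed the key, but I was not using it the right way when I wrote the question. Here is the complete jq command for my needs, which covers all of the requirements above: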

  • how to pick nested values based on a search of the children array
  • how to lowercase the type value
  • how to handle the sometimes-missing link value
  • (I did not realize it at the time, but I sometimes want to change the form of link, so I added that as well)

def format(link):
  if link | tostring | startswith("/")
  then "https://www.nihongomaster.com" + link
  else link
  end;

[.[] | {
  title: .children[] | select(.tag == "h2").text,
  type: .children[] | select(.tag == "span").text | ascii_downcase,
  perex: .children[] | select(.tag == "p").text,
  link: format(((.children[] | select(.tag == "a").href) // null))
}]
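The // null part matters because of how jq builds objects from streams: if the expression for any field produces no output at all, no object is emitted for that item. A minimal sketch of the difference, assuming jq is installed and using a one-element input invented for the example:

```shell
# Hypothetical input: one item whose children contain an h2 but no a element.
input='[{"children":[{"tag":"h2","text":"T"}]}]'

# Without the alternative operator, the empty link stream drops the whole object.
no_alt=$(printf '%s' "$input" | jq -c '[.[] | { title: .children[] | select(.tag == "h2").text, link: .children[] | select(.tag == "a").href }]')

# With // null, the missing link becomes null and the object survives.
with_alt=$(printf '%s' "$input" | jq -c '[.[] | { title: .children[] | select(.tag == "h2").text, link: ((.children[] | select(.tag == "a").href) // null) }]')

echo "$no_alt"     # prints: []
echo "$with_alt"   # prints: [{"title":"T","link":null}]
```

The same stream behavior means that a field matching two elements would emit two objects, so this form relies on each tag appearing at most once per item.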

Nothing beats rubber duck debugging.
