如何使用 jq 将混合的 JSON 转换为 CSV



无法弄清楚这个...我想折叠这个 JSON,以便只有 NAME 字段和每条记录的数据类的值输出到 CSV 文件。 这让我的 jq-fu 负担沉重,因为它必须将字符串与嵌套数组混合并以某种方式将其放入 CSV 中。

[
{
"Name": "Evite",
"Title": "Evite",
"Domain": "evite.com",
"BreachDate": "2013-08-11",
"AddedDate": "2019-07-14T14:51:51Z",
"ModifiedDate": "2019-07-14T14:51:51Z",
"PwnCount": 100985047,
"Description": "In April 2019, the social planning website for managing online invitations <a href="https://www.evite.com/security/update?usource=lc&lctid=1800182" target="_blank" rel="noopener">Evite identified a data breach of their systems</a>. Upon investigation, they found unauthorised access to a database archive dating back to 2013. The exposed data included a total of 101 million unique email addresses, most belonging to recipients of invitations. Members of the service also had names, phone numbers, physical addresses, dates of birth, genders and passwords stored in plain text exposed. The data was provided to HIBP by a source who requested it be attributed to &quot;JimScott.Sec@protonmail.com&quot;.",
"LogoPath": "https://haveibeenpwned.com/Content/Images/PwnedLogos/Evite.png",
"DataClasses": [
"Dates of birth",
"Email addresses",
"Genders",
"Names",
"Passwords",
"Phone numbers",
"Physical addresses"
],
"IsVerified": true,
"IsFabricated": false,
"IsSensitive": false,
"IsRetired": false,
"IsSpamList": false
},
{
"Name": "VerificationsIO",
"Title": "Verifications.io",
"Domain": "verifications.io",
"BreachDate": "2019-02-25",
"AddedDate": "2019-03-09T19:29:54Z",
"ModifiedDate": "2019-03-09T20:49:51Z",
"PwnCount": 763117241,
"Description": "In February 2019, the email address validation service <a href="https://securitydiscovery.com/800-million-emails-leaked-online-by-email-verification-service" target="_blank" rel="noopener">verifications.io suffered a data breach</a>. Discovered by <a href="https://twitter.com/mayhemdayone" target="_blank" rel="noopener">Bob Diachenko</a> and <a href="https://twitter.com/vinnytroia" target="_blank" rel="noopener">Vinny Troia</a>, the breach was due to the data being stored in a MongoDB instance left publicly facing without a password and resulted in 763 million unique email addresses being exposed. Many records within the data also included additional personal attributes such as names, phone numbers, IP addresses, dates of birth and genders. No passwords were included in the data. The Verifications.io website went offline during the disclosure process, although <a href="https://web.archive.org/web/20190227230352/https://verifications.io/" target="_blank" rel="noopener">an archived copy remains viewable</a>.",
"LogoPath": "https://haveibeenpwned.com/Content/Images/PwnedLogos/VerificationsIO.png",
"DataClasses": [
"Dates of birth",
"Email addresses",
"Employers",
"Genders",
"Geographic locations",
"IP addresses",
"Job titles",
"Names",
"Phone numbers",
"Physical addresses"
],
"IsVerified": true,
"IsFabricated": false,
"IsSensitive": false,
"IsRetired": false,
"IsSpamList": false
}
]

我只能用这个jq管道来展平数据类:

jq -r  '.[] | {DataClasses} | .[] | @csv'

但是它缺少名称字符串,我一直无法弄清楚如何包含它。

上下文是从 haveibeenpwned.com(从中返回 JSON(生成报告,该报告将包含来自 DataClass 的违规名称和有关违规性质的详细信息,以便可以很好地汇总它并通过电子邮件将通知发送给受影响的各方。编写脚本很容易 - 我们有API密钥等,但无法弄清楚如何让JSON合作;)

如果这种将所有内容全部保存在CSV中的基本方法(可以在perl或python中轻松修改(,那么有关如何通过jq重整最好地优雅地处理它的任何建议都将使我们能够轻松引用违规行为的名称及其关联的DATACLASSES数组中的项目将是最有帮助

的任何建议将不胜感激。

谢谢

使用示例 JSON 进行调用

jq -r '.[] | [.Name, .DataClasses[]] | @csv'

生产:

"Evite","Dates of birth","Email addresses","Genders","Names","Passwords","Phone numbers","Physical addresses"
"VerificationsIO","Dates of birth","Email addresses","Employers","Genders","Geographic locations","IP addresses","Job titles","Names","Phone numbers","Physical addresses"

最新更新