How to keep only a specified set of fields in a complex JSON object



Summary:

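I need to use jq to filter PII out of a complex JSON object. I'm not experienced with destructuring or multi-pass scripts. I want to keep the non-PII attributes rather than delete the PII attributes: if the backend adds a new PII attribute without telling me, an allowlist still won't leak it, whereas a denylist would.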
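To make the allowlist-versus-denylist argument concrete, here is a minimal Python sketch (Python is used only for illustration; the question itself is about jq). The field name "ssn" is hypothetical and stands in for a PII field the backend might add later without notice.

```python
# "ssn" is a hypothetical field standing in for PII the backend adds later
# without notice; "pii" is the sensitive field we already know about.
ALLOWED = {"id", "return-code"}  # fields known to be safe (allowlist)
DENIED = {"pii"}                 # fields known to be sensitive (denylist)


def filter_denylist(record: dict) -> dict:
    """Drop known-sensitive keys; any key we don't know about passes through."""
    return {k: v for k, v in record.items() if k not in DENIED}


def filter_allowlist(record: dict) -> dict:
    """Keep only known-safe keys; any key we don't know about is dropped."""
    return {k: v for k, v in record.items() if k in ALLOWED}


record = {"id": "123", "pii": "sensitive", "ssn": "new, unannounced PII"}
print(filter_denylist(record))   # {'id': '123', 'ssn': 'new, unannounced PII'}  <- leak
print(filter_allowlist(record))  # {'id': '123'}
```

The denylist has to be updated every time the backend changes; the allowlist fails safe by dropping anything it doesn't recognize.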

In the simple case, I can easily "rebuild" the desired JSON object from the input object, like this:

{
  "data": {
    "id": "123",
    "pii": "sensitive"
  },
  "return-code": 200
}

jq '{data: {id: .data.id}, "return-code": ."return-code"}'

Once arrays are added to the mix, I can't see how to apply this approach.

Simplified example of a complex object

Input:

{
  "customers": [
    {
      "id": "00000000001",
      "dateOfBirth": "sensitive DOB",
      "preferences": [
        {
          "preference-id": "0001",
          "pii-value": "senstive value 1"
        },
        {
          "preference-id": "0002",
          "pii-value": "senstive value 2"
        }
      ]
    },
    {
      "id": "00000000002",
      "dateOfBirth": "sensitive DOB",
      "preferences": [
        {
          "preference-id": "0003",
          "pii-value": "senstive value 3"
        },
        {
          "preference-id": "0004",
          "pii-value": "senstive value 4"
        }
      ]
    }
  ]
}

Desired output:

{
  "customers": [
    {
      "id": "00000000001",
      "preferences": [
        {
          "preference-id": "0001"
        },
        {
          "preference-id": "0002"
        }
      ]
    },
    {
      "id": "00000000002",
      "preferences": [
        {
          "preference-id": "0003"
        },
        {
          "preference-id": "0004"
        }
      ]
    }
  ]
}

Attempted approach for arrays:

jq '{ customers: [ { id: .customers[].id, preferences: [ .customers[].preferences ] } ]}'

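The result mixes data across customers: each customer receives the preferences arrays of every customer, and the PII values are still present: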

{
  "customers": [
    {
      "id": "00000000001",
      "preferences": [
        [
          {
            "preference-id": "0001",
            "pii-value": "senstive value 1"
          },
          {
            "preference-id": "0002",
            "pii-value": "senstive value 2"
          }
        ],
        [
          {
            "preference-id": "0003",
            "pii-value": "senstive value 3"
          },
          {
            "preference-id": "0004",
            "pii-value": "senstive value 4"
          }
        ]
      ]
    },
    {
      "id": "00000000002",
      "preferences": [
        [
          {
            "preference-id": "0001",
            "pii-value": "senstive value 1"
          },
          {
            "preference-id": "0002",
            "pii-value": "senstive value 2"
          }
        ],
        [
          {
            "preference-id": "0003",
            "pii-value": "senstive value 3"
          },
          {
            "preference-id": "0004",
            "pii-value": "senstive value 4"
          }
        ]
      ]
    }
  ]
}
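The cross-customer mixing comes from enumerating the ids and the preferences independently instead of building each output object from one customer. A minimal Python model of both (an illustration only, with shortened sample values) makes the difference visible:

```python
# Sample data mirroring the question's example (PII values shortened).
customers = [
    {"id": "00000000001", "dateOfBirth": "sensitive DOB",
     "preferences": [{"preference-id": "0001", "pii-value": "v1"},
                     {"preference-id": "0002", "pii-value": "v2"}]},
    {"id": "00000000002", "dateOfBirth": "sensitive DOB",
     "preferences": [{"preference-id": "0003", "pii-value": "v3"},
                     {"preference-id": "0004", "pii-value": "v4"}]},
]

# Model of the attempted filter: every id is paired with the preferences of
# ALL customers, and the preference objects are copied whole (PII included).
broken = [
    {"id": c["id"], "preferences": [d["preferences"] for d in customers]}
    for c in customers
]

# Per-customer construction: each output object comes from a single customer
# `c`, and each preference object is rebuilt from its allowed field only.
fixed = [
    {"id": c["id"],
     "preferences": [{"preference-id": p["preference-id"]}
                     for p in c["preferences"]]}
    for c in customers
]
print({"customers": fixed})
```

In jq the same idea is to pipe each element into its own object constructor, for example `{customers: [.customers[] | {id, preferences: [.preferences[] | {"preference-id": ."preference-id"}]}]}`. It works for this shape, but it has to be written by hand for every nesting level.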

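I really don't think this approach will work, and I'm at a loss for alternatives. This is a simplified example; the real JSON is quite large, with many arrays at different nesting levels.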
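One way to handle arrays at arbitrary nesting depth is to describe the allowed fields once as a nested allowlist and apply it recursively, treating arrays transparently. The sketch below is in Python for illustration; the schema format (`True` for "keep this leaf", a dict for "keep only these keys") and the names `keep_only` and `SCHEMA` are assumptions, not anything from jq.

```python
from typing import Any, Dict, Union

# Hypothetical allowlist format (an assumption for this sketch):
#   True -> keep the value at this position unchanged (use only for leaves,
#           otherwise new fields under it would pass through)
#   dict -> keep only the listed keys of an object, filtering each recursively
# Arrays need no special marker: the schema is applied to every element.
Schema = Union[bool, Dict[str, "Schema"]]


def keep_only(value: Any, schema: Schema) -> Any:
    if isinstance(value, list):
        # Apply the same schema to each array element, at any nesting depth.
        return [keep_only(item, schema) for item in value]
    if schema is True:
        return value
    if isinstance(schema, dict) and isinstance(value, dict):
        # Keys not named in the schema (e.g. newly added PII) are never copied.
        return {key: keep_only(value[key], sub)
                for key, sub in schema.items() if key in value}
    # The schema expected an object but found something else: drop it.
    return None


SCHEMA = {"customers": {"id": True, "preferences": {"preference-id": True}}}

doc = {
    "customers": [
        {"id": "00000000001", "dateOfBirth": "sensitive DOB",
         "preferences": [{"preference-id": "0001", "pii-value": "v1"},
                         {"preference-id": "0002", "pii-value": "v2"}]},
        {"id": "00000000002", "dateOfBirth": "sensitive DOB",
         "preferences": [{"preference-id": "0003", "pii-value": "v3"},
                         {"preference-id": "0004", "pii-value": "v4"}]},
    ]
}
print(keep_only(doc, SCHEMA))
```

The schema mirrors the shape of the document, so adding a new nested array only means adding one more level to `SCHEMA`.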

Are there any approaches you'd suggest I look into?

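Use a user-defined function to select specific paths from the JSON (jq 1.7 and later also ship a built-in pick(pathexps) that does the same thing):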

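# pick(paths): build a new value containing only the given paths
def pick(paths):
  . as $in                                   # keep a reference to the full input
  | reduce path(paths) as $path (null;       # start from null; visit each matching path
      setpath($path; $in | getpath($path))   # copy the input's value at that path
    );
# "," binds tighter than "|", so this is .customers[] | (.id, .preferences[]."preference-id")
pick(.customers[] | .id, .preferences[]."preference-id")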
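For readers who want the same technique outside jq, here is a rough Python sketch of what pick does: expand each path expression into concrete paths, then copy the value at each path into an initially empty result, as `reduce path(...) as $path (null; setpath(...))` does. The `"[]"` wildcard marker and all function names are hypothetical; unlike jq, this sketch skips missing keys instead of emitting them as null.

```python
import copy
from typing import Any, Iterator, Sequence, Tuple, Union

Key = Union[str, int]
WILDCARD = "[]"  # hypothetical marker for "every element of an array" (jq's .[])


def expand(doc: Any, pattern: Sequence[str],
           prefix: Tuple[Key, ...] = ()) -> Iterator[Tuple[Key, ...]]:
    """Yield every concrete path in `doc` that matches `pattern`."""
    if not pattern:
        yield prefix
        return
    head, rest = pattern[0], pattern[1:]
    if head == WILDCARD:
        if isinstance(doc, list):
            for i, item in enumerate(doc):
                yield from expand(item, rest, prefix + (i,))
    elif isinstance(doc, dict) and head in doc:
        # Unlike jq, a missing key produces no path (jq would yield null).
        yield from expand(doc[head], rest, prefix + (head,))


def getpath(doc: Any, path: Tuple[Key, ...]) -> Any:
    for key in path:
        doc = doc[key]
    return doc


def setpath(target: Any, path: Tuple[Key, ...], value: Any) -> Any:
    """Set `value` at `path`, creating objects/arrays as needed (like setpath on null)."""
    if not path:
        return value
    head, rest = path[0], path[1:]
    if isinstance(head, int):
        target = target if isinstance(target, list) else []
        while len(target) <= head:
            target.append(None)
        target[head] = setpath(target[head], rest, value)
    else:
        target = target if isinstance(target, dict) else {}
        target[head] = setpath(target.get(head), rest, value)
    return target


def pick(doc: Any, patterns: Sequence[Sequence[str]]) -> Any:
    """Start from None and copy in only the selected paths."""
    result = None
    for pattern in patterns:
        for path in expand(doc, pattern):
            # Deep-copy so later writes into `result` never mutate `doc`.
            result = setpath(result, path, copy.deepcopy(getpath(doc, path)))
    return result


doc = {
    "customers": [
        {"id": "00000000001", "dateOfBirth": "sensitive DOB",
         "preferences": [{"preference-id": "0001", "pii-value": "v1"},
                         {"preference-id": "0002", "pii-value": "v2"}]},
        {"id": "00000000002", "dateOfBirth": "sensitive DOB",
         "preferences": [{"preference-id": "0003", "pii-value": "v3"},
                         {"preference-id": "0004", "pii-value": "v4"}]},
    ]
}
PATTERNS = [
    ("customers", WILDCARD, "id"),
    ("customers", WILDCARD, "preferences", WILDCARD, "preference-id"),
]
print(pick(doc, PATTERNS))
```

Because the result starts empty and only listed paths are copied in, any field the backend adds later is excluded by default, which is the allowlist behavior the question asks for.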

