I'm trying to combine multiple JSON array entries into a single array, collapsing them by key. Here is the data I currently get:
[
{
"projectId": 35525710,
"jobs": [
{
"targetLanguage": "zh_CN",
"jobId": 35826845,
"earlyAccessStatus": "NOT_STARTED"
},
{
"targetLanguage": "pt_BR",
"jobId": 35826846,
"earlyAccessStatus": "IN_PROGRESS"
},
{
"targetLanguage": "zh_CN",
"jobId": 35826836,
"earlyAccessStatus": "IN_PROGRESS"
},
{
"targetLanguage": "pt_BR",
"jobId": 35826837,
"earlyAccessStatus": "IN_PROGRESS"
}
]
}
]
The desired output is (see the edit below for a different route):
[
{
"projectId": 35525710,
"jobs": [
{
"targetLanguage": "zh_CN",
"jobIds": [35826845,35826836],
"earlyAccessStatus": ["NOT_STARTED","IN_PROGRESS"]
},
{
"targetLanguage": "pt_BR",
"jobIds": [35826846,35826837],
"earlyAccessStatus": ["IN_PROGRESS"]
}
]
}
]
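Honestly, I'm not sure what should happen with duplicate .earlyAccessStatus values: whether the result is ["IN_PROGRESS","IN_PROGRESS"] or just ["IN_PROGRESS"]. The latter suits my end goal, which is to get the .targetLanguage value for any language whose only .earlyAccessStatus is ["IN_PROGRESS"].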
Using the following filter with jq's -r option, I can then extract the .targetLanguage value for any language whose only .earlyAccessStatus is ["IN_PROGRESS"]. Any help would be much appreciated!
.[].jobs[] | select(.[] | .earlyAccessStatus == ["IN_PROGRESS"] ) | .[] | .targetLanguage
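Update: below is what the raw JSON looks like before I manipulate anything. I'm not attached to any particular approach. What I'm trying to do is take data about a project with various jobs and isolate the languages for which all related jobs have reached a specific step (Target file export1), as indicated by "IN_PROGRESS". If a language has a job at a different step, such as zh_CN below, that language isn't ready yet and shouldn't pass the filter.
To get the output above, this is what I've done so far (updated once to remove the redundant array in "jobs", as already suggested; apologies in advance for any Frankenstein-style line splitting you see in the jq command):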
. | [{"projectId": .projectId,
"jobs": [( .jobs[] |
{ "targetLanguage": .targetLanguage,
"jobId": .jobId,
"earlyAccessStatus": (.steps[] | select(.workflowStepName == "Target file export1") | .status) } )] }]
Raw JSON:
{
"projectId": 35902499,
"completionStatus": "IN_PROGRESS",
"activity": "ACTIVE",
"sourceLanguage": "en_US",
"jobs": [
{
"jobId": 35902526,
"completionStatus": "IN_PROGRESS",
"targetLanguage": "pt_BR",
"steps": [
{
"workflowStepName": "Project Intake and Quote Generation1",
"status": "FINISHED"
},
{
"workflowStepName": "Translate1",
"status": "FINISHED"
},
{
"workflowStepName": "Correct1",
"status": "FINISHED"
},
{
"workflowStepName": "Segment greenification1",
"status": "FINISHED",
"autoStatus": "SUCCESS"
},
{
"workflowStepName": "Target file export1",
"status": "IN_PROGRESS"
}
]
},
{
"jobId": 35902516,
"completionStatus": "IN_PROGRESS",
"targetLanguage": "zh_CN",
"steps": [
{
"workflowStepName": "Project Intake and Quote Generation1",
"status": "FINISHED"
},
{
"workflowStepName": "Translate1",
"status": "FINISHED"
},
{
"workflowStepName": "Correct1",
"status": "FINISHED"
},
{
"workflowStepName": "Segment greenification1",
"status": "FINISHED",
"autoStatus": "SUCCESS"
},
{
"workflowStepName": "Target file export1",
"status": "IN_PROGRESS"
}
]
},
{
"jobId": 36433561,
"completionStatus": "IN_PROGRESS",
"targetLanguage": "pt_BR",
"steps": [
{
"workflowStepName": "Project Intake and Quote Generation1",
"status": "FINISHED"
},
{
"workflowStepName": "Translate1",
"status": "FINISHED"
},
{
"workflowStepName": "Correct1",
"status": "FINISHED"
},
{
"workflowStepName": "Segment greenification1",
"status": "FINISHED",
"autoStatus": "SUCCESS"
},
{
"workflowStepName": "Target file export1",
"status": "IN_PROGRESS"
}
]
},
{
"jobId": 36433560,
"completionStatus": "IN_PROGRESS",
"targetLanguage": "zh_CN",
"steps": [
{
"workflowStepName": "Project Intake and Quote Generation1",
"status": "FINISHED"
},
{
"workflowStepName": "Translate1",
"status": "FINISHED"
},
{
"workflowStepName": "Correct1",
"status": "FINISHED"
},
{
"workflowStepName": "Segment greenification1",
"status": "FINISHED",
"autoStatus": "SUCCESS"
},
{
"workflowStepName": "Target file export1",
"status": "IN_PROGRESS"
}
]
},
{
"jobId": 36433552,
"completionStatus": "IN_PROGRESS",
"targetLanguage": "pt_BR",
"steps": [
{
"workflowStepName": "Project Intake and Quote Generation1",
"status": "FINISHED"
},
{
"workflowStepName": "Translate1",
"status": "FINISHED"
},
{
"workflowStepName": "Correct1",
"status": "FINISHED"
},
{
"workflowStepName": "Segment greenification1",
"status": "FINISHED",
"autoStatus": "SUCCESS"
},
{
"workflowStepName": "Target file export1",
"status": "IN_PROGRESS"
}
]
},
{
"jobId": 36433551,
"completionStatus": "IN_PROGRESS",
"targetLanguage": "zh_CN",
"steps": [
{
"workflowStepName": "Project Intake and Quote Generation1",
"status": "FINISHED"
},
{
"workflowStepName": "Translate1",
"status": "FINISHED"
},
{
"workflowStepName": "Correct1",
"status": "IN_PROGRESS"
},
{
"workflowStepName": "Segment greenification1",
"status": "NOT_STARTED"
},
{
"workflowStepName": "Target file export1",
"status": "NOT_STARTED"
}
]
}
]
}
New approach
The saga continues. I got close by taking a different route.
.[].jobs |
map({targetLanguage: .targetLanguage,
earlyAccess: {
jobId: .jobId,
earlyAccessStatus: .earlyAccessStatus}})
| group_by(.targetLanguage)
| map({targetLanguage: .[0].targetLanguage,
jobId: map(.jobId) | unique,
earlyAccessStatus: map(.earlyAccessStatus) | unique})
This gives me basically the output shape I want, except that the data I need is missing, which is a bit of a setback:
[
{
"targetLanguage": "pt_BR",
"jobId": [
null
],
"earlyAccessStatus": [
null
]
},
{
"targetLanguage": "zh_CN",
"jobId": [
null
],
"earlyAccessStatus": [
null
]
}
]
Ideally, this would output the unique values found under the jobId and earlyAccessStatus keys, like so:
[
{
"targetLanguage": "pt_BR",
"jobId": [
35826846, 35826837
],
"earlyAccessStatus": [
"IN_PROGRESS"
]
},
{
"targetLanguage": "zh_CN",
"jobId": [
35826845, 35826836
],
"earlyAccessStatus": [
"IN_PROGRESS", "NOT_STARTED"
]
}
]
That would let me easily filter targetLanguage by earlyAccessStatus == ["IN_PROGRESS"].
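A note on the nulls above: the first map nests jobId and earlyAccessStatus under an earlyAccess key, so the later map(.jobId) and map(.earlyAccessStatus) look up keys that no longer exist at the top level and return null. Below is a minimal sketch that skips the nesting step and then applies the filter described above. It assumes a flat "jobs" array like the sample at the top of the question; the shell variable names are just for illustration.

```shell
#!/bin/sh
# Sample input: the simplified per-job records from the question (flat "jobs" array).
input='[{"projectId":35525710,"jobs":[
  {"targetLanguage":"zh_CN","jobId":35826845,"earlyAccessStatus":"NOT_STARTED"},
  {"targetLanguage":"pt_BR","jobId":35826846,"earlyAccessStatus":"IN_PROGRESS"},
  {"targetLanguage":"zh_CN","jobId":35826836,"earlyAccessStatus":"IN_PROGRESS"},
  {"targetLanguage":"pt_BR","jobId":35826837,"earlyAccessStatus":"IN_PROGRESS"}]}]'

# Group the jobs by language, collect the unique values per key, then keep
# only the languages whose sole status is IN_PROGRESS.
ready=$(printf '%s' "$input" | jq -r '
  .[].jobs
  | group_by(.targetLanguage)
  | map({targetLanguage: .[0].targetLanguage,
         jobId: (map(.jobId) | unique),
         earlyAccessStatus: (map(.earlyAccessStatus) | unique)})
  | .[]
  | select(.earlyAccessStatus == ["IN_PROGRESS"])
  | .targetLanguage')

echo "$ready"
```

With this sample only pt_BR should be printed, since zh_CN still has a NOT_STARTED job.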
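You might want to rethink all those arrays within arrays, but the following achieves what you want, apart from the ordering of the groups and keys: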
# meld all the objects in the given stream of objects
def meld(objects):
reduce objects as $o ({};
reduce ($o|keys[]) as $k (.; .[$k] += [$o[$k]]));
.[0].jobs |=
(group_by(.[0].targetLanguage)
| map( [meld(.[] | .[0] ) | .targetLanguage |= .[0] ] ))
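This produces the output shown below. The simplest way to remove the duplicates is to add unique, for example as the last step of the meld function.
Similarly, if you want the groups in a different order, you can specify the order in a post-processing step or write your own group_by.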
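As a rough sketch of the "add unique as the last step of meld" suggestion, the variant below pipes the melded object through map_values(unique). It assumes the flat "jobs" array from the updated question rather than the nested 1-element arrays, so the .[0] lookups are dropped; treat it as one possible reading of the suggestion, not the only one.

```shell
#!/bin/sh
# Sample input: flat "jobs" array, as in the updated question.
input='[{"projectId":35525710,"jobs":[
  {"targetLanguage":"zh_CN","jobId":35826845,"earlyAccessStatus":"NOT_STARTED"},
  {"targetLanguage":"pt_BR","jobId":35826846,"earlyAccessStatus":"IN_PROGRESS"},
  {"targetLanguage":"zh_CN","jobId":35826836,"earlyAccessStatus":"IN_PROGRESS"},
  {"targetLanguage":"pt_BR","jobId":35826837,"earlyAccessStatus":"IN_PROGRESS"}]}]'

result=$(printf '%s' "$input" | jq -c '
  # meld all the objects in the given stream, then de-duplicate each value list
  def meld(objects):
    reduce objects as $o ({};
      reduce ($o | keys[]) as $k (.; .[$k] += [$o[$k]]))
    | map_values(unique);

  .[0].jobs
  | group_by(.targetLanguage)
  | map(meld(.[]) | .targetLanguage |= .[0])')

echo "$result"
```

Note that unique also sorts, so the jobId lists come out in ascending order rather than input order.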
Output
[
{
"projectId": 35525710,
"jobs": [
[
{
"earlyAccessStatus": [
"IN_PROGRESS",
"IN_PROGRESS"
],
"jobId": [
35826846,
35826837
],
"targetLanguage": "pt_BR"
}
],
[
{
"earlyAccessStatus": [
"NOT_STARTED",
"IN_PROGRESS"
],
"jobId": [
35826845,
35826836
],
"targetLanguage": "zh_CN"
}
]
]
}
]
All of those 1-element arrays are a pain, and I'm sure there's a better way to do this, but the following works:
jq '[.[]
| .jobs |=
[flatten
| group_by(.targetLanguage)
| .[]
| [reduce .[] as $job ({jobId:[], earlyAccessStatus:[]};
{targetLanguage: $job.targetLanguage,
jobId: (.jobId + [$job.jobId]),
earlyAccessStatus: (.earlyAccessStatus + [$job.earlyAccessStatus])})]
| walk(if type == "object" and has("earlyAccessStatus")
then .earlyAccessStatus |= unique
else . end) ] ]' input.json
[
{
"projectId": 35525710,
"jobs": [
[
{
"targetLanguage": "pt_BR",
"jobId": [
35826846,
35826837
],
"earlyAccessStatus": [
"IN_PROGRESS"
]
}
],
[
{
"targetLanguage": "zh_CN",
"jobId": [
35826845,
35826836
],
"earlyAccessStatus": [
"IN_PROGRESS",
"NOT_STARTED"
]
}
]
]
}
]
Figured something out! I went more down the group_by and unique route.
. | [{projectId: .projectId,
jobs: [( .jobs[] | {
targetLanguage: .targetLanguage,
jobId: .jobId,
earlyAccessStatus: (.steps[] | select(.workflowStepName == "Target file export1") | .status),
finalStatus: (.steps[] | select(.workflowStepName == "Target file export2") | .status) } )] }] |
.[].jobs | group_by(.targetLanguage) | map({
targetLanguage: .[0].targetLanguage,
jobId: map(.jobId),
earlyAccessStatus: map(.earlyAccessStatus) | unique,
finalStatus: map(.finalStatus) | unique})
This approach works in two steps. It first extracts the relevant data and organizes it like this:
[
{
"projectId": 35525710,
"jobs": [
{
"targetLanguage": "zh_CN",
"jobId": 35826845,
"earlyAccessStatus": "NOT_STARTED"
},
{
"targetLanguage": "pt_BR",
"jobId": 35826846,
"earlyAccessStatus": "IN_PROGRESS"
},
{
"targetLanguage": "zh_CN",
"jobId": 35826836,
"earlyAccessStatus": "IN_PROGRESS"
},
{
"targetLanguage": "pt_BR",
"jobId": 35826837,
"earlyAccessStatus": "IN_PROGRESS"
}
]
}
]
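Once things are simplified that way, it groups them by .targetLanguage and then collects the .jobId values, along with the unique .earlyAccessStatus and .finalStatus values, under each .targetLanguage key-value pair. I'm sure there's a simpler way to write it, but it accomplishes exactly what I set out to do, leaving the data in the following format: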
[
{
"targetLanguage": "pt_BR",
"jobId": [
35902526,
36433561,
36433552
],
"earlyAccessStatus": [
"IN_PROGRESS"
],
"finalStatus": [
"IN_PROGRESS"
]
},
{
"targetLanguage": "zh_CN",
"jobId": [
35902516,
36433560,
36433551
],
"earlyAccessStatus": [
"IN_PROGRESS",
"NOT_STARTED"
],
"finalStatus": [
"IN_PROGRESS"
]
}
]
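If only the list of ready languages is needed, the intermediate object may not be necessary. The sketch below groups the raw jobs by language and keeps a language only when all of its jobs have "Target file export1" at IN_PROGRESS. The input is a trimmed-down, illustrative version of the raw JSON (most steps omitted), and the all/any combination is an alternative to the group_by and unique route above, not a replacement for it.

```shell
#!/bin/sh
# Trimmed sample of the raw JSON: only the steps needed for the check are kept.
input='{"projectId":35902499,"jobs":[
  {"jobId":35902526,"targetLanguage":"pt_BR","steps":[
    {"workflowStepName":"Target file export1","status":"IN_PROGRESS"}]},
  {"jobId":35902516,"targetLanguage":"zh_CN","steps":[
    {"workflowStepName":"Target file export1","status":"IN_PROGRESS"}]},
  {"jobId":36433551,"targetLanguage":"zh_CN","steps":[
    {"workflowStepName":"Correct1","status":"IN_PROGRESS"},
    {"workflowStepName":"Target file export1","status":"NOT_STARTED"}]}]}'

# For each language group, require that every job has the export step IN_PROGRESS.
ready=$(printf '%s' "$input" | jq -r '
  .jobs
  | group_by(.targetLanguage)
  | .[]
  | select(all(.[];
      any(.steps[];
          .workflowStepName == "Target file export1" and .status == "IN_PROGRESS")))
  | .[0].targetLanguage')

echo "$ready"
```

Here zh_CN is filtered out because one of its jobs still has the export step at NOT_STARTED.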