Collapse a JSON array into a single array with combined key values



I'm trying to combine multiple JSON arrays into a single array with collapsed key values. Here is what I have:

[
{
"projectId": 35525710,
"jobs": [
{
"targetLanguage": "zh_CN",
"jobId": 35826845,
"earlyAccessStatus": "NOT_STARTED"
},
{
"targetLanguage": "pt_BR",
"jobId": 35826846,
"earlyAccessStatus": "IN_PROGRESS"
},
{
"targetLanguage": "zh_CN",
"jobId": 35826836,
"earlyAccessStatus": "IN_PROGRESS"
},
{
"targetLanguage": "pt_BR",
"jobId": 35826837,
"earlyAccessStatus": "IN_PROGRESS"
}
]
}
]

The desired output is (edited below to take a different route):

[
{
"projectId": 35525710,
"jobs": [
{
"targetLanguage": "zh_CN",
"jobIds": [35826845,35826836],
"earlyAccessStatus": ["NOT_STARTED","IN_PROGRESS"]
},
{
"targetLanguage": "pt_BR",
"jobIds": [35826846,35826837],
"earlyAccessStatus": ["IN_PROGRESS"]
}
] 
}
]
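For reference, the collapse being asked for is a plain group-and-collect. Below is a minimal Python sketch of that logic (not jq), assuming the input is the flattened structure shown first; `collapse_jobs` is a hypothetical name. It keeps every jobId in input order and keeps each distinct status once, in first-seen order, which reproduces the desired output above.

```python
import json


def collapse_jobs(projects):
    """Group each project's jobs by targetLanguage (hypothetical helper).

    jobIds keeps every id in input order; earlyAccessStatus keeps each
    distinct status once, in first-seen order.
    """
    result = []
    for project in projects:
        groups = {}  # dicts preserve insertion order (Python 3.7+)
        for job in project["jobs"]:
            group = groups.setdefault(
                job["targetLanguage"],
                {"targetLanguage": job["targetLanguage"], "jobIds": [], "earlyAccessStatus": []},
            )
            group["jobIds"].append(job["jobId"])
            # Skip a status that is already recorded for this language
            if job["earlyAccessStatus"] not in group["earlyAccessStatus"]:
                group["earlyAccessStatus"].append(job["earlyAccessStatus"])
        result.append({"projectId": project["projectId"], "jobs": list(groups.values())})
    return result


projects = [{"projectId": 35525710, "jobs": [
    {"targetLanguage": "zh_CN", "jobId": 35826845, "earlyAccessStatus": "NOT_STARTED"},
    {"targetLanguage": "pt_BR", "jobId": 35826846, "earlyAccessStatus": "IN_PROGRESS"},
    {"targetLanguage": "zh_CN", "jobId": 35826836, "earlyAccessStatus": "IN_PROGRESS"},
    {"targetLanguage": "pt_BR", "jobId": 35826837, "earlyAccessStatus": "IN_PROGRESS"},
]}]

print(json.dumps(collapse_jobs(projects), indent=2))
```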

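To be honest, I'm not sure what happens to duplicate .earlyAccessStatus values: would it be ["IN_PROGRESS","IN_PROGRESS"] or ["IN_PROGRESS"]? The latter suits my end goal, which is to get the .targetLanguage value for any language whose .earlyAccessStatus is only ["IN_PROGRESS"].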

With the following filter and jq's -r option, I can extract the .targetLanguage value for any language whose .earlyAccessStatus is only ["IN_PROGRESS"]. Any help would be greatly appreciated!

.[].jobs[] | select(.earlyAccessStatus == ["IN_PROGRESS"]) | .targetLanguage
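On the duplicates question: in jq, `map(.earlyAccessStatus)` keeps every value, so a language with two IN_PROGRESS jobs gets ["IN_PROGRESS","IN_PROGRESS"]; piping through `unique` collapses that to ["IN_PROGRESS"] (and sorts it). Comparing sets sidesteps the question entirely. A minimal Python sketch of that check, assuming already-grouped data; `ready_languages` is a hypothetical name:

```python
def ready_languages(grouped_jobs, status="IN_PROGRESS"):
    """Return targetLanguage values whose jobs all report `status` (hypothetical helper).

    Comparing sets means ["IN_PROGRESS", "IN_PROGRESS"] and ["IN_PROGRESS"]
    are treated the same, so deduplication does not matter here.
    """
    return [
        group["targetLanguage"]
        for group in grouped_jobs
        if set(group["earlyAccessStatus"]) == {status}
    ]


grouped = [
    {"targetLanguage": "zh_CN", "jobIds": [35826845, 35826836], "earlyAccessStatus": ["NOT_STARTED", "IN_PROGRESS"]},
    {"targetLanguage": "pt_BR", "jobIds": [35826846, 35826837], "earlyAccessStatus": ["IN_PROGRESS", "IN_PROGRESS"]},
]
print(ready_languages(grouped))  # ['pt_BR']
```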

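Update: below is what the original JSON looks like before I manipulate anything. I'm not attached to any particular approach. What I'm trying to do is take data about a project with various jobs and isolate the languages whose jobs have all reached a specific step (Target file export1), as indicated by "IN_PROGRESS". If a language has a job at a different step, like zh_CN below, that language isn't ready yet and should not pass the filter.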

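To get the output above, this is as far as I've gotten (updated once to remove the redundant arrays in "jobs", as already suggested; apologies in advance for the Frankenstein line breaks you'll see in the jq command):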

. | [{"projectId": .projectId,
      "jobs": [( .jobs[]
                 | { "targetLanguage": .targetLanguage,
                     "jobId": .jobId,
                     "earlyAccessStatus": (.steps[] | select(.workflowStepName == "Target file export1") | .status) } )] }]
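The extraction step above can be mirrored in Python to make its behavior explicit. This is a sketch, not jq; `flatten_jobs` is a hypothetical name and the sample is two jobs copied from the original JSON with most steps trimmed. Like the jq object constructor, it emits one record per matching step, so a job without a "Target file export1" step produces no record at all.

```python
def flatten_jobs(project, step_name="Target file export1"):
    """Reduce one raw project to [{projectId, jobs: [...]}] (hypothetical helper).

    Emits one job record per step named `step_name`; a job with no such
    step is skipped, matching how jq drops the object when select yields nothing.
    """
    jobs = []
    for job in project["jobs"]:
        for step in job["steps"]:
            if step["workflowStepName"] == step_name:
                jobs.append({
                    "targetLanguage": job["targetLanguage"],
                    "jobId": job["jobId"],
                    "earlyAccessStatus": step["status"],
                })
    return [{"projectId": project["projectId"], "jobs": jobs}]


# Two jobs copied from the original JSON, with the steps trimmed for brevity
raw = {
    "projectId": 35902499,
    "jobs": [
        {"jobId": 35902526, "targetLanguage": "pt_BR", "steps": [
            {"workflowStepName": "Correct1", "status": "FINISHED"},
            {"workflowStepName": "Target file export1", "status": "IN_PROGRESS"},
        ]},
        {"jobId": 36433551, "targetLanguage": "zh_CN", "steps": [
            {"workflowStepName": "Correct1", "status": "IN_PROGRESS"},
            {"workflowStepName": "Target file export1", "status": "NOT_STARTED"},
        ]},
    ],
}
print(flatten_jobs(raw))
```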

Original JSON:

{
"projectId": 35902499,
"completionStatus": "IN_PROGRESS",
"activity": "ACTIVE",
"sourceLanguage": "en_US",
"jobs": [
{
"jobId": 35902526,
"completionStatus": "IN_PROGRESS",
"targetLanguage": "pt_BR",
"steps": [
{
"workflowStepName": "Project Intake and Quote Generation1",
"status": "FINISHED"
},
{
"workflowStepName": "Translate1",
"status": "FINISHED"
},
{
"workflowStepName": "Correct1",
"status": "FINISHED"
},
{
"workflowStepName": "Segment greenification1",
"status": "FINISHED",
"autoStatus": "SUCCESS"
},
{
"workflowStepName": "Target file export1",
"status": "IN_PROGRESS"
}
]
},
{
"jobId": 35902516,
"completionStatus": "IN_PROGRESS",
"targetLanguage": "zh_CN",
"steps": [
{
"workflowStepName": "Project Intake and Quote Generation1",
"status": "FINISHED"
},
{
"workflowStepName": "Translate1",
"status": "FINISHED"
},
{
"workflowStepName": "Correct1",
"status": "FINISHED"
},
{
"workflowStepName": "Segment greenification1",
"status": "FINISHED",
"autoStatus": "SUCCESS"
},
{
"workflowStepName": "Target file export1",
"status": "IN_PROGRESS"
}
]
},
{
"jobId": 36433561,
"completionStatus": "IN_PROGRESS",
"targetLanguage": "pt_BR",
"steps": [
{
"workflowStepName": "Project Intake and Quote Generation1",
"status": "FINISHED"
},
{
"workflowStepName": "Translate1",
"status": "FINISHED"
},
{
"workflowStepName": "Correct1",
"status": "FINISHED"
},
{
"workflowStepName": "Segment greenification1",
"status": "FINISHED",
"autoStatus": "SUCCESS"
},
{
"workflowStepName": "Target file export1",
"status": "IN_PROGRESS"
}
]
},
{
"jobId": 36433560,
"completionStatus": "IN_PROGRESS",
"targetLanguage": "zh_CN",
"steps": [
{
"workflowStepName": "Project Intake and Quote Generation1",
"status": "FINISHED"
},
{
"workflowStepName": "Translate1",
"status": "FINISHED"
},
{
"workflowStepName": "Correct1",
"status": "FINISHED"
},
{
"workflowStepName": "Segment greenification1",
"status": "FINISHED",
"autoStatus": "SUCCESS"
},
{
"workflowStepName": "Target file export1",
"status": "IN_PROGRESS"
}
]
},
{
"jobId": 36433552,
"completionStatus": "IN_PROGRESS",
"targetLanguage": "pt_BR",
"steps": [
{
"workflowStepName": "Project Intake and Quote Generation1",
"status": "FINISHED"
},
{
"workflowStepName": "Translate1",
"status": "FINISHED"
},
{
"workflowStepName": "Correct1",
"status": "FINISHED"
},
{
"workflowStepName": "Segment greenification1",
"status": "FINISHED",
"autoStatus": "SUCCESS"
},
{
"workflowStepName": "Target file export1",
"status": "IN_PROGRESS"
}
]
},
{
"jobId": 36433551,
"completionStatus": "IN_PROGRESS",
"targetLanguage": "zh_CN",
"steps": [
{
"workflowStepName": "Project Intake and Quote Generation1",
"status": "FINISHED"
},
{
"workflowStepName": "Translate1",
"status": "FINISHED"
},
{
"workflowStepName": "Correct1",
"status": "IN_PROGRESS"
},
{
"workflowStepName": "Segment greenification1",
"status": "NOT_STARTED"
},
{
"workflowStepName": "Target file export1",
"status": "NOT_STARTED"
}
]
}
]
}

New approach

The saga continues. I got close by going down a different route.

.[].jobs
| map({targetLanguage: .targetLanguage,
       earlyAccess: {
         jobId: .jobId,
         earlyAccessStatus: .earlyAccessStatus}})
| group_by(.targetLanguage)
| map({targetLanguage: .[0].targetLanguage,
       jobId: map(.jobId) | unique,
       earlyAccessStatus: map(.earlyAccessStatus) | unique})

This gives me basically the output I want, except that the data I need is missing, which I guess is a setback:

[
{
"targetLanguage": "pt_BR",
"jobId": [
null
],
"earlyAccessStatus": [
null
]
},
{
"targetLanguage": "zh_CN",
"jobId": [
null
],
"earlyAccessStatus": [
null
]
}
]
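The nulls come from a key mismatch: the first `map` nests jobId and earlyAccessStatus under `earlyAccess`, but the second `map` reads `.jobId` and `.earlyAccessStatus` from the top level, where they no longer exist. In jq, either drop the `earlyAccess` wrapper from the first `map`, or read `map(.earlyAccess.jobId)` and `map(.earlyAccess.earlyAccessStatus)` in the second. The Python sketch below (hypothetical name `group_by_language`) reproduces the same pipeline reading through the nested key. Note that jq's `group_by` orders groups by key and `unique` sorts its output, so the job IDs come out in ascending order rather than in input order.

```python
def group_by_language(jobs):
    """Python analogue of the jq pipeline above, reading the nested keys correctly."""
    # Same reshaping as the first jq map: jobId and status move under "earlyAccess"
    reshaped = [
        {"targetLanguage": j["targetLanguage"],
         "earlyAccess": {"jobId": j["jobId"], "earlyAccessStatus": j["earlyAccessStatus"]}}
        for j in jobs
    ]
    groups = {}
    for r in reshaped:
        groups.setdefault(r["targetLanguage"], []).append(r)
    result = []
    for lang in sorted(groups):  # group_by orders the groups by key
        members = groups[lang]
        result.append({
            "targetLanguage": lang,
            # Reading m.get("jobId") here would give None for every member,
            # the Python counterpart of the [null] in the jq output.
            "jobId": sorted({m["earlyAccess"]["jobId"] for m in members}),
            "earlyAccessStatus": sorted({m["earlyAccess"]["earlyAccessStatus"] for m in members}),
        })
    return result


jobs = [
    {"targetLanguage": "zh_CN", "jobId": 35826845, "earlyAccessStatus": "NOT_STARTED"},
    {"targetLanguage": "pt_BR", "jobId": 35826846, "earlyAccessStatus": "IN_PROGRESS"},
    {"targetLanguage": "zh_CN", "jobId": 35826836, "earlyAccessStatus": "IN_PROGRESS"},
    {"targetLanguage": "pt_BR", "jobId": 35826837, "earlyAccessStatus": "IN_PROGRESS"},
]
print(group_by_language(jobs))
```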

Ideally, this would output the unique values contained in the jobId and earlyAccessStatus keys, like this:

[
{
"targetLanguage": "pt_BR",
"jobId": [
35826846, 35826837
],
"earlyAccessStatus": [
"IN_PROGRESS"
]
},
{
"targetLanguage": "zh_CN",
"jobId": [
35826845, 35826836
],
"earlyAccessStatus": [
"IN_PROGRESS", "NOT_STARTED"
]
}
]

That would let me easily filter targetLanguage on earlyAccessStatus == ["IN_PROGRESS"].

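You might want to rethink all those arrays within arrays, but the following achieves what you want, except for the ordering of the groups and keys: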

# meld all the objects in the given stream of objects
def meld(objects):
  reduce objects as $o ({};
    reduce ($o|keys[]) as $k (.; .[$k] += [$o[$k]]));

.[0].jobs |=
  (group_by(.[0].targetLanguage)
   | map( [meld(.[] | .[0]) | .targetLanguage |= .[0] ] ))
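To show what `meld` does on its own, here is a Python analogue (a sketch, not the answer's code). `meld` and `meld_unique` are hypothetical names; `meld_unique` follows the answer's suggestion of a `unique`-style final step. Because `.[$k] += [$o[$k]]` appends, plain `meld` keeps duplicates.

```python
def meld(objects):
    """Merge an iterable of dicts so each key maps to a list of all its values.

    Python analogue of the jq `meld` above: `.[$k] += [$o[$k]]` appends,
    so duplicate values are kept.
    """
    merged = {}
    for obj in objects:
        for key in sorted(obj):  # jq's `keys` yields keys in sorted order
            merged.setdefault(key, []).append(obj[key])
    return merged


def meld_unique(objects):
    """Variant with a `unique`-style last step: deduplicate and sort each list."""
    return {key: sorted(set(values)) for key, values in meld(objects).items()}


pt_br_jobs = [
    {"targetLanguage": "pt_BR", "jobId": 35826846, "earlyAccessStatus": "IN_PROGRESS"},
    {"targetLanguage": "pt_BR", "jobId": 35826837, "earlyAccessStatus": "IN_PROGRESS"},
]
group = meld(pt_br_jobs)
group["targetLanguage"] = group["targetLanguage"][0]  # same idea as `.targetLanguage |= .[0]`
print(group)
```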

This produces the output shown below. The easiest way to remove duplicates is to add unique, for example as the last step of the meld function.

Similarly, if you want the groups in a different order, you can specify the order as a post-processing step or write your own group_by.

Output

[
{
"projectId": 35525710,
"jobs": [
[
{
"earlyAccessStatus": [
"IN_PROGRESS",
"IN_PROGRESS"
],
"jobId": [
35826846,
35826837
],
"targetLanguage": "pt_BR"
}
],
[
{
"earlyAccessStatus": [
"NOT_STARTED",
"IN_PROGRESS"
],
"jobId": [
35826845,
35826836
],
"targetLanguage": "zh_CN"
}
]
]
}
]

All these one-element arrays are a pain, and I'm sure there's a better way to do this, but the following works:

jq '[.[]
     | .jobs |=
         [flatten
          | group_by(.targetLanguage)
          | .[]
          | [reduce .[] as $job ({jobId:[], earlyAccessStatus:[]};
               {targetLanguage: $job.targetLanguage,
                jobId: (.jobId + [$job.jobId]),
                earlyAccessStatus: (.earlyAccessStatus + [$job.earlyAccessStatus])})]
          | walk(if type == "object" and has("earlyAccessStatus")
                 then .earlyAccessStatus |= unique
                 else . end) ] ]' input.json
[
{
"projectId": 35525710,
"jobs": [
[
{
"targetLanguage": "pt_BR",
"jobId": [
35826846,
35826837
],
"earlyAccessStatus": [
"IN_PROGRESS"
]
}
],
[
{
"targetLanguage": "zh_CN",
"jobId": [
35826845,
35826836
],
"earlyAccessStatus": [
"IN_PROGRESS",
"NOT_STARTED"
]
}
]
]
}
]
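The flatten-then-fold idea in this answer can be sketched in Python as follows (an analogue, not jq). `flatten` mimics jq's `flatten` with no depth argument; `collapse` is a hypothetical name. Unlike the answer, the sketch returns each group as a plain object rather than wrapping it in another one-element array.

```python
def flatten(value):
    """Recursively flatten nested lists, like jq's `flatten` with no depth argument."""
    out = []
    for item in value:
        if isinstance(item, list):
            out.extend(flatten(item))
        else:
            out.append(item)
    return out


def collapse(jobs):
    """Flatten one-element wrappers, group by targetLanguage, and fold each group.

    jobId accumulates every id in input order; earlyAccessStatus is
    deduplicated and sorted at the end, like jq's `unique`.
    """
    groups = {}
    for job in flatten(jobs):
        acc = groups.setdefault(
            job["targetLanguage"],
            {"targetLanguage": job["targetLanguage"], "jobId": [], "earlyAccessStatus": []},
        )
        acc["jobId"].append(job["jobId"])
        acc["earlyAccessStatus"].append(job["earlyAccessStatus"])
    result = []
    for lang in sorted(groups):  # group_by orders the groups by key
        acc = groups[lang]
        acc["earlyAccessStatus"] = sorted(set(acc["earlyAccessStatus"]))
        result.append(acc)
    return result


# Jobs wrapped in one-element arrays, as in the question's original input
nested = [
    [{"targetLanguage": "zh_CN", "jobId": 35826845, "earlyAccessStatus": "NOT_STARTED"}],
    [{"targetLanguage": "pt_BR", "jobId": 35826846, "earlyAccessStatus": "IN_PROGRESS"}],
    [{"targetLanguage": "zh_CN", "jobId": 35826836, "earlyAccessStatus": "IN_PROGRESS"}],
    [{"targetLanguage": "pt_BR", "jobId": 35826837, "earlyAccessStatus": "IN_PROGRESS"}],
]
print(collapse(nested))
```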

Figured something out! I went more down the group_by and unique route.

. | [{projectId: .projectId,
      jobs: [( .jobs[] | {
        targetLanguage: .targetLanguage,
        jobId: .jobId,
        earlyAccessStatus: (.steps[] | select(.workflowStepName == "Target file export1") | .status),
        finalStatus: (.steps[] | select(.workflowStepName == "Target file export2") | .status) } )] }]
| .[].jobs
| group_by(.targetLanguage)
| map({
    targetLanguage: .[0].targetLanguage,
    jobId: map(.jobId),
    earlyAccessStatus: map(.earlyAccessStatus) | unique,
    finalStatus: map(.finalStatus) | unique})

This approach works in two steps. First, it extracts the relevant data and organizes it like this:

[
{
"projectId": 35525710,
"jobs": [
{
"targetLanguage": "zh_CN",
"jobId": 35826845,
"earlyAccessStatus": "NOT_STARTED"
},
{
"targetLanguage": "pt_BR",
"jobId": 35826846,
"earlyAccessStatus": "IN_PROGRESS"
},
{
"targetLanguage": "zh_CN",
"jobId": 35826836,
"earlyAccessStatus": "IN_PROGRESS"
},
{
"targetLanguage": "pt_BR",
"jobId": 35826837,
"earlyAccessStatus": "IN_PROGRESS"
}
]
}
]

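Once the content is simplified this way, it groups the jobs by .targetLanguage and, for each .targetLanguage, collects the .jobId values and reduces .earlyAccessStatus and .finalStatus to their unique values. I'm sure there's a simpler way to write this, but it does exactly what I set out to do, leaving the data in the following format: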

[
{
"targetLanguage": "pt_BR",
"jobId": [
35902526,
36433561,
36433552
],
"earlyAccessStatus": [
"IN_PROGRESS"
],
"finalStatus": [
"IN_PROGRESS"
]
},
{
"targetLanguage": "zh_CN",
"jobId": [
35902516,
36433560,
36433551
],
"earlyAccessStatus": [
"IN_PROGRESS",
"NOT_STARTED"
],
"finalStatus": [
"IN_PROGRESS"
]
}
]
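One behavior of this filter worth knowing: in jq, an object constructor whose value expression produces no output produces no object, so any job without a "Target file export2" step would be silently dropped (none of the jobs in the original JSON above has that step). The Python sketch below does the same two steps but records a missing step instead of dropping the job. `summarize` and the `MISSING` placeholder are hypothetical, and the sample is three jobs from the original JSON trimmed to the export step.

```python
# Output field -> workflowStepName, taken from the jq filter above
STEP_FIELDS = {
    "earlyAccessStatus": "Target file export1",
    "finalStatus": "Target file export2",
}
MISSING = "MISSING"  # hypothetical placeholder for a job that lacks the step


def summarize(project):
    """Group jobs by targetLanguage and collect the status of each tracked step."""
    groups = {}
    for job in project["jobs"]:
        status_by_step = {s["workflowStepName"]: s["status"] for s in job["steps"]}
        lang = job["targetLanguage"]
        group = groups.setdefault(
            lang, {"targetLanguage": lang, "jobId": [], **{field: set() for field in STEP_FIELDS}}
        )
        group["jobId"].append(job["jobId"])
        for field, step in STEP_FIELDS.items():
            # Unlike the jq filter, a missing step is recorded instead of dropping the job
            group[field].add(status_by_step.get(step, MISSING))
    # Order groups by language and turn each status set into a sorted list, like `unique`
    return [
        {**group, **{field: sorted(group[field]) for field in STEP_FIELDS}}
        for _, group in sorted(groups.items())
    ]


raw = {"projectId": 35902499, "jobs": [
    {"jobId": 35902526, "targetLanguage": "pt_BR",
     "steps": [{"workflowStepName": "Target file export1", "status": "IN_PROGRESS"}]},
    {"jobId": 35902516, "targetLanguage": "zh_CN",
     "steps": [{"workflowStepName": "Target file export1", "status": "IN_PROGRESS"}]},
    {"jobId": 36433551, "targetLanguage": "zh_CN",
     "steps": [{"workflowStepName": "Target file export1", "status": "NOT_STARTED"}]},
]}
summary = summarize(raw)
ready = [g["targetLanguage"] for g in summary if g["earlyAccessStatus"] == ["IN_PROGRESS"]]
print(ready)
```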

Latest update