Separate records by repeated values



I have documents that contain an array of objects. Within that array are pulses in a dataset. For example:

samples: [{"time":1224960,"flow":0,"temp":null},{"time":1224970,"flow":0,"temp":null}, 
{"time":1224980,"flow":23,"temp":null},{"time":1224990,"flow":44,"temp":null}, 
{"time":1225000,"flow":66,"temp":null},{"time":1225010,"flow":0,"temp":null},
{"time":1225020,"flow":650,"temp":null},{"time":1225030,"flow":40,"temp":null}, 
{"time":1225040,"flow":60,"temp":null},{"time":1225050,"flow":0,"temp":null},
{"time":1225060,"flow":0,"temp":null},{"time":1225070,"flow":0,"temp":null},
{"time":1225080,"flow":0,"temp":null},{"time":1225090,"flow":0,"temp":null},
{"time":1225100,"flow":0,"temp":null},{"time":1225110,"flow":67,"temp":null},
{"time":1225120,"flow":23,"temp":null},{"time":1225130,"flow":0,"temp":null},
{"time":1225140,"flow":0,"temp":null},{"time":1225150,"flow":0,"temp":null}]

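I want to build an aggregation pipeline that operates on each set of consecutive "samples.flow" values above zero. That is, each sample pulse is delimited by one or more zero-flow values. I can flatten the data with an $unwind stage, but I don't know how to group each pulse afterwards. I have no objection to this being a multi-step process, but I would rather not have to loop over the data in client-side code. The data will include fields from multiple documents and may total hundreds of thousands of entries.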

From the example above, I would like to be able to extract:

[{"time":1224980,"total_flow":133,"temp":null},
{"time":1225020,"total_flow":750,"temp":null}, 
{"time":1225110,"total_flow":90,"temp":null}]

Or some variant thereof.
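To pin down the grouping being asked for, here is a minimal plain-JavaScript reference, not the server-side pipeline itself. The helper name groupPulses is hypothetical, and the sketch assumes the samples are already sorted by time. For the sample data as listed, the first pulse sums to 23 + 44 + 66 = 133.

```javascript
// Rebuild the 20 sample objects from the question: times start at 1224960
// and advance in steps of 10; temp is null throughout.
const flows = [0, 0, 23, 44, 66, 0, 650, 40, 60, 0, 0, 0, 0, 0, 0, 67, 23, 0, 0, 0];
const samples = flows.map((flow, i) => ({ time: 1224960 + 10 * i, flow, temp: null }));

// Hypothetical helper: sum each run of consecutive flow > 0 values.
// A zero-flow sample closes the current pulse.
function groupPulses(sortedSamples) {
  const pulses = [];
  let current = null; // pulse being accumulated, or null between pulses
  for (const s of sortedSamples) {
    if (s.flow > 0) {
      if (current === null) {
        // First non-zero sample of a run: its time labels the pulse.
        current = { time: s.time, total_flow: 0, temp: s.temp };
        pulses.push(current);
      }
      current.total_flow += s.flow;
    } else {
      current = null;
    }
  }
  return pulses;
}

const pulses = groupPulses(samples);
console.log(JSON.stringify(pulses));
```

Any pipeline-based solution should produce the same three pulses for this input.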

If you don't need to look for specific values in the time field, you can use this pipeline with $bucketAuto.

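// Assumes the samples have already been flattened (for example with $unwind)
// so that time, flow and temp are top-level fields.
[
  {
    $bucketAuto: {
      groupBy: "$time",
      // Number of buckets to create; $bucketAuto chooses the boundaries itself
      // and tries to spread the documents evenly across them.
      buckets: 3,
      output: {
        total_flow: { $sum: "$flow" },  // total flow within the bucket
        temp: { $first: "$temp" },      // temp of the first document in the bucket
        time: { $min: "$time" }         // earliest time in the bucket
      }
    }
  },
  {
    // Drop the { min, max } _id that $bucketAuto generates.
    $project: { _id: 0 }
  }
]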

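If you are looking for specific values of time, you need to use $bucket and supply it with a boundaries parameter containing precomputed lower bounds. I think this solution should do the job for you.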
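A minimal sketch of the $bucket variant, assuming the lower bounds are taken to be the times at which each pulse starts. The helpers pulseBoundaries and buildPulsePipeline and the default label "before_first_pulse" are hypothetical names. The sketch also assumes flattened documents with top-level time, flow and temp fields, samples sorted by time, and at least one pulse. Zero-flow samples that fall inside a bucket add nothing to the sum, so each bucket total equals its pulse total.

```javascript
// Rebuild the 20 sample objects from the question.
const flows = [0, 0, 23, 44, 66, 0, 650, 40, 60, 0, 0, 0, 0, 0, 0, 67, 23, 0, 0, 0];
const samples = flows.map((flow, i) => ({ time: 1224960 + 10 * i, flow, temp: null }));

// Hypothetical helper: one lower bound per pulse (the time of its first
// non-zero sample), plus a final upper bound just past the last sample,
// because $bucket treats each bucket as [lower, upper).
function pulseBoundaries(sortedSamples) {
  const bounds = [];
  let inPulse = false;
  for (const s of sortedSamples) {
    if (s.flow > 0 && !inPulse) bounds.push(s.time);
    inPulse = s.flow > 0;
  }
  if (bounds.length > 0) bounds.push(sortedSamples[sortedSamples.length - 1].time + 1);
  return bounds;
}

// Hypothetical helper: build the pipeline around the precomputed boundaries.
function buildPulsePipeline(sortedSamples) {
  return [
    {
      $bucket: {
        groupBy: "$time",
        boundaries: pulseBoundaries(sortedSamples),
        // Samples before the first pulse fall outside every boundary.
        default: "before_first_pulse",
        output: {
          total_flow: { $sum: "$flow" },
          temp: { $first: "$temp" },
          time: { $min: "$time" }
        }
      }
    },
    // Discard the catch-all bucket, then drop the bucket _id.
    { $match: { _id: { $ne: "before_first_pulse" } } },
    { $project: { _id: 0 } }
  ];
}

const boundaries = pulseBoundaries(samples);
const pipeline = buildPulsePipeline(samples);
console.log(JSON.stringify(boundaries));
```

Computing the boundaries still takes one pass over the samples outside the pipeline, so this is a two-step process rather than a single aggregation.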
