MongoDB: aggregation $avg on complex data



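I am trying to get an average rating in my Mongo aggregation, but I'm having trouble accessing the nested array. My aggregation currently returns the array below. I am trying to get city_reviews to return an array of averages.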

[ 
{
"_id": "Dallas",
"city_reviews": [
//arrays of restaurant objects that include the rating
//I would like to get an average of the rating in each review, so these arrays will be numbers (averages)
[ {
"_id": "5b7ead6d106f0553d8807276",
"created": "2018-08-23T12:41:29.791Z",
"text": "Crackin good place. ",
"rating": 4,
"store": "5b7d67d5356114089909e58d",
"author": "5b7d675e356114089909e58b",
"__v": 0
}, {review2}, {review3}],
[{review1}, {review2}, {review3}],
[{review1}, {review2}],
[{review1}, {review2}, {review3}, {review4}],
[]
]
},
{
"_id": "Houston",
"city_reviews": [
// arrays of restaurants 
[{review1}, {review2}, {review3}],
[{review1}, {review2}, {review3}],
[{review1}, {review2}, {review3}, {review4}],
[],
[]
]
}
]

I would like to aggregate on this to return an array of averages inside city_reviews, like so:

{
"_id": "Dallas",
"city_reviews": [
// arrays of rating averages
[4.7],
[4.3],
[3.4],
[],
[]
]
}

Here is what I've tried. It gives me an averageRating of null, because $city_reviews is an array of objects and I'm not telling it to go deep enough to reach the rating key.

return this.aggregate([
{ $lookup: { from: 'reviews', localField: '_id', foreignField: 'store', as: 
'reviews' }},
{$group: {_id: '$city', city_reviews: { $push : '$reviews'}}},
{ $project: {
averageRating: { $avg: '$city_reviews'}
}}
])

Is there a way to work with this line so that I can return an array of averages instead of the full review objects?

averageRating: { $avg: '$city_reviews'}

EDIT: I was asked for the entire pipeline.

return this.aggregate([
{ $lookup: { from: 'reviews', localField: '_id', foreignField: 'store', as: 'reviews' }},
{$group: {
_id: '$city', 
city_reviews: { $push : '$reviews'}}
},
{ $project: {
photo: '$$ROOT.photo',
name: '$$ROOT.name',
reviews: '$$ROOT.reviews',
slug: '$$ROOT.slug',
city: '$$ROOT.city',
"averageRatingIndex":{
"$map":{
"input":"$city_reviews",
"in":[{"$avg":"$$this.rating"}]
}
},
}
},
{ $sort: { averageRating: -1 }},
{ $limit: 5 }
])

My first query joins the two models together:

{ $lookup: { from: 'reviews', localField: '_id', foreignField: 'store', as: 'reviews' }},

Which results in this:

[ {
"_id": "5b7d67d5356114089909e58d",
"location": {},
"tags": [],
"created": "2018-08-22T13:23:23.224Z",
"name": "Lucia",
"description": "Great name",
"city": "Dallas",
"photo": "ab64b3e7-6207-41d8-a670-94315e4b23af.jpeg",
"author": "5b7d675e356114089909e58b",
"slug": "lucia",
"__v": 0,
"reviews": []
},
{..more object like above}
]

Then I group them like so:

{$group: {
_id: '$city', 
city_reviews: { $push : '$reviews'}}
}

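This brings me back to my original question. Essentially, I just want one overall average rating for each city. The answer I accepted does answer my original question. I am getting this back: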

{
"_id": "Dallas",
"averageRatingIndex": [
[ 4.2 ],
[ 3.6666666666666665 ],
[ null ],
[ 3.2 ],
[ 5 ],
[ null ]
]
}

I tried using the $avg operator on this to return one final average that I can display for each city, but I am having trouble.

You can use $map with $avg to output the averages.

{"$project":{
"averageRating":{
"$map":{
"input":"$city_reviews",
"in":[{"$avg":"$$this.rating"}]
}
}
}}
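To see what this stage computes, here is a rough plain-JavaScript emulation (not MongoDB code). The helper names `avg` and `averagePerStore` and the sample data are made up for illustration. The sketch assumes that `$avg` skips non-numeric values and yields `null` when nothing numeric is left, which is consistent with the `[ null ]` entries shown for stores without reviews.

```javascript
// Emulates {$avg: <array>}: ignore non-numeric values, return null if none remain.
function avg(values) {
  const nums = values.filter((v) => typeof v === "number");
  if (nums.length === 0) return null;
  return nums.reduce((sum, v) => sum + v, 0) / nums.length;
}

// Emulates {$map: {input: "$city_reviews", in: [{$avg: "$$this.rating"}]}}:
// each inner array of reviews becomes a single-element array holding its average.
function averagePerStore(cityReviews) {
  return cityReviews.map((reviews) => [avg(reviews.map((r) => r.rating))]);
}

// Hypothetical data: two stores with reviews, one store without any.
const dallasReviews = [
  [{ rating: 4 }, { rating: 5 }],
  [{ rating: 3 }],
  [],
];

console.log(averagePerStore(dallasReviews)); // [[4.5], [3], [null]]
```

Dropping the square brackets around `{"$avg": ...}` in the `in` expression would give a flat array of numbers instead of an array of single-element arrays.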

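Regarding your request for optimization, I don't think there is much room for improvement beyond the version you already have. However, the following pipeline might be faster than your current solution, because the initial $group stage should result in fewer $lookups. I am not sure how MongoDB will optimize all of this internally, so you may want to profile the two versions against a real dataset.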

db.getCollection('something').aggregate([{
$group: {
_id: '$city', // group by city
"averageRating": { $push: "$_id" } // create array of all encountered "_id"s per "city" bucket - we use the target field name to avoid creation of superfluous fields which would need to be removed from the output later on
}
}, {
$lookup: {
from: 'reviews',
let: { "averageRating": "$averageRating" }, // create a variable called "$$averageRating" which will hold the previously created array of "_id"s
pipeline: [{
$match: { $expr: { $in: [ "$store", "$$averageRating" ] } } // do the usual "joining"
}, {
$group: {
"_id": null, // group all found items into the same single bucket
"rating": { $avg: "$rating" }, // calculate the avg over all reviews matched for this "city" bucket
}
}],
as: 'averageRating' 
}
}, {
$sort: { "averageRating.rating": -1 }
}, {
$limit: 5
}, { 
$addFields: { // beautification of the output only, technically not needed - we do this as the last stage in order to only do it for the max. of 5 documents that we're interested in
"averageRating": { // this is where we reuse the field we created in the first stage
$arrayElemAt: [ "$averageRating.rating", 0 ] // pull the first element inside the array outside of the array
}
}
}])
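The data flow of this pipeline can be sketched in plain JavaScript, without a database. Everything below (the `sampleStores` and `sampleReviews` arrays and the `topCities` function) is a hypothetical stand-in, not part of the original answer. One simplification: a city without reviews gets `averageRating: null` here, whereas the pipeline above would presumably leave the field out.

```javascript
// Hypothetical in-memory stand-ins for the stores and reviews collections.
const sampleStores = [
  { _id: "s1", city: "Dallas" },
  { _id: "s2", city: "Dallas" },
  { _id: "s3", city: "Houston" },
];
const sampleReviews = [
  { store: "s1", rating: 4 },
  { store: "s1", rating: 5 },
  { store: "s2", rating: 3 },
  { store: "s3", rating: 2 },
];

function topCities(stores, reviews, limit = 5) {
  // Stage 1 ($group): collect the store ids per city.
  const idsByCity = new Map();
  for (const s of stores) {
    if (!idsByCity.has(s.city)) idsByCity.set(s.city, []);
    idsByCity.get(s.city).push(s._id);
  }

  // Stage 2 ($lookup with inner $match and $group): one average per city,
  // taken over every review whose store id is in that city's id list.
  const result = [];
  for (const [city, ids] of idsByCity) {
    const ratings = reviews
      .filter((r) => ids.includes(r.store))
      .map((r) => r.rating);
    const averageRating = ratings.length
      ? ratings.reduce((a, b) => a + b, 0) / ratings.length
      : null;
    result.push({ _id: city, averageRating });
  }

  // Stages 3 and 4 ($sort descending, $limit): cities without ratings go last.
  const key = (doc) => (doc.averageRating === null ? -Infinity : doc.averageRating);
  result.sort((a, b) => key(b) - key(a));
  return result.slice(0, limit);
}

console.log(topCities(sampleStores, sampleReviews));
```

With the sample data, Dallas averages (4 + 5 + 3) / 3 = 4 and Houston averages 2, so Dallas comes first.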

In fact, the "initial $group stage" approach can also be combined with a solution like @Veeram's:

db.collection.aggregate([{
$group: {
_id: '$city', // group by city
"averageRating": { $push: "$_id" } // create array of all encountered "_id"s per "city" bucket - we use the target field name to avoid creation of superfluous fields which would need to be removed from the output later on
}
}, {
$lookup: {
from: 'reviews',
localField: 'averageRating',
foreignField: 'store',
as: 'averageRating'
},
}, {
$project: {
"averageRating": {
$avg: {
$map: {
input: "$averageRating",
in: { $avg: "$$this.rating" }
}
}
}
}
}, {
$sort: { averageRating: -1 }
}, {
$limit: 5
}])
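One point worth checking against real data: here the $lookup should return a flat list of reviews per city, so `$$this.rating` is presumably a single number and the result is the average over all reviews of the city. That is not the same number as the average of the per-store averages from the question's `averageRatingIndex`. The plain-JavaScript sketch below illustrates the arithmetic with invented ratings; `mean` is a hypothetical helper.

```javascript
// Arithmetic mean of an array of numbers; null for an empty array.
function mean(nums) {
  if (nums.length === 0) return null;
  return nums.reduce((sum, n) => sum + n, 0) / nums.length;
}

// Invented ratings for one city: store A has four reviews, store B has one.
const ratingsPerStore = [[4, 5, 5, 4], [2]];

// Average over every review in the city: (4 + 5 + 5 + 4 + 2) / 5
const overallAverage = mean(ratingsPerStore.flat());

// Average of the per-store averages: (4.5 + 2) / 2
const averageOfAverages = mean(ratingsPerStore.map(mean));

console.log(overallAverage);    // 4
console.log(averageOfAverages); // 3.25
```

Which of the two is wanted depends on whether every review or every store should carry equal weight in a city's score.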
