I need to aggregate the following data:
- Country: One, Car: Volvo, Name: Smith, Price: 100
- Country: One, Car: BMW, Name: Smith, Price: 200
- Country: Two, Car: Romeo, Name: Joe, Price: 50
- Country: Two, Car: KIA, Name: Joe, Price: 110
- Country: Two, Car: KIA, Name: Joe, Price: 90
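(Names are unique, and each person owns cars in only one country.)
As a result, I expect the following (the plural type names are not required):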
- Name: Smith, Type: Volvos, Country: One, Val: 1 // Count of car-type
- Name: Smith, Type: BMWs, Country: One, Val: 1
- Name: Smith, Type: Total, Country: One, Val: 2 // Count of all his cars
- Name: Smith, Type: Price, Country: One, Val: 300 // Total car price
- Name: Joe, Type: Romeos, Country: Two, Val: 1
- Name: Joe, Type: KIAs, Country: Two, Val: 2
- Name: Joe, Type: Total, Country: Two, Val: 3
- Name: Joe, Type: Price, Country: Two, Val: 250
In other words, this is a pivoted version of the data, used to build a report:
Country | Name | Volvos | BMWs | Romeos | KIAs | Total | Price
----------------------------------------------------------------
One | Smith | 1 | 1 | | | 2 | 300
----------------------------------------------------------------
Two | Joe | | | 1 | 2 | 3 | 250
| Other | ? | ? | ... etc
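I'm wondering whether the aggregation framework in MongoDB can handle this, or whether I should go for hardcore map-reduce?
This is not exactly the result you specified, but it is done in a MongoDB way: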
db.cars.aggregate([
{ "$group": {
"_id": {
"name": "$Name",
"type": "$Car"
},
"Country": { "$first": "$Country" },
"CarCount": { "$sum": 1 },
"TotalPrice": { "$sum": "$Price" }
}},
{ "$group": {
"_id": "$_id.name",
"cars": {
"$push": {
"type": "$_id.type",
"country": "$Country",
"carCount": "$CarCount",
"TotalPrice": "$TotalPrice"
}
},
"TotalPrice": { "$sum": "$TotalPrice" }
}}
])
This gives you:
{
"_id" : "Smith",
"cars" : [
{
"type" : "BMW",
"country" : "One",
"carCount" : 1,
"TotalPrice" : 200
},
{
"type" : "Volvo",
"country" : "One",
"carCount" : 1,
"TotalPrice" : 100
}
],
"TotalPrice" : 300
}
{
"_id" : "Joe",
"cars" : [
{
"type" : "KIA",
"country" : "Two",
"carCount" : 2,
"TotalPrice" : 200
},
{
"type" : "Romeo",
"country" : "Two",
"carCount" : 1,
"TotalPrice" : 50
}
],
"TotalPrice" : 250
}
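If the flat Name/Type/Country/Val rows from the question are really needed, the documents above could be reshaped on the client. A minimal sketch in plain JavaScript follows; `toRows` is a hypothetical helper name, and the sample array just mirrors the output shown above rather than coming from a live query.

```javascript
// Sample documents shaped like the aggregation output above.
const grouped = [
  { _id: "Smith", cars: [
      { type: "BMW", country: "One", carCount: 1, TotalPrice: 200 },
      { type: "Volvo", country: "One", carCount: 1, TotalPrice: 100 } ],
    TotalPrice: 300 },
  { _id: "Joe", cars: [
      { type: "KIA", country: "Two", carCount: 2, TotalPrice: 200 },
      { type: "Romeo", country: "Two", carCount: 1, TotalPrice: 50 } ],
    TotalPrice: 250 }
];

// toRows is a hypothetical helper, not part of MongoDB.
// It emits one row per car type, then a "Total" row and a "Price" row.
function toRows(docs) {
  const rows = [];
  for (const doc of docs) {
    // Each person owns cars in a single country, so the first entry is enough.
    const country = doc.cars.length > 0 ? doc.cars[0].country : null;
    let total = 0;
    for (const car of doc.cars) {
      rows.push({ Name: doc._id, Type: car.type, Country: country, Val: car.carCount });
      total += car.carCount;
    }
    rows.push({ Name: doc._id, Type: "Total", Country: country, Val: total });
    rows.push({ Name: doc._id, Type: "Price", Country: country, Val: doc.TotalPrice });
  }
  return rows;
}

const rows = toRows(grouped);
rows.forEach(r => console.log(r));
```

The car count per person is summed here on the client because the pipeline above only carries the price total to the top level.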
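However, with a variable number of types, I don't think you can get all of this in a single aggregation query. You can, though, get the whole table in two aggregations.
I should mention that the totals could be computed client-side, which should also be fast.
I should also note that the aggregation framework currently cannot "merge" two outputs (see http://docs.mongodb.org/manual/reference/operator/aggregation/out/), but you can sort both results so that they come back in the same order.
First you want your totals (if you are doing this via the aggregation framework):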
db.cars.aggregate([
{$group: {
_id: {
Country: '$Country',
Name: '$Name'
},
car_count: {$sum: 1},
value_total: {$sum: '$Price'} // the source documents store the amount in Price
}},
{$sort: {_id: 1}} // we now sort by the country and name
])
Now you want the totals for each car type:
db.cars.aggregate([
{$group: {
_id: {
Country: '$Country',
Name: '$Name',
Type: '$Car' // the car type is stored in the Car field
},
car_count: {$sum: 1},
value_total: {$sum: '$Price'}
}},
// Sort on the country and name parts of _id so the order matches the totals
{$sort: {'_id.Country': 1, '_id.Name': 1}}
])
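Now, in JavaScript for example, you can iterate over the first result set (your totals) and, in a nested loop, iterate over the detailed results from the other aggregation, printing it all out.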
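A rough sketch of that merge step in plain JavaScript is below. It assumes both result sets have already been fetched into arrays (for instance with `.toArray()` on the returned cursor) and that they arrive in the same country/name order; `mergeReport` is a hypothetical helper name and the arrays are hand-written sample data.

```javascript
// Sample results shaped like the output of the two aggregations:
// totals per person, then per-car details, both ordered by country and name.
const totals = [
  { _id: { Country: "One", Name: "Smith" }, car_count: 2, value_total: 300 },
  { _id: { Country: "Two", Name: "Joe" }, car_count: 3, value_total: 250 }
];
const details = [
  { _id: { Country: "One", Name: "Smith", Type: "BMW" }, car_count: 1, value_total: 200 },
  { _id: { Country: "One", Name: "Smith", Type: "Volvo" }, car_count: 1, value_total: 100 },
  { _id: { Country: "Two", Name: "Joe", Type: "KIA" }, car_count: 2, value_total: 200 },
  { _id: { Country: "Two", Name: "Joe", Type: "Romeo" }, car_count: 1, value_total: 50 }
];

// mergeReport is a hypothetical helper: outer loop over totals,
// inner loop consuming the matching run of detail rows.
function mergeReport(totalRows, detailRows) {
  const lines = [];
  let i = 0; // cursor into detailRows; only moves forward because both lists share an order
  for (const t of totalRows) {
    const parts = [];
    while (i < detailRows.length &&
           detailRows[i]._id.Country === t._id.Country &&
           detailRows[i]._id.Name === t._id.Name) {
      parts.push(detailRows[i]._id.Type + "=" + detailRows[i].car_count);
      i++;
    }
    lines.push(t._id.Country + " | " + t._id.Name + " | " + parts.join(", ") +
               " | Total=" + t.car_count + " | Price=" + t.value_total);
  }
  return lines;
}

const lines = mergeReport(totals, details);
lines.forEach(line => console.log(line));
```

Because the inner cursor never rewinds, the merge touches each detail row once, which is the reason for sorting both outputs the same way.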
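This will probably be faster than map-reduce, but another option is to use map-reduce to update an aggregated collection every so often and then query that collection. This means the results won't be real-time (they might lag by 5 minutes or so), but reads will be very fast.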
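For the map-reduce option, the map and reduce functions might look roughly like the sketch below. The names `mapCar`, `reduceCars` and the output collection `car_totals` are assumptions for illustration, and `runLocally` is only a small in-memory stand-in for the server so the functions can be tried without a database. Note also that `mapReduce` is deprecated in recent MongoDB versions, so treat this as an illustration of the idea rather than a recommendation.

```javascript
// Map function: in the mongo shell, `this` is the current document
// and `emit` is provided by the server.
function mapCar() {
  emit({ Country: this.Country, Name: this.Name },
       { car_count: 1, value_total: this.Price });
}

// Reduce function: returns the same shape it receives, because the
// server may call it again on partially reduced values.
function reduceCars(key, values) {
  const out = { car_count: 0, value_total: 0 };
  for (const v of values) {
    out.car_count += v.car_count;
    out.value_total += v.value_total;
  }
  return out;
}

// In the shell this would be run periodically, roughly as:
//   db.cars.mapReduce(mapCar, reduceCars, { out: { replace: "car_totals" } })
// where "car_totals" is a hypothetical collection name.

// Hypothetical in-memory driver, only to exercise the two functions.
function runLocally(docs, mapFn, reduceFn) {
  const buckets = new Map();
  globalThis.emit = (key, value) => {
    const k = JSON.stringify(key);
    if (!buckets.has(k)) buckets.set(k, { key: key, values: [] });
    buckets.get(k).values.push(value);
  };
  docs.forEach(doc => mapFn.call(doc));
  delete globalThis.emit;
  return [...buckets.values()].map(b => ({ _id: b.key, value: reduceFn(b.key, b.values) }));
}

const cars = [
  { Country: "One", Car: "Volvo", Name: "Smith", Price: 100 },
  { Country: "One", Car: "BMW", Name: "Smith", Price: 200 },
  { Country: "Two", Car: "Romeo", Name: "Joe", Price: 50 },
  { Country: "Two", Car: "KIA", Name: "Joe", Price: 110 },
  { Country: "Two", Car: "KIA", Name: "Joe", Price: 90 }
];
const reduced = runLocally(cars, mapCar, reduceCars);
reduced.forEach(r => console.log(JSON.stringify(r)));
```

Reports would then read from the pre-aggregated collection instead of recomputing the totals on every request.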
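Aggregation should be able to do this. The simplest way is two separate commands... If your collection is called cars, you can run something like this: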
db.cars.aggregate([{$group:{_id:{"Country":"$Country","Name":"$Name"},"sum":{$sum:1},"price":{$sum:"$Price"}}}])
db.cars.aggregate([{$group:{_id:{"Country":"$Country","Name":"$Name","Car":"$Car"},"sum":{$sum:1},"price":{$sum:"$Price"}}}])
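As a possible follow-up, the per-car output of the second command is enough on its own to build the pivoted report, since the totals can be derived on the client. The sketch below is plain JavaScript over hand-written sample results in that output's shape (`_id`, `sum`, `price`); `pivot` is a hypothetical helper name.

```javascript
// Sample results shaped like the output of the second command above.
const perCar = [
  { _id: { Country: "One", Name: "Smith", Car: "Volvo" }, sum: 1, price: 100 },
  { _id: { Country: "One", Name: "Smith", Car: "BMW" }, sum: 1, price: 200 },
  { _id: { Country: "Two", Name: "Joe", Car: "Romeo" }, sum: 1, price: 50 },
  { _id: { Country: "Two", Name: "Joe", Car: "KIA" }, sum: 2, price: 200 }
];

// pivot is a hypothetical helper: one row per person, one column per car type.
function pivot(results) {
  const carTypes = [];         // column order follows first appearance
  const byPerson = new Map();  // "Country|Name" -> report row
  for (const r of results) {
    const { Country, Name, Car } = r._id;
    if (!carTypes.includes(Car)) carTypes.push(Car);
    const key = Country + "|" + Name;
    if (!byPerson.has(key)) {
      byPerson.set(key, { Country: Country, Name: Name, counts: {}, Total: 0, Price: 0 });
    }
    const row = byPerson.get(key);
    row.counts[Car] = (row.counts[Car] || 0) + r.sum;
    row.Total += r.sum;    // total number of cars, computed client-side
    row.Price += r.price;  // total price, computed client-side
  }
  return { carTypes: carTypes, rows: [...byPerson.values()] };
}

const report = pivot(perCar);
console.log(["Country", "Name", ...report.carTypes, "Total", "Price"].join(" | "));
for (const row of report.rows) {
  // Empty cell when the person owns no car of that type.
  const cells = report.carTypes.map(t => row.counts[t] || "");
  console.log([row.Country, row.Name, ...cells, row.Total, row.Price].join(" | "));
}
```

The column list is collected from the data, so a variable number of car types does not require changing the query.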