I'm fairly new to mapReduce and aggregation in MongoDB.
Here is a sample of the data set:
{ "_id" : ObjectId("521002161e0787522098d110"), "userId" : 4545454, "pickId" : 1, "answerArray" : [ "yes" ], "city" : "New York", "state" : "New York" }
{ "_id" : ObjectId("521002481e0787522098d111"), "userId" : 64545454, "pickId" : 1, "answerArray" : [ "no" ], "city" : "New York", "state" : "New York" }
{ "_id" : ObjectId("521002871e0787522098d112"), "userId" : 78263636, "pickId" : 1, "answerArray" : [ "yes" ], "city" : "Albany", "state" : "New York" }
{ "_id" : ObjectId("5211507c1e0787522098d113"), "userId" : 78263636, "pickId" : 2, "answerArray" : [ "yes" ], "city" : "New York", "state" : "New York" }
{ "_id" : ObjectId("5211507c1e0787522098d113"), "userId" : 78263636, "pickId" : 1, "answerArray" : [ "yes" ], "city" : "Wichita", "state" : "Kansas" }
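I want to get the list of unique combinations of state, city, pickId and answerArray, and then count how many times each combination occurs. The result needs to look something like this: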
{"pickId": 1, "city": "New York", "state": "New York", "answerArray": ["yes"], "count":2}
{"pickId": 1, "city": "Albany", "state": "New York", "answerArray": ["no"], "count":1}
{"pickId": 1, "city": "New York", "state": "New York", "answerArray": ["no"], "count":1}
{"pickId": 1, "city": "Wichita", "state": "Kansas", "answerArray": ["yes"], "count":1}
The problem I'm running into is that emit in mapReduce only accepts two arguments:
Error: fast_emit takes 2 args near...
But I want to map several unique values to a single pickId.
Here is the mapReduce code I'm looking at:
var mapFunct = function() {
    if (this.answerArray == "yes") {
        emit(this.pickId, 1);
    } else {
        emit(this.pickId, 0);
    }
};
var mapReduce2 = function(keyPickId, answerVals) {
    return Array.sum(answerVals);
};
db.answers.mapReduce(mapFunct, mapReduce2, { out: "mapReduceAnswers" });
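Any help or further suggestions would be appreciated. I've also looked at the aggregation framework, but it doesn't seem like it would give me the kind of output I need.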
I think you can get the format you want with aggregation, specifically with the $group and $project operators. Take a look at this aggregation call:
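var agg_output = db.answers.aggregate([
    // Count the documents for each unique city/state/answerArray/pickId combination.
    { $group: {
        _id: {
            city: "$city",
            state: "$state",
            answerArray: "$answerArray",
            pickId: "$pickId"
        },
        count: { $sum: 1 }
    } },
    // Lift the grouped fields out of _id and drop _id itself.
    { $project: {
        city: "$_id.city",
        state: "$_id.state",
        answerArray: "$_id.answerArray",
        pickId: "$_id.pickId",
        count: "$count",
        _id: 0
    } }
]);
// Assumes a shell where aggregate() returns a document with a "result" array
// (as in MongoDB 2.4); where it returns a cursor, pass agg_output.toArray() instead.
db.answer_counts.insert(agg_output.result);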
The $group stage adds up the number of occurrences of each unique city/state/answerArray/pickId combination, and the $project stage reshapes the data into the form you want.
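To see what those two stages compute without a running server, here is a rough plain-JavaScript sketch of the same grouping over an in-memory copy of the sample documents. It only mimics the pipeline's logic. countByFields is a made-up helper for illustration, not a MongoDB API. With the five sample documents every combination happens to occur exactly once, so each count comes out as 1.

```javascript
// In-memory copy of the sample documents (only the fields the pipeline uses).
const answers = [
  { userId: 4545454,  pickId: 1, answerArray: ["yes"], city: "New York", state: "New York" },
  { userId: 64545454, pickId: 1, answerArray: ["no"],  city: "New York", state: "New York" },
  { userId: 78263636, pickId: 1, answerArray: ["yes"], city: "Albany",   state: "New York" },
  { userId: 78263636, pickId: 2, answerArray: ["yes"], city: "New York", state: "New York" },
  { userId: 78263636, pickId: 1, answerArray: ["yes"], city: "Wichita",  state: "Kansas" }
];

// Hypothetical helper: counts documents per unique combination of the given fields.
function countByFields(docs, fields) {
  const groups = new Map();
  for (const doc of docs) {
    // Rough equivalent of the $group _id: pick out the grouping fields.
    const id = {};
    for (const f of fields) id[f] = doc[f];
    // Serialize the combination so that equal combinations share one map entry.
    const key = JSON.stringify(id);
    const entry = groups.get(key) || { id: id, count: 0 };
    entry.count += 1; // rough equivalent of count: { $sum: 1 }
    groups.set(key, entry);
  }
  // Rough equivalent of $project: flatten the grouped fields next to count.
  return Array.from(groups.values()).map(function (g) {
    return Object.assign({}, g.id, { count: g.count });
  });
}

const counts = countByFields(answers, ["city", "state", "answerArray", "pickId"]);
counts.forEach(function (c) { console.log(JSON.stringify(c)); });
```

The real pipeline does this work inside the server, so the sketch is only useful for checking your expectations against a handful of documents.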
The insert call saves the resulting output to a new collection. Does that make sense?
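If you would rather stay with mapReduce, the fast_emit error should go away once emit is called with exactly two arguments: put the whole combination into a single key object and emit 1 as the value. Below is a minimal sketch of that idea. The reduce function uses plain Array.prototype.reduce so it also runs outside the shell. The emit stand-in, runMapReduce and the three-document sample are assumptions added only so the functions can be tried without MongoDB. Inside MongoDB, emit is provided for you.

```javascript
// map: emit() accepts exactly two arguments (key, value), so the whole
// combination travels inside a single key object.
var mapFunct = function () {
  emit({ pickId: this.pickId, city: this.city, state: this.state, answerArray: this.answerArray }, 1);
};

// reduce: add up the 1s emitted for one key.
var reduceFunct = function (key, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
};

// In the mongo shell this would be run as:
//   db.answers.mapReduce(mapFunct, reduceFunct, { out: "mapReduceAnswers" });

// Stand-in harness (an assumption, only for trying the functions outside MongoDB).
var buckets = {};
function emit(key, value) {
  var k = JSON.stringify(key);
  buckets[k] = buckets[k] || { key: key, values: [] };
  buckets[k].values.push(value);
}
function runMapReduce(docs) {
  buckets = {};
  // Call map with each document bound to "this", as MongoDB does.
  docs.forEach(function (doc) { mapFunct.call(doc); });
  // Reduce each bucket and mimic the { _id, value } shape of mapReduce output.
  return Object.keys(buckets).map(function (k) {
    return { _id: buckets[k].key, value: reduceFunct(buckets[k].key, buckets[k].values) };
  });
}

// Hypothetical sample: two identical "yes" combinations and one "no".
var sample = [
  { pickId: 1, answerArray: ["yes"], city: "New York", state: "New York" },
  { pickId: 1, answerArray: ["yes"], city: "New York", state: "New York" },
  { pickId: 1, answerArray: ["no"],  city: "New York", state: "New York" }
];
console.log(JSON.stringify(runMapReduce(sample)));
```

Note that mapReduce output documents have the shape { _id: <key>, value: <count> }, so the combination ends up nested under _id rather than flattened as in the $project stage above.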