Multiple MapReduce functions or the aggregation framework for unique values and counts in MongoDB



I'm fairly new to mapReduce and aggregation in MongoDB.

Here is a sample of the dataset:

{ "_id" : ObjectId("521002161e0787522098d110"), "userId" : 4545454, "pickId" : 1, "answerArray" : [  "yes" ], "city" : "New York", "state" : "New York" }
{ "_id" : ObjectId("521002481e0787522098d111"), "userId" : 64545454, "pickId" : 1, "answerArray" : [  "no" ], "city" : "New York", "state" : "New York" }
{ "_id" : ObjectId("521002871e0787522098d112"), "userId" : 78263636, "pickId" : 1, "answerArray" : [  "yes" ], "city" : "Albany", "state" : "New York" }
{ "_id" : ObjectId("5211507c1e0787522098d113"), "userId" : 78263636, "pickId" : 2, "answerArray" : [  "yes" ], "city" : "New York", "state" : "New York" }
{ "_id" : ObjectId("5211507c1e0787522098d113"), "userId" : 78263636, "pickId" : 1, "answerArray" : [  "yes" ], "city" : "Wichita", "state" : "Kansas" }

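I want to get the list of unique combinations of state, city, pickId, and answerArray, and then count how many times each combination occurs. The result needs to look something like this: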

{"pickId": 1, "city": "New York", "state": "New York", "answerArray": ["yes"], "count":2}
{"pickId": 1, "city": "Albany", "state": "New York", "answerArray": ["no"], "count":1}
{"pickId": 1, "city": "New York", "state": "New York", "answerArray": ["no"], "count":1}
{"pickId": 1, "city": "Wichita", "state": "Kansas", "answerArray": ["yes"], "count":1}

The problem I'm running into is that emit inside mapReduce only accepts two arguments:

Error: fast_emit takes 2 args near...

But I want to map several unique values onto one pickId.

Here is the mapReduce code I'm looking at:

var mapFunct = function() {
    if (this.answerArray == "yes") {
        emit(this.pickId, 1);
    } else {
        emit(this.pickId, 0);
    }
};
var mapReduce2 = function(keyPickId, answerVals) {
    return Array.sum(answerVals);
};
db.answers.mapReduce(mapFunct, mapReduce2, { out: "mapReduceAnswers" })

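Any help or further suggestions would be greatly appreciated. I have also looked into the aggregation framework, but it doesn't seem like it would give me the kind of output I need.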

I think you can get the format you need with aggregation, specifically the $group and $project operators. Take a look at this aggregation call:

var agg_output = db.answers.aggregate([
  { $group: { _id: {
                city: "$city",
                state: "$state",
                answerArray: "$answerArray",
                pickId: "$pickId"
            }, count: { $sum: 1 }}
  },
  { $project: { city: "$_id.city", 
                state: "$_id.state", 
                answerArray: "$_id.answerArray", 
                pickId: "$_id.pickId", 
                count: "$count", 
                _id: 0}
  }
]);
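// Relies on aggregate() returning a document with a "result" array,
// as the mongo shell did before MongoDB 2.6.
db.answer_counts.insert(agg_output.result);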

The $group stage counts the occurrences of each unique city/state/answerArray/pickId combination, and the $project stage reshapes the data into the desired form.
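To make the grouping logic concrete, here is a rough in-memory sketch in plain JavaScript of what the two stages compute. It is only an illustration, not how MongoDB implements them: `countCombinations` is a hypothetical helper, and `sampleDocs` is the sample data from the question with the `_id` fields left out.

```javascript
// Sample documents from the question (the _id fields are omitted).
const sampleDocs = [
  { userId: 4545454,  pickId: 1, answerArray: ["yes"], city: "New York", state: "New York" },
  { userId: 64545454, pickId: 1, answerArray: ["no"],  city: "New York", state: "New York" },
  { userId: 78263636, pickId: 1, answerArray: ["yes"], city: "Albany",   state: "New York" },
  { userId: 78263636, pickId: 2, answerArray: ["yes"], city: "New York", state: "New York" },
  { userId: 78263636, pickId: 1, answerArray: ["yes"], city: "Wichita",  state: "Kansas" }
];

// Hypothetical helper (not part of MongoDB): counts documents per unique
// city/state/answerArray/pickId combination, like $group with { $sum: 1 }.
function countCombinations(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // The composite key plays the role of the _id document in $group.
    const key = JSON.stringify([doc.city, doc.state, doc.answerArray, doc.pickId]);
    const group = groups.get(key);
    if (group) {
      group.count += 1;
    } else {
      // Flat output shape, like the one produced by the $project stage.
      groups.set(key, {
        city: doc.city,
        state: doc.state,
        answerArray: doc.answerArray,
        pickId: doc.pickId,
        count: 1
      });
    }
  }
  return Array.from(groups.values());
}

const counts = countCombinations(sampleDocs);
console.log(counts);
```

The point of the composite key is that all four fields together identify a group, which is what putting them inside `_id` does in the $group stage.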

The insert call saves the resulting output into a new collection. Does that make sense?
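On MongoDB 2.6 or later, aggregate() returns a cursor rather than a document with a result field, so the insert step above may not work as written. One possible variant, assuming such a version, is to end the pipeline with an $out stage that writes straight into the target collection. This is a sketch only; the collection name answer_counts is taken from the insert call above.

```javascript
// Assumes MongoDB 2.6+, where the $out stage is available.
// $out must be the last stage and replaces the target collection if it exists.
const pipeline = [
  { $group: {
      _id: {
        city: "$city",
        state: "$state",
        answerArray: "$answerArray",
        pickId: "$pickId"
      },
      count: { $sum: 1 }
  } },
  { $project: {
      _id: 0,
      city: "$_id.city",
      state: "$_id.state",
      answerArray: "$_id.answerArray",
      pickId: "$_id.pickId",
      count: 1
  } },
  { $out: "answer_counts" }
];

// In the mongo shell this would be run as:
// db.answers.aggregate(pipeline);
```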
