Grouping data by key in MongoDB MapReduce



I am trying out a MapReduce program in MongoDB to find mutual friends. After sorting on the key in MongoDB, I get the following data:

{"user" : " Hari","friend" : "Shiva",
 "friendList": ["Hanks"," Tom"," Karma"," Hari"," Dinesh"]}

 {"user" : "Hari","friend" : " Shiva",
  "friendList" : ["Karma"," Tom"," Ram"," Bindu"," Shiva",
                   " Kishna"," Bikash"," Bakshi"," Dinesh"]}

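Now I want to group the records that share the same key into a single record. How can I group the data with JavaScript inside the map function, before the key-value pairs are sent to the reducers? For example, I want output like this: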

{"user" : " Hari","friend" : "Shiva",
 "friendList": ["Hanks"," Tom"," Karma"," Hari"," Dinesh"],["Karma"," Tom"," Ram"," Bindu"," Shiva"," Kishna"," Bikash"," Bakshi"," Dinesh"]}

You can concatenate the friendList arrays of the two records into a single array to create an object like this:

{
  "_id": {
    "user": " Hari",
    "friend": "Shiva"
  },
  "value": {
    "friendList": [
      "Hanks",
      " Tom",
      " Karma",
      " Hari",
      " Dinesh",
      "Karma",
      " Tom",
      " Ram",
      " Bindu",
      " Shiva",
      " Kishna",
      " Bikash",
      " Bakshi",
      " Dinesh"
    ]
  }
}

See the code at https://jsfiddle.net/b6hxswvk/1/, which creates this single object.
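In case the fiddle is unavailable, here is a minimal sketch of the same idea in plain JavaScript. It is not the fiddle's code: `mergeFriendLists` is a hypothetical helper, and trimming the key fields is an assumption made here because the sample records differ only in stray whitespace (" Hari" vs "Hari").

```javascript
// Sample records copied from the question, stray spaces included.
const records = [
  { user: " Hari", friend: "Shiva",
    friendList: ["Hanks", " Tom", " Karma", " Hari", " Dinesh"] },
  { user: "Hari", friend: " Shiva",
    friendList: ["Karma", " Tom", " Ram", " Bindu", " Shiva",
                 " Kishna", " Bikash", " Bakshi", " Dinesh"] },
];

// Hypothetical helper: merge records that share (user, friend) into one
// object whose friendList is the concatenation of every list for that key.
function mergeFriendLists(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // Assumption: keys are trimmed so " Hari" and "Hari" land in one group.
    const user = doc.user.trim();
    const friend = doc.friend.trim();
    const id = JSON.stringify([user, friend]); // composite key as a string
    if (!groups.has(id)) {
      groups.set(id, { _id: { user, friend }, value: { friendList: [] } });
    }
    // Spread the elements so the result stays a flat array.
    groups.get(id).value.friendList.push(...doc.friendList);
  }
  return [...groups.values()];
}

const merged = mergeFriendLists(records);
console.log(JSON.stringify(merged, null, 2));
```

Without the trimming step the two sample records would end up in separate groups, because their key strings are not identical.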

If you want friendList to be a two-dimensional array instead, for example:

{
  "_id": {
    "user": " Hari",
    "friend": "Shiva"
  },
  "value": {
    "friendList": [
      [
        "Hanks",
        " Tom",
        " Karma",
        " Hari",
        " Dinesh"
      ],
      [
        "Karma",
        " Tom",
        " Ram",
        " Bindu",
        " Shiva",
        " Kishna",
        " Bikash",
        " Bakshi",
        " Dinesh"
      ]
    ]
  }
}

You can use the code at https://jsfiddle.net/b6hxswvk/2/ for that.
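A rough sketch of the two-dimensional variant follows. Again this is not the fiddle's code; `groupFriendLists` is a hypothetical name, and the key trimming is an assumption needed because the sample keys differ in whitespace. The only real difference from a flat merge is that each record's array is pushed as a single element.

```javascript
// Sample records copied from the question, stray spaces included.
const friendRecords = [
  { user: " Hari", friend: "Shiva",
    friendList: ["Hanks", " Tom", " Karma", " Hari", " Dinesh"] },
  { user: "Hari", friend: " Shiva",
    friendList: ["Karma", " Tom", " Ram", " Bindu", " Shiva",
                 " Kishna", " Bikash", " Bakshi", " Dinesh"] },
];

// Hypothetical helper: keep one inner array per source record.
function groupFriendLists(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // Assumption: keys are trimmed so both sample records share a group.
    const user = doc.user.trim();
    const friend = doc.friend.trim();
    const id = JSON.stringify([user, friend]);
    if (!groups.has(id)) {
      groups.set(id, { _id: { user, friend }, value: { friendList: [] } });
    }
    // Push the array itself (no spread), which yields an array of arrays.
    groups.get(id).value.friendList.push(doc.friendList);
  }
  return [...groups.values()];
}

const nested = groupFriendLists(friendRecords);
console.log(JSON.stringify(nested, null, 2));
```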

You can simply run an aggregation and $group on the user and friend fields:

db.collection.aggregate([
    {$group: {
        _id: {
            user: '$user',
            friend: '$friend'
        },
        friendList: {$push: '$friendList'}
    }},
    // project the fields as you wish
    {$project: {
        user: '$_id.user',
        friend: '$_id.friend',
        friendList: '$friendList'
    }}
])

Hopefully this aggregation pipeline returns the result you expect.
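Note that `$push` applied to an array field collects the arrays themselves, so the pipeline above produces a list of lists. If a single flat list is wanted, one possible variation (a sketch, not part of the original answer) is to `$unwind` the array before grouping. The constant name `flatFriendListPipeline` is made up for illustration; the stages would be passed to `db.collection.aggregate(...)` in the mongo shell.

```javascript
// Pipeline stages only; run with db.collection.aggregate(flatFriendListPipeline).
const flatFriendListPipeline = [
  // Emit one document per friendList element.
  { $unwind: "$friendList" },
  // Regroup by (user, friend), pushing individual names rather than arrays.
  {
    $group: {
      _id: { user: "$user", friend: "$friend" },
      friendList: { $push: "$friendList" },
    },
  },
  // Reshape the output; adjust the projected fields as needed.
  {
    $project: {
      _id: 0,
      user: "$_id.user",
      friend: "$_id.friend",
      friendList: 1,
    },
  },
];

console.log(JSON.stringify(flatFriendListPipeline, null, 2));
```

By default `$unwind` drops documents whose friendList is missing or empty, so such records would not appear in the grouped result.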

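My friend, map-reduce already groups the values that share a key and hands them to reduce as (key, list[values]), so why take on the pain of grouping the values for a key yourself?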

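I strongly suggest doing the grouping in the reducer rather than in the map function. The main reason is that the map task reads records one at a time and only emits them, so the burden of identifying which records share a key is carried by the map-reduce algorithm itself, and how the grouped values are laid out in the output is something we can take care of in the reduce logic.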
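To make that division of labour concrete, here is a small plain-JavaScript simulation. It is only a sketch: `runMapReduce` is a hypothetical stand-in for the framework and not MongoDB's implementation, `emit` is passed in as an argument instead of being a global, and the sample documents are shortened, made-up data.

```javascript
// Hypothetical stand-in for the framework's grouping step.
function runMapReduce(docs, mapFn, reduceFn) {
  const grouped = new Map(); // serialised key -> { key, values }
  const emit = (key, value) => {
    const id = JSON.stringify(key);
    if (!grouped.has(id)) grouped.set(id, { key, values: [] });
    grouped.get(id).values.push(value);
  };
  // `this` inside mapFn is the current document, as in MongoDB.
  for (const doc of docs) mapFn.call(doc, emit);
  return [...grouped.values()].map(({ key, values }) => ({
    _id: key,
    // Like MongoDB, skip reduce when a key has only one value.
    value: values.length > 1 ? reduceFn(key, values) : values[0],
  }));
}

// Shortened, illustrative documents (not the question's full data).
const sampleDocs = [
  { user: "Hari", friend: "Shiva", friendList: ["Hanks", "Tom"] },
  { user: "Hari", friend: "Shiva", friendList: ["Karma", "Ram"] },
  { user: "Ram", friend: "Bindu", friendList: ["Tom"] },
];

// Map emits one pair per record and does no grouping itself.
function mapFriends(emit) {
  emit({ user: this.user, friend: this.friend }, { friendList: this.friendList });
}

// Reduce sees every value emitted for one key and returns the same shape.
function reduceFriends(key, values) {
  return { friendList: values.flatMap((v) => v.friendList) };
}

const results = runMapReduce(sampleDocs, mapFriends, reduceFriends);
console.log(JSON.stringify(results));
```

The map function never looks at more than one record, yet the reducer still receives both lists for the (Hari, Shiva) key, because the grouping happens between the two phases.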

You can then take the reducer's output for further processing.

Input:

{"_id" : {"user" : " Hari","friend" : "Shiva"},
 "value" : {"friendList": ["Hanks"," Tom"," Karma"," Hari"," Dinesh"]}}

 {"_id" : {"user" : "Hari","friend" : " Shiva"},
  "value" : {"friendList" : ["Karma"," Tom"," Ram"," Bindu"," Shiva",
                             " Kishna"," Bikash"," Bakshi"," Dinesh"]}}

MapReduce code:

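var mapper = function () {
    // Reads top-level fields as in the question; for documents shaped like the
    // input above, use this._id.user, this._id.friend and this.value.friendList.
    var key = {"user" : this.user, "friend" : this.friend};
    // Emit the same shape the reducer returns, since reduce output may be reduced again.
    emit(key, {"friendList" : this.friendList});
};
var reducer = function (key, values) {
    var combinedfriendList = {"friendList" : []};
    for (var i = 0; i < values.length; i++) {
        var inter = values[i];
        for (var j = 0; j < inter.friendList.length; j++) {
            combinedfriendList.friendList.push(inter.friendList[j]);
        }
    }
    // mapReduce itself wraps the result as {"_id" : key, "value" : ...}.
    return combinedfriendList;
};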

Expected output:

{"_id" : {"user" : " Hari","friend" : "Shiva"},
 "value" : {"friendList": ["Hanks"," Tom"," Karma"," Hari"," Dinesh","Karma"," Tom"," Ram"," Bindu"," Shiva"," Kishna"," Bikash"," Bakshi"," Dinesh"]}}

Hope this helps. You can test it in your environment (change it if needed) and share your feedback.
