How to group documents by matching array elements with MapReduce in MongoDB

I have a database with a column that contains an array of strings. Example table:

name | words                          | ...
Ash  | ["Apple", "Pear", "Plum"]      | ...
Joe  | ["Walnut", "Peanut"]           | ...
Max  | ["Pineapple", "Apple", "Plum"] | ...

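Now I want to match this table against a given array of words and group the documents by their match rate.

Example input with the expected result: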

// matched for input = ["Walnut", "Peanut", "Apple"]
{
  "1.00": [{name:"Joe", match:"1.00"}],
  "0.33": [{name:"Ash", match:"0.33"}, {name:"Max", match:"0.33"}]
}

I use the following map function to emit each document with its match rate as the key:

function map() {
    var matches = 0.0;
    for(var i in input) 
      if(this.words.indexOf(input[i]) !== -1) matches+=1;
    matches /= input.length;
    var key = ""+matches.toFixed(2);
    emit(key, {name: this.name, match: key});
}

What is missing now is a suitable reduce function that combines the emitted key-value pairs into the result object.

I tried:

function reduce(key, values) {
    var res = {};
    res[key] = values;
    return res;
}

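But I have a problem with this part of the specification:

"MongoDB can invoke the reduce function more than once for the same key. In this case, the previous output from the reduce function for that key will become one of the input values to the next reduce function invocation for that key."

…which results in nested result objects. What is the correct way to group documents by their match rate?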

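"invoke the reduce function more than once for the same key"

That is idempotence, and the reduce function must respect it: running reduce again on an earlier reduce result has to produce the same result, that is, reduce(key, [reduce(key, values)]) must equal reduce(key, values).

For simplicity, though, you only need to make sure that the values map emits have the same format as the value reduce returns.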
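To illustrate why the shapes have to match, here is a minimal sketch in plain JavaScript (no MongoDB involved) that feeds a reduce result back into reduce by hand, roughly the way MongoDB may do for a key with many values. The names naiveReduce and groupReduce are made up for this illustration, and the two-step call sequence is only an assumption about how a re-reduce could be split up.

```javascript
// Reduce from the question: wraps the values in a new object,
// so its output has a different shape than the values map emits.
function naiveReduce(key, values) {
    var res = {};
    res[key] = values;
    return res;
}

// Shape-preserving reduce: every input value and the output are {users: [...]}.
function groupReduce(key, values) {
    var users = [];
    for (var i = 0; i < values.length; i++) {
        users = users.concat(values[i].users);
    }
    return {users: users};
}

var ash = {name: "Ash", match: "0.33"};
var max = {name: "Max", match: "0.33"};

// Simulated re-reduce: the first call sees only Ash, then its output
// is passed back in together with the value emitted for Max.
var naiveFirst = naiveReduce("0.33", [ash]);
var naiveSecond = naiveReduce("0.33", [naiveFirst, max]);
console.log(JSON.stringify(naiveSecond));
// {"0.33":[{"0.33":[{"name":"Ash","match":"0.33"}]},{"name":"Max","match":"0.33"}]}

var first = groupReduce("0.33", [{users: [ash]}]);
var second = groupReduce("0.33", [first, {users: [max]}]);
console.log(JSON.stringify(second));
// {"users":[{"name":"Ash","match":"0.33"},{"name":"Max","match":"0.33"}]}
```

The naive version nests one level deeper on every extra call, while the shape-preserving version stays flat no matter how the values are split across calls.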

In your case, something like this will work:

db.col.insert({"name": "Ash", "words": ["Apple", "Pear", "Plum"]})
db.col.insert({"name": "Joe", "words": ["Walnut", "Peanut"]})
db.col.insert({"name": "Max", "words": ["Pineapple", "Apple", "Plum"]})
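function map() {
    // Words to match against; hard-coded here so the example is self-contained.
    var input = ["Walnut", "Peanut", "Apple"];
    var matches = 0.0;
    for (var i = 0; i < input.length; i++)
      if (this.words.indexOf(input[i]) !== -1) matches += 1;
    matches /= input.length;
    var key = "" + matches.toFixed(2);
    // Emit the same shape that reduce returns: {users: [...]}.
    emit(key, {users: [{name: this.name, match: key}]});
}
function reduce(key, values) {
    // Each value is either raw map output or an earlier reduce result;
    // both have the shape {users: [...]}, so they can simply be concatenated.
    var ret = {users: []};
    for (var i = 0; i < values.length; i++) {
        ret.users = ret.users.concat(values[i].users);
    }
    return ret;
}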
db.col.mapReduce(map, reduce, {"out": {inline:1}})

Output:

{
    "results" : [
        {
            "_id" : "0.33",
            "value" : {
                "users" : [
                    {
                        "name" : "Ash",
                        "match" : "0.33"
                    },
                    {
                        "name" : "Max",
                        "match" : "0.33"
                    }
                ]
            }
        },
        {
            "_id" : "0.67",
            "value" : {
                "users" : [
                    {
                        "name" : "Joe",
                        "match" : "0.67"
                    }
                ]
            }
        }
    ],
    "timeMillis" : 22,
    "counts" : {
        "input" : 3,
        "emit" : 3,
        "reduce" : 1,
        "output" : 2
    },
    "ok" : 1
}
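If you want the single object keyed by match rate from the question, the results array can be reshaped on the client. The following is a minimal sketch in plain JavaScript; groupByMatch is a hypothetical helper name, and the sample array is copied from the output above rather than read from a live database.

```javascript
// Collapse the "results" array of an inline mapReduce into
// an object of the form {matchRate: [users]}.
function groupByMatch(results) {
    var grouped = {};
    results.forEach(function (r) {
        grouped[r._id] = r.value.users;
    });
    return grouped;
}

// Sample data copied from the output above.
var results = [
    {_id: "0.33", value: {users: [{name: "Ash", match: "0.33"}, {name: "Max", match: "0.33"}]}},
    {_id: "0.67", value: {users: [{name: "Joe", match: "0.67"}]}}
];

var grouped = groupByMatch(results);
console.log(JSON.stringify(grouped));
// {"0.33":[{"name":"Ash","match":"0.33"},{"name":"Max","match":"0.33"}],"0.67":[{"name":"Joe","match":"0.67"}]}
```

This also works for keys that were emitted only once (such as "0.67" here, where reduce is never called), because the emitted value already has the {users: [...]} shape.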
