I have a database with a column that holds arrays of strings. Example table:
name | words | ...
Ash | ["Apple", "Pear", "Plum"] | ...
Joe | ["Walnut", "Peanut"] | ...
Max | ["Pineapple", "Apple", "Plum"] | ...
Now I want to match this table against a given array of words and group the documents by match rate.
Example input with the expected result:
// matched for input = ["Walnut", "Peanut", "Apple"]
{
"1.00": [{name:"Joe", match:"1.00"}],
"0.33": [{name:"Ash", match:"0.33"}, {name:"Max", match:"0.33"}]
}
I use the following map function to emit each document with its match rate as the key:
function map() {
var matches = 0.0;
for(var i in input)
if(this.words.indexOf(input[i]) !== -1) matches+=1;
matches /= input.length;
var key = ""+matches.toFixed(2);
emit(key, {name: this.name, match: key});
}
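To see what this map function produces for the example rows, here is a minimal sketch that runs the same scoring in plain JavaScript outside MongoDB. The helper name matchRate is hypothetical; like the map function, it divides the number of matches by input.length.

```javascript
// Hypothetical helper: same scoring as the map function above,
// i.e. (number of input words found in `words`) / input.length, as a 2-decimal string.
function matchRate(words, input) {
  let matches = 0;
  for (const w of input) {
    if (words.indexOf(w) !== -1) matches += 1;
  }
  return (matches / input.length).toFixed(2);
}

const input = ["Walnut", "Peanut", "Apple"];
const rows = [
  { name: "Ash", words: ["Apple", "Pear", "Plum"] },
  { name: "Joe", words: ["Walnut", "Peanut"] },
  { name: "Max", words: ["Pineapple", "Apple", "Plum"] },
];

for (const row of rows) {
  console.log(row.name, matchRate(row.words, input));
}
// Ash 0.33
// Joe 0.67
// Max 0.33
```

With this denominator Joe scores 0.67, not the 1.00 shown in the expected result above; 1.00 would only come out if the rate were divided by the document's own word count (this.words.length) instead.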
What's missing is a matching reduce function that combines the emitted key-value pairs into the result object.
I tried:
function reduce(key, values) {
  var res = {};
  res[key] = values;
  return res;
}
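But the specification causes a problem: MongoDB may call the reduce function more than once for the same key. In that case, the output of the previous reduce call for that key becomes one of the input values of the next reduce call for that key, which results in nested result objects. What is the correct way to group the documents by match rate?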
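The nesting can be reproduced without a database. The sketch below calls the reduce function from the question (with its parameter named values) the way the question describes MongoDB doing it: first on a partial batch, then again with that earlier result passed in as one of the values. The variable names are illustrative only.

```javascript
// The reduce function from the question.
function reduce(key, values) {
  var res = {};
  res[key] = values;
  return res;
}

const key = "0.33";
const ash = { name: "Ash", match: "0.33" }; // shape emitted by the question's map
const max = { name: "Max", match: "0.33" };

// First pass: a partial batch of emitted values is reduced.
const partial = reduce(key, [ash]);
// Later pass: the partial result is fed back in alongside a new value.
const result = reduce(key, [partial, max]);

console.log(JSON.stringify(result));
// {"0.33":[{"0.33":[{"name":"Ash","match":"0.33"}]},{"name":"Max","match":"0.33"}]}
```

The root cause is that map emits {name, match} while reduce returns {key: [...]}, so a re-reduced value has a different shape from a freshly emitted one.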
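The reduce function can be called multiple times for the same key. This is by design, and the reduce function has to handle it: it must be idempotent (as well as associative and commutative), because values it returned earlier can be passed back into it.
The simplest way to satisfy this is to make sure that map emits values in exactly the same format that reduce returns.
For your case, something like this works: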
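A quick way to check that a reduce function tolerates repeated calls is to compare reducing everything at once with reducing in stages and with re-reducing a result. The sketch below does that for a reduce in the shape-preserving style, where both map and reduce produce {users: [...]}. Unlike a version that modifies values[0] in place, it builds a fresh object; the sortedNames helper is hypothetical and only compares results independent of array order.

```javascript
// Shape-preserving reduce: every value, whether emitted by map or returned
// by an earlier reduce call, looks like {users: [...]}.
function reduce(key, values) {
  var users = [];
  for (var i = 0; i < values.length; i++) {
    users = users.concat(values[i].users);
  }
  return { users: users };
}

// Hypothetical helper: sorted user names, so order does not matter.
function sortedNames(value) {
  return value.users.map(function (u) { return u.name; }).sort();
}

var key = "0.33";
var a = { users: [{ name: "Ash", match: "0.33" }] };
var b = { users: [{ name: "Max", match: "0.33" }] };

var allAtOnce = reduce(key, [a, b]);
var inStages = reduce(key, [reduce(key, [a]), b]); // earlier result fed back in
var reversed = reduce(key, [b, a]);                // different value order
var reReduced = reduce(key, [allAtOnce]);          // idempotence

console.log(JSON.stringify(sortedNames(allAtOnce))); // ["Ash","Max"]
console.log(JSON.stringify(sortedNames(inStages)));  // ["Ash","Max"]
console.log(JSON.stringify(sortedNames(reversed)));  // ["Ash","Max"]
console.log(JSON.stringify(sortedNames(reReduced))); // ["Ash","Max"]
```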
db.col.insert({"name": "Ash", "words": ["Apple", "Pear", "Plum"]})
db.col.insert({"name": "Joe", "words": ["Walnut", "Peanut"]})
db.col.insert({"name": "Max", "words": ["Pineapple", "Apple", "Plum"]})
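function map() {
  // Words to match against (hard-coded for this example)
  var input = ["Walnut", "Peanut", "Apple"];
  var matches = 0.0;
  for (var i = 0; i < input.length; i++)
    if (this.words.indexOf(input[i]) !== -1) matches += 1;
  matches /= input.length;
  var key = "" + matches.toFixed(2);
  // Emit the same shape that reduce returns: {users: [...]}
  emit(key, {users: [{name: this.name, match: key}]});
}
function reduce(key, values) {
  // Concatenate the users arrays of all values, whether they came
  // from map or from an earlier reduce call for the same key
  var ret = {users: []};
  for (var i = 0; i < values.length; i++) {
    ret.users = ret.users.concat(values[i].users);
  }
  return ret;
}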
db.col.mapReduce(map, reduce, {"out": {inline:1}})
Output:
{
"results" : [
{
"_id" : "0.33",
"value" : {
"users" : [
{
"name" : "Ash",
"match" : "0.33"
},
{
"name" : "Max",
"match" : "0.33"
}
]
}
},
{
"_id" : "0.67",
"value" : {
"users" : [
{
"name" : "Joe",
"match" : "0.67"
}
]
}
}
],
"timeMillis" : 22,
"counts" : {
"input" : 3,
"emit" : 3,
"reduce" : 1,
"output" : 2
},
"ok" : 1
}
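The inline output is an array of {_id, value} documents rather than the object keyed by match rate that the question asks for. A small post-processing step on the client can reshape it. The sketch below assumes the results array has the structure shown above (copied here so the snippet runs on its own); groupByRate is a hypothetical helper name.

```javascript
// `results` as shown in the inline mapReduce output above.
const results = [
  { _id: "0.33", value: { users: [{ name: "Ash", match: "0.33" }, { name: "Max", match: "0.33" }] } },
  { _id: "0.67", value: { users: [{ name: "Joe", match: "0.67" }] } },
];

// Hypothetical helper: turn [{_id, value: {users}}] into {rate: users}.
function groupByRate(resultDocs) {
  const grouped = {};
  for (const doc of resultDocs) {
    grouped[doc._id] = doc.value.users;
  }
  return grouped;
}

console.log(JSON.stringify(groupByRate(results)));
// {"0.33":[{"name":"Ash","match":"0.33"},{"name":"Max","match":"0.33"}],"0.67":[{"name":"Joe","match":"0.67"}]}
```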