MongoDB复杂子文档查询



我有一个包含>10000个文档的集合,这些文档包含多个嵌套数组。我需要基于位于最低级别的属性进行查询,并只返回数组底部的对象。

文件结构:

    {
    _id: 12345,
    type: "employee",
    people: [
        {
            name: "Rob",
            items: [
                {
                    itemName: "RobsItemOne",
                    value: "$10.00",
                    description: "some description about the item"
                },
                {
                    itemName: "RobsItemTwo",
                    value: "$15.00",
                    description: "some description about the item"
                }
            ]
        }
    ]
}

我一直在使用聚合管道来获得预期的结果,这确实有效,但性能非常糟糕。这是我的问题:

db.collection.aggregate([
            {
                $match: {
                    "type": "employee"
                }
            },
            {$unwind: "$people"},
            {$unwind: "$people.items"},
            {$match: {$or: [ //There could be dozens of items included in this $match
                             {"people.items.itemName": "RobsItemOne"},
                             {"people.items.itemName": "RobsItemTwo"}
                           ]
                     }
            },
            {
                $project: {
                    _id: 0,// This is because of the $out
                    systemID: "$_id",
                    type: "$type",
                    item: "$people.items.itemName",
                    value: "$people.items.value"
                }
            },
            {$out: tempCollection} //Would like to avoid this, but was exceeding max document size
        ])

结果是:

[ 
    {
        "type" : "employee",
        "systemID" : 12345,
        "item" : "RobsItemOne",
        "value" : "$10.00"
    }, 
    {
        "type" : "employee",
        "systemID" : 12345,
        "item" : "RobsItemTwo",
        "value" : "$10.00"
    }
]

我可以做些什么来加快查询速度?我尝试过使用索引,但根据Mongo文档,超过初始$match的索引将被忽略。

您还可以尝试在$unwind人员之后向查询添加$match运算符。

...{$unwind: "$people"},
{$match:{"people.items.itemName":{$in:["RobsItemOne","RobsItemTwo"]}}},
{$unwind: "$people.items"}, ....

这将减少以下$unwind$match运算符要查询的记录数。

由于你有大量的记录,你可以使用{allowDiskUse:true}选项

允许写入临时文件。设置为true时,聚合stage可以将数据写入dbPath中的_tmp子目录目录

因此,您的最终查询希望:

db.collection.aggregate([
        {
            $match: {
                "type": "employee"
            }
        },
        {$unwind: "$people"},
        {$match:{"people.items.itemName":{$in:["RobsItemOne","RobsItemTwo"]}}},
        {$unwind: "$people.items"},
        {$match: {$or: [ //There could be dozens of items included in this $match
                         {"people.items.itemName": "RobsItemOne"},
                         {"people.items.itemName": "RobsItemTwo"}
                       ]
                 }
        },
        {
            $project: {
                _id: 0,// This is because of the $out
                systemID: "$_id",
                type: "$type",
                item: "$people.items.itemName",
                value: "$people.items.value"
            }
        }
    ], {allowDiskUse:true})

在@BatScream的努力下,我发现还有一些地方需要改进。你可以试一试。

// if the final result set is relatively small, this index will be helpful.
db.collection.ensureIndex({type : 1, "people.items.itemName" : 1 });
var itemCriteria = {
    $in : [ "RobsItemOne", "RobsItemTwo" ]
};
db.collection.aggregate([ {
    $match : {
        "type" : "employee",
        "people.items.itemName" : itemCriteria      // add this criteria to narrow source range further
    }
}, {
    $unwind : "$people"
}, {
    $match : {
        "people.items.itemName" : itemCriteria      // narrow data range further
    }
}, {
    $unwind : "$people.items"
}, {
    $match : {
        "people.items.itemName" : itemCriteria      // final match, avoid to use $or operator
    }
}, {
    $project : {
        _id : 0,                                    // This is because of the $out
        systemID : "$_id",
        type : "$type",
        item : "$people.items.itemName",
        value : "$people.items.value"
    }
}, {
    $out: tempCollection                            // optional
} ], {
    allowDiskUse : true
});

最新更新