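MongoDB aggregation query vs. MySQL SELECT field1 FROM table

I'm new to MongoDB and want to compare the query performance of the NoSQL data model against its relational database counterpart. I wrote this in the MongoDB shell: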

// Make 10 businesses
// Each business has 10 locations
// Each location has 10 departments
// Each department has 10 teams
// Each team has 100 employees
(new Array(10)).fill(0).forEach(_ =>
  db.businesses.insert({
    "name": "Business Name",
    "locations": (new Array(10)).fill(0).map(_ => ({
      "name": "Office Location",
      "departments": (new Array(10)).fill(0).map(_ => ({
        "name": "Department",
        "teams": (new Array(10)).fill(0).map(_ => ({
          "name": "Team Name",
          "employees": (new Array(100)).fill(0).map(_ => ({
            "age": Math.floor(Math.random() * 100)
          }))
        }))
      }))
    }))
  })
);

Then I tried to write the equivalent of MySQL's EXPLAIN SELECT age,name,(and a few other fields) FROM employees WHERE age >= 50 ORDER BY age DESC with the following statement:

db.businesses.aggregate([
{ $unwind: "$locations" },
{ $unwind: "$locations.departments" },
{ $unwind: "$locations.departments.teams" },
{ $unwind: "$locations.departments.teams.employees" },
{ $project: { _id: 0, age: "$locations.departments.teams.employees.age" } },
{ $match: { "age": { $gte: 50 }} },
{ $sort: {"age" : -1}}
]).explain("executionStats")

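The result was:

"errmsg" : "Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in."

So I removed the sort clause and tried to get the explain output. But the result was:

TypeError: db.business.aggregate(...).explain is not a function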

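So my questions are:

First, I want to know how the performance of SELECT age FROM employees WHERE age >= 50 ORDER BY age DESC compares with its MongoDB aggregation counterpart. Is it more or less the same? Will one be substantially faster or more performant than the other?
Alternatively, how do I fix my MongoDB query so that I can get performance details to compare against the MySQL counterpart?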

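Employees are single entities; thus, you probably don't want to model the age of a team member so deeply inside the rich structure of departments, locations, and teams. It is perfectly fine to have a separate employees collection and simply do: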

db.employees.aggregate([
  {$match: {"age": {$gte: 50} }}
  ,{$sort: {"age": -1} }
]);

In your businesses collection, you can then have:

{ teams: [ {name: "T1", employees: [ "E1", "E34" ]} ] }
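
One way to get from the nested documents to such a collection is to flatten each business into one document per employee. The following is a minimal sketch in plain JavaScript; flattenEmployees and the output field names business, location, department, and team are assumptions made for illustration, not part of the original schema.

```javascript
// Hypothetical helper: turn one nested business document into a flat
// array of employee documents (one per employee).
function flattenEmployees(business) {
  const out = [];
  for (const loc of business.locations) {
    for (const dept of loc.departments) {
      for (const team of dept.teams) {
        for (const emp of team.employees) {
          out.push({
            business: business.name,   // assumed field names on the
            location: loc.name,        // flattened document
            department: dept.name,
            team: team.name,
            age: emp.age
          });
        }
      }
    }
  }
  return out;
}

// Tiny sample in the same shape as the documents in the question.
const sample = {
  name: "Business Name",
  locations: [{
    name: "Office Location",
    departments: [{
      name: "Department",
      teams: [{ name: "Team Name", employees: [{ age: 61 }, { age: 34 }] }]
    }]
  }]
};

const employeeDocs = flattenEmployees(sample);
```

In the shell, the resulting array could then be loaded with something like db.employees.insertMany(employeeDocs), after which the simple $match and $sort above apply directly.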

Alternatively, try the following:

db.businesses.aggregate([ your pipeline] ,{allowDiskUse:true});
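
The same options document is also one way around the explain error from the question. A sketch, assuming a shell in which the cursor returned by aggregate() has no explain() method: call explain() on the collection first and then aggregate(). How much execution detail comes back depends on the server version. The typeof db guard is an addition for illustration only, so that the snippet also runs outside the shell.

```javascript
// The pipeline from the question, unchanged.
var pipeline = [
  { $unwind: "$locations" },
  { $unwind: "$locations.departments" },
  { $unwind: "$locations.departments.teams" },
  { $unwind: "$locations.departments.teams.employees" },
  { $project: { _id: 0, age: "$locations.departments.teams.employees.age" } },
  { $match: { age: { $gte: 50 } } },
  { $sort: { age: -1 } }
];

// Let the sort spill to disk instead of aborting at the memory limit.
var options = { allowDiskUse: true };

// Only run against the server when a shell-provided `db` object exists.
if (typeof db !== "undefined") {
  // Explain through the collection helper rather than the cursor.
  printjson(db.businesses.explain("executionStats").aggregate(pipeline, options));
  // Alternative: request the plan through the explain option instead.
  // printjson(db.businesses.aggregate(pipeline, { explain: true, allowDiskUse: true }));
}
```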

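The OP's setup is 10 businesses -> 10 locations -> 10 departments -> 10 teams -> 100 emps. The first 3 unwinds blow the 10 business documents up into 10,000 documents, and the last one multiplies that by another 100, to 1,000,000. We can shrink the hit by using $filter, so that only matching employees are unwound: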

db.businesses.aggregate([
  { $unwind: "$locations" },
  { $unwind: "$locations.departments" },
  { $unwind: "$locations.departments.teams" },
  {$project: {
    XX: {$filter: {
      input: "$locations.departments.teams.employees",
      as: "z",
      cond: {$gte: [ "$$z.age", 50] }
    }}
  }}
  ,{$unwind: "$XX"}
  ,{$sort: {"XX.age":-1}}])
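
To make the sizes concrete, the document counts after each $unwind can be worked out with a few lines of plain JavaScript. This is only arithmetic on the fan-out from the OP's setup; the estimate for the filtered case assumes ages are uniform over 0 to 99, as produced by Math.floor(Math.random()*100).

```javascript
// Fan-out per level in the OP's data: locations, departments, teams, employees.
const fanOut = [10, 10, 10, 100];

// Running document count after each successive $unwind, starting from 10 businesses.
let docs = 10;
const afterEachUnwind = fanOut.map(n => (docs *= n));

// With $filter applied before the last $unwind, only employees aged >= 50
// survive; for uniform ages 0..99 that is 50 of 100 values, so about half.
const expectedAfterFilter = afterEachUnwind[2] * 100 * (50 / 100);

console.log(afterEachUnwind);      // [ 100, 1000, 10000, 1000000 ]
console.log(expectedAfterFilter);  // 500000
```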

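You had better move $match to the first stage of the pipeline, because the aggregation framework loses the index after the first pipeline stage. I would also guess that you don't need to unwind those arrays.

By modifying the query as shown below, I was able to get the result in 1.5 seconds without any index: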

db.businesses.aggregate([
  {
    $unwind: "$locations"
  },
  {
    $unwind: "$locations.departments"
  },
  {
    $unwind: "$locations.departments.teams"
  },
  {
    $unwind: "$locations.departments.teams.employees"
  },
  {
    $match: {
      "locations.departments.teams.employees.age": {
        $gte: 50
      }
    }
  },
  {
    $project: {
      _id: 0,
      age: "$locations.departments.teams.employees.age"
    }
  },
  {
    $group: {
      _id: "$age"
    }
  },
  {
    $project: {
      _id: 0,
      age: "$_id"
    }
  },
  {
    $sort: {
      "age": -1
    }
  }
], {
  explain: false
})
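
One thing to keep in mind when comparing this with the SQL query: the $group stage on age collapses the result to one document per distinct age, so the pipeline behaves more like SELECT DISTINCT age than SELECT age. A small plain-JavaScript sketch of the difference, using made-up sample ages:

```javascript
// Made-up sample data standing in for the unwound employee ages.
const sampleAges = [72, 50, 13, 72, 99, 50, 48];

// SELECT age ... WHERE age >= 50 ORDER BY age DESC: one row per employee.
const perEmployee = sampleAges.filter(a => a >= 50).sort((a, b) => b - a);

// $match + $group {_id: "$age"} + $sort: one row per distinct age.
// A Set keeps insertion order, so the descending order is preserved.
const distinctAges = [...new Set(perEmployee)];

console.log(perEmployee);   // [ 99, 72, 72, 50, 50 ]
console.log(distinctAges);  // [ 99, 72, 50 ]
```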

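There is another way to approach the overall problem, although it is not an apples-to-apples comparison with the OP's question. The goal is to find all ages >= 50 and sort them. Below is an example that "almost" does that and also throws in loc, dept, and team in case you were wondering how to get those too; you can take out those lines to get just the emps. Now, this is unsorted, but an argument can be made that the DB engine is not going to sort this any better than the client can, and all the data has to come across the wire anyway. The client can use more sophisticated coding tricks to dig down to the age field and sort it.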

c = db.foo.aggregate([
  {$project: {XX:
    {$map: {input: "$locations", as:"z", in:
      {$map: {input: "$$z.departments", as:"z2", in:
        {$map: {input: "$$z2.teams", as:"z3", in:
          {loc: "$$z.name",   // remove if you want
           dept: "$$z2.name", // remove if you want
           team: "$$z3.name", // remove if you want
           emps: {$filter: {input: "$$z3.employees",
                            as: "z4",
                            cond: {$gt: [ "$$z4.age", 50] }
                 }}
          }
        }}
      }}
    }}
  }}
]);
ages = [];
c.forEach(function(biz) {
  biz['XX'].forEach(function(locs) {
    locs.forEach(function(depts) {
      depts.forEach(function(teams) {
        teams['emps'].forEach(function(emp) {
          ages.push(emp['age']);
        });
      });
    });
  });
});
print( ages.sort(function(a, b){return b-a}) );
99,98,97,96,95,94,92,92,84,81,78,77,76,72,71,67,66,65,65,64,63,62,62,61,59,59,57,57,57,56,55,54,52,51
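
The nested forEach loops can also be written more compactly. A sketch in plain JavaScript, assuming a runtime with Array.prototype.flat and flatMap (for example a recent Node.js or mongosh; the shell that ships with MongoDB 4.0 may not have them). The results array is a hand-made stand-in for the cursor contents, not real output.

```javascript
// Stand-in for the aggregation output: XX is locations -> departments -> teams.
const results = [{
  XX: [
    [ [ { loc: "L1", dept: "D1", team: "T1", emps: [{ age: 51 }, { age: 99 }] } ],
      [ { loc: "L1", dept: "D2", team: "T1", emps: [] } ] ],
    [ [ { loc: "L2", dept: "D1", team: "T1", emps: [{ age: 77 }] } ] ]
  ]
}];

const sortedAges = results
  .flatMap(biz => biz.XX.flat(2))               // flatten down to team objects
  .flatMap(team => team.emps.map(e => e.age))   // pull out the ages
  .sort((a, b) => b - a);                       // descending, as in ORDER BY age DESC

console.log(sortedAges);  // [ 99, 77, 51 ]
```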

On a MacBook Pro running MongoDB 4.0, we see the collection as follows:

Collection            Count   AvgSize          Unz  Xz  +Idx     TotIdx  Idx/doc
--------------------  ------- -------- -G--M------  --- ---- ---M------  -------
foo                        10  2238682    22386820  4.0    0      16384        0

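Given random ages between 0 and 100, it is no surprise that every loc/dept/team has ages >= 50 and that the total number of bytes returned is about half of the collection size. Note, however, that the total time to set up the agg (not to return all the bytes) is ~700 millis.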

697 millis to agg; 0.697
found 10
tot bytes 11536558
