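I am running a $geoNear query on my sharded cluster (6 nodes in 3 replica sets, each with 2 shardsvr members and 1 arbiter). I expect the query to return 1.1 million documents, but I only receive ~130.xxx documents. I issue the query and process the data with the Java driver (for now, I just count the returned documents). I use MongoDB 3.2.9 and the latest Java driver.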
The mongod log shows the following warning, which is caused by the output document being larger than 16MB:
2016-10-10T12:00:22.933+0200 W COMMAND [conn22] Too many geoNear results for query { location: { $nearSphere: { type: "Point", coordinates: [ 10.xxxx, 52.xxxxx] }, $maxDistance: 3900.0 } }, truncating output.
2016-10-10T12:00:22.951+0200 I COMMAND [conn22] command mydb.data command: geoNear { geoNear: "data", near: { type: "Point", coordinates: [ 10.xxxx, 52.xxxxx ] },
num: 50000000, maxDistance: 3900.0, query: {}, spherical: true, distanceMultiplier: 1.0, includeLocs: true } keyUpdates:0 writeConflicts:0 numYields:890 reslen:16777310
locks:{ Global: { acquireCount: { r: 1784 } }, Database: { acquireCount: { r: 892 } }, Collection: { acquireCount: { r: 892 } } } protocol:op_query 589ms
2016-10-10T12:00:23.183+0200 I COMMAND [conn22] getmore mydb.data query: { aggregate: "data", pipeline: [ { $geoNear: { near: { type: "Point", coordinates: [ 10.xxxx, 52.xxxxx ] },
distanceField: "dist.calculated", limit: 50000000, maxDistance: 3900.0, query: {}, spherical: true, distanceMultiplier: 1.0, includeLocs: "dist.location" } }, { $project: { _id: false,
dist: { calculated: true } } } ], fromRouter: true, cursor: { batchSize: 0 } } cursorid:170255616227 ntoreturn:0 cursorExhausted:1 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:43558
reslen:1568108 locks:{ Global: { acquireCount: { r: 1786 } }, Database: { acquireCount: { r: 893 } }, Collection: { acquireCount: { r: 893 } } } 820ms
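The numbers in the log above are consistent with each shard truncating its output. The following back-of-envelope check uses JavaScript, the language of the shell snippets in this post. It copies `reslen` and `nreturned` from the log. It also assumes two things the log does not confirm: the cluster has 3 shards, and each shard truncates at roughly the same number of results.

```javascript
// Back-of-envelope check using values copied from the mongod log above.
// Assumptions (not confirmed by the log): 3 shards, and every shard truncates
// its geoNear output at roughly the same number of results.
const BSON_MAX_BYTES = 16 * 1024 * 1024; // 16 MiB = 16777216 bytes

const geoNearReslen = 16777310;  // reslen of the truncated geoNear command
const perShardReturned = 43558;  // nreturned of the aggregate getmore on that shard
const shardCount = 3;            // assumption: 3 replica sets, one shard each

// A reply this close to 16 MiB points to the size cap,
// not to the natural end of the result set.
const bytesOverLimit = geoNearReslen - BSON_MAX_BYTES;

// Approximate size of one geoNear result entry (reslen includes some reply overhead).
const avgResultBytes = geoNearReslen / perShardReturned;

// Documents the client would see if every shard stopped at the same count.
const estimatedTotal = perShardReturned * shardCount;

console.log(bytesOverLimit, avgResultBytes.toFixed(1), estimatedTotal);
```

Under these assumptions the estimate is 130,674 documents. That matches the ~130.xxx counted by the Java client and is far below the expected 1.1 million.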
The query:
db.data.aggregate([
  {
    $geoNear: {
      near: {
        type: "Point",
        coordinates: [10.xxxx, 52.xxxxx]
      },
      distanceField: "dist.calculated",
      maxDistance: 3900,
      num: 50000000,
      includeLocs: "dist.location",
      spherical: true
    }
  }
])
Note that I issued the query both with and without the num parameter, and both failed with the error shown above.
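I expected the query to return the results in chunks once they exceed the document size limit (16 MB). What am I missing? How can I retrieve all the data?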
Edit: When I add a $group stage, the query also fails, with the same error in the mongod log:
db.data.aggregate([
  {
    $geoNear: {
      near: {
        type: "Point",
        coordinates: [10.xxxx, 52.xxxxxx]
      },
      distanceField: "dist.calculated",
      maxDistance: 3900,
      includeLocs: "dist.location",
      num: 2000000,
      spherical: true
    }
  },
  {
    $group: {
      _id: "$root_document"
    }
  }
])
MongoDB staff member Lungang Fang has since answered my question on the MongoDB user group. Here is his answer:
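Currently, the $geoNear aggregation stage can only return results that fit within the 16MB BSON size limit. This is related to an issue in earlier versions of MongoDB (see https://jira.mongodb.org/browse/SERVER-13486). Your query hits this limit because $geoNear returns a single result document that contains an array. Unfortunately, the allowDiskUse aggregation pipeline option does not help in this case.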
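The following toy model in plain JavaScript illustrates the difference between a single result document and a cursor. It is not MongoDB's implementation. One function packs every result into an array inside one reply and stops once the reply would exceed a size cap. The other streams results in batches, the way a cursor does. JSON string length stands in for BSON size, and the cap of 1000 is an arbitrary, scaled-down stand-in for 16 MB.

```javascript
// Toy model (not MongoDB code): one reply document vs. a batched cursor.
// JSON length stands in for BSON size; LIMIT is a hypothetical scaled-down cap.
const LIMIT = 1000;

// Hypothetical results: 100 small documents already sorted by distance.
const results = Array.from({ length: 100 }, (_, i) => ({ _id: i, dis: i * 10 }));

const sizeOf = (obj) => JSON.stringify(obj).length;

// geoNear-style: all results go into one array inside one reply document,
// so once the reply would exceed LIMIT the remaining results are dropped.
function singleDocumentReply(docs) {
  const reply = { results: [] };
  for (const doc of docs) {
    reply.results.push(doc);
    if (sizeOf(reply) > LIMIT) {
      reply.results.pop(); // the "truncating output" case
      break;
    }
  }
  return reply.results;
}

// find()-style: each batch is a separate reply, so only one batch at a time
// has to fit under LIMIT; the client keeps requesting the next batch.
function* cursor(docs) {
  let batch = [];
  for (const doc of docs) {
    if (batch.length > 0 && sizeOf({ batch: [...batch, doc] }) > LIMIT) {
      yield batch;
      batch = [];
    }
    batch.push(doc);
  }
  if (batch.length > 0) yield batch;
}

const truncated = singleDocumentReply(results);
const batches = [...cursor(results)];
const streamed = batches.flat();
console.log(truncated.length, streamed.length, batches.length);
```

Neither path ever builds a reply larger than the cap. Only the cursor path delivers every result, because the cap applies to each batch rather than to the whole result set.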
There are two options to consider:
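1. If you don't need all the results, you can limit the size of the $geoNear aggregation result with the num, limit, or maxDistance options.
2. If you need all the results, you can use the find() operator instead. It is not limited by the maximum BSON size because it returns a cursor. Below is a test I ran on MongoDB 3.2.10 for your reference.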
Create a "2dsphere" index on the collection:
db.coll.createIndex({location: '2dsphere'})
Create and insert several large documents:
var padding = '';
for (var j = 0; j < 15; j++) {
  for (var i = 1024 * 128; i > 0; --i) {
    padding += '12345678';
  }
}
db.coll.insert({location: {type: "Point", coordinates: [-73.861, 40.73]}, padding: padding})
db.coll.insert({location: {type: "Point", coordinates: [-73.862, 40.73]}, padding: padding})
db.coll.insert({location: {type: "Point", coordinates: [-73.863, 40.73]}, padding: padding})
db.coll.insert({location: {type: "Point", coordinates: [-73.864, 40.73]}, padding: padding})
db.coll.insert({location: {type: "Point", coordinates: [-73.865, 40.73]}, padding: padding})
db.coll.insert({location: {type: "Point", coordinates: [-73.866, 40.73]}, padding: padding})
Query using $geoNear; the server log shows "Too many geoNear results …, truncating output":
db.coll.aggregate([
  {
    $geoNear: {
      near: {type: "Point", coordinates: [-73.86, 40.73]},
      distanceField: "dist.calculated",
      maxDistance: 150000000,
      spherical: true
    }
  },
  {$project: {location: 1}}
])
Query using find(); all expected documents are returned:
// This and the following "var" keep the shell from flooding the screen with the padding string.
var cursor = db.coll.find({
  location: {
    $near: {
      $geometry: {type: "Point", coordinates: [-73.86, 40.73]},
      $maxDistance: 150000
    }
  }
})
// You must iterate through the cursor; otherwise the query is not actually executed.
var x = cursor.next()
x._id
var x = cursor.next()
x._id
...
Regards, Lungang
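The arithmetic below shows why Lungang's test reproduces the truncation. It computes the size of the padding string built by the nested loops in his test. This is a sketch that ignores per-document BSON overhead and counts one byte per ASCII character.

```javascript
// Size of the padding string built by the nested loops in the test above:
// 15 outer passes x 131072 (1024 * 128) inner passes x 8 characters per append.
const BSON_MAX_BYTES = 16 * 1024 * 1024; // 16 MiB
const paddingChars = 15 * (1024 * 128) * "12345678".length;

// ASCII characters take one byte each in BSON, so this is also the padding size in bytes.
// How many such documents fit into one 16 MiB reply (ignoring all other overhead):
const docsPerReply = Math.floor(BSON_MAX_BYTES / paddingChars);

console.log(paddingChars, docsPerReply);
```

Each inserted document is therefore about 15 MiB. A single 16 MiB $geoNear reply has room for at most one of the six documents, whereas a find() cursor can return each document in its own batch.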