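I have a large number of timestamped documents in a MongoDB database. Each document has a unique identifier.
Using the sample documents below, I want to first sort the collection by "updateDate" and then retrieve the list of "uniqueIdentifier" values, keeping only the most recently updated document for each distinct "domainName".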
{
"domainName": "www.example-domain-0.com",
"updateDate": {
"$date": "2013-09-10T19:20:56.652Z"
},
"uniqueIdentifier": "375d7219-828c-4f81-a1fc-3692aa68d110"
}
{
"domainName": "www.example-domain-1.com",
"updateDate": {
"$date": "2013-09-12T19:44:56.833Z"
},
"uniqueIdentifier": "f96bb647-5dcb-4cc1-8a66-105177a45474"
}
{
"domainName": "www.example-domain-0.com",
"updateDate": {
"$date": "2013-09-12T19:10:56.833Z"
},
"uniqueIdentifier": "14f6yu43-20eb-42c6-bb06-26b77c0bf0cb"
}
{
"domainName": "www.example-domain-2.com",
"updateDate": {
"$date": "2013-09-12T19:39:56.833Z"
},
"uniqueIdentifier": "b2a6ae10-20eb-42c6-bb06-26b77c0bf0cb"
}
For the collection above, I would like to get the following ordered result set:
"f96bb647-5dcb-4cc1-8a66-105177a45474",
"b2a6ae10-20eb-42c6-bb06-26b77c0bf0cb",
"14f6yu43-20eb-42c6-bb06-26b77c0bf0cb"
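Note that "375d7219-828c-4f81-a1fc-3692aa68d110" is not returned, because 2 documents contain
"domainName": "www.example-domain-0.com", and "14f6yu43-20eb-42c6-bb06-26b77c0bf0cb" is the more recently updated of the two.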
What is the fastest way to do this in Java? If it calls for a map-reduce function, can anyone help me understand how to write it in Java?
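To illustrate the shape a map-reduce job for this would take, here is a minimal in-memory Java sketch. It is not MongoDB's `mapReduce` command and makes no claim about speed. The "map" step keys each document by `domainName`, the "reduce" step keeps whichever of two documents has the later `updateDate`, and a final sort orders the survivors newest-first. The `MapReduceSketch` class, the `Doc` record, and hard-coding the question's four sample documents (with their `$date` values parsed as `java.time.Instant`) are assumptions made for this example.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;

class MapReduceSketch {
    // Hypothetical stand-in for one MongoDB document
    record Doc(String domainName, Instant updateDate, String uniqueIdentifier) {}

    // "reduce": of two documents for the same domain, keep the more recently updated one
    static final BinaryOperator<Doc> KEEP_LATEST =
            (a, b) -> a.updateDate().isAfter(b.updateDate()) ? a : b;

    static List<String> latestIdsNewestFirst(List<Doc> docs) {
        // "map": key every document by domainName, merging collisions with KEEP_LATEST
        Map<String, Doc> latestPerDomain = docs.stream()
                .collect(Collectors.toMap(Doc::domainName, d -> d, KEEP_LATEST));
        // Finalize: order the surviving documents by updateDate, newest first
        return latestPerDomain.values().stream()
                .sorted(Comparator.comparing(Doc::updateDate, Comparator.reverseOrder()))
                .map(Doc::uniqueIdentifier)
                .collect(Collectors.toList());
    }

    // The four sample documents from the question
    static List<Doc> sampleDocs() {
        return List.of(
                new Doc("www.example-domain-0.com", Instant.parse("2013-09-10T19:20:56.652Z"),
                        "375d7219-828c-4f81-a1fc-3692aa68d110"),
                new Doc("www.example-domain-1.com", Instant.parse("2013-09-12T19:44:56.833Z"),
                        "f96bb647-5dcb-4cc1-8a66-105177a45474"),
                new Doc("www.example-domain-0.com", Instant.parse("2013-09-12T19:10:56.833Z"),
                        "14f6yu43-20eb-42c6-bb06-26b77c0bf0cb"),
                new Doc("www.example-domain-2.com", Instant.parse("2013-09-12T19:39:56.833Z"),
                        "b2a6ae10-20eb-42c6-bb06-26b77c0bf0cb"));
    }

    public static void main(String[] args) {
        System.out.println(latestIdsNewestFirst(sampleDocs()));
    }
}
```

On the sample data this yields the three identifiers in the order the question asks for. Doing the reduction client-side still transfers every document, though; the point of a real map-reduce or aggregation is to run the same reduction on the server.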
Currently I use the following approach in Java, but it is very inefficient for large collections:
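import com.mongodb.BasicDBObject;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;
import java.util.LinkedHashMap;
import java.util.Map;
// LinkedHashMap keeps insertion order, so domains come out newest-first
Map<String, String> domainMap = new LinkedHashMap<String, String>();
BasicDBObject restrict = new BasicDBObject("uniqueIdentifier", 1)
        .append("domainName", 1);
DBCursor cur = domainCollection.find(new BasicDBObject(), restrict).sort(
        new BasicDBObject("updateDate", -1));
try {
    while (cur.hasNext()) {
        DBObject doc = cur.next(); // advance the cursor once per document
        String id = doc.get("uniqueIdentifier").toString();
        String domain = doc.get("domainName").toString();
        // Sorted newest-first, so the first id seen for a domain is its latest
        if (!domainMap.containsKey(domain)) {
            domainMap.put(domain, id);
        }
    }
} finally {
    cur.close();
}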
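This loop relies on two things: the cursor is sorted by `updateDate` descending, so the first document seen for a domain is its newest one, and the map must remember insertion order (a `LinkedHashMap` rather than a `HashMap`) for the final list to come out newest-first. A minimal in-memory sketch of that logic, with a plain `List` standing in for the `DBCursor` and the question's sample documents hard-coded; `CursorLoopSketch` and `Doc` are hypothetical names, not MongoDB driver types.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CursorLoopSketch {
    // Hypothetical stand-in for one MongoDB document (only the fields the loop reads)
    record Doc(String domainName, Instant updateDate, String uniqueIdentifier) {}

    static List<String> latestIdsNewestFirst(List<Doc> docs) {
        // Same order the cursor gets from .sort(new BasicDBObject("updateDate", -1))
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparing(Doc::updateDate, Comparator.reverseOrder()));

        // LinkedHashMap remembers insertion order, so domains stay newest-first
        Map<String, String> domainMap = new LinkedHashMap<>();
        for (Doc doc : sorted) {
            // The first id seen for a domain is its newest; older ones are ignored
            domainMap.putIfAbsent(doc.domainName(), doc.uniqueIdentifier());
        }
        return new ArrayList<>(domainMap.values());
    }

    // The four sample documents from the question
    static List<Doc> sampleDocs() {
        return List.of(
                new Doc("www.example-domain-0.com", Instant.parse("2013-09-10T19:20:56.652Z"),
                        "375d7219-828c-4f81-a1fc-3692aa68d110"),
                new Doc("www.example-domain-1.com", Instant.parse("2013-09-12T19:44:56.833Z"),
                        "f96bb647-5dcb-4cc1-8a66-105177a45474"),
                new Doc("www.example-domain-0.com", Instant.parse("2013-09-12T19:10:56.833Z"),
                        "14f6yu43-20eb-42c6-bb06-26b77c0bf0cb"),
                new Doc("www.example-domain-2.com", Instant.parse("2013-09-12T19:39:56.833Z"),
                        "b2a6ae10-20eb-42c6-bb06-26b77c0bf0cb"));
    }

    public static void main(String[] args) {
        System.out.println(latestIdsNewestFirst(sampleDocs()));
    }
}
```

`putIfAbsent` does the same job as the `containsKey`/`put` pair in the loop; printing the result gives the three identifiers in the order listed in the question.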
Try the aggregation framework:
> db.foodle.find()
{ "_id" : ObjectId("52323c61fd99d220e24eef53"), "domainName" : "www.example-domain-0.com", "updateDate" : ISODate("2013-09-12T22:12:49.933Z"), "uniqueIdentifier" : "375d7219-828c-4f81-a1fc-3692aa68d110" }
{ "_id" : ObjectId("52323c64fd99d220e24eef54"), "domainName" : "www.example-domain-1.com", "updateDate" : ISODate("2013-09-12T22:12:52.877Z"), "uniqueIdentifier" : "f96bb647-5dcb-4cc1-8a66-105177a45474" }
{ "_id" : ObjectId("52323c67fd99d220e24eef55"), "domainName" : "www.example-domain-0.com", "updateDate" : ISODate("2013-09-12T22:12:55.550Z"), "uniqueIdentifier" : "14f6yu43-20eb-42c6-bb06-26b77c0bf0cb" }
{ "_id" : ObjectId("52323c6afd99d220e24eef56"), "domainName" : "www.example-domain-2.com", "updateDate" : ISODate("2013-09-12T22:12:58.390Z"), "uniqueIdentifier" : "b2a6ae10-20eb-42c6-bb06-26b77c0bf0cb" }
> db.foodle.aggregate(
... { $sort: { domainName:1, uniqueIdentifier:1 }},
... { $group:{ _id:'$domainName', uniqueIdentifier:{$first:'$uniqueIdentifier'}, thecount:{$sum:1}}},
... { $project:{ _id:0, uniqueIdentifier:1}},
... { $sort: { uniqueIdentifier:1 }}
... )
{
"result" : [
{
"uniqueIdentifier" : "14f6yu43-20eb-42c6-bb06-26b77c0bf0cb"
},
{
"uniqueIdentifier" : "b2a6ae10-20eb-42c6-bb06-26b77c0bf0cb"
},
{
"uniqueIdentifier" : "f96bb647-5dcb-4cc1-8a66-105177a45474"
}
],
"ok" : 1
}
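This pipeline sorts on `domainName` and `uniqueIdentifier`, not on `updateDate`, so `$first` picks the lexically smallest identifier in each domain, and the final `$sort` orders the results by identifier rather than by date. On this data it keeps "14f6yu43-..." for domain-0 because "14f6..." sorts before "375d...". The in-memory Java sketch below emulates the `$sort` → `$group`/`$first` → `$sort` pattern with a pluggable sort order so the two choices can be compared on the four `foodle` documents shown above. `PipelineSketch`, `Doc` and `firstPerDomain` are hypothetical names, `$project` is modeled simply by returning the identifiers, and everything runs client-side rather than on the server.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PipelineSketch {
    // Hypothetical stand-in for one document in db.foodle
    record Doc(String domainName, Instant updateDate, String uniqueIdentifier) {}

    // Emulates { $sort: inputOrder }, { $group: { _id: '$domainName', ...: { $first: ... } } },
    // then { $sort: outputOrder } on the grouped documents, returning only the identifiers
    static List<String> firstPerDomain(List<Doc> docs, Comparator<Doc> inputOrder,
                                       Comparator<Doc> outputOrder) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(inputOrder);
        Map<String, Doc> first = new LinkedHashMap<>();
        for (Doc d : sorted) {
            first.putIfAbsent(d.domainName(), d); // $first: earliest document in sort order
        }
        List<Doc> grouped = new ArrayList<>(first.values());
        grouped.sort(outputOrder);
        List<String> ids = new ArrayList<>();
        for (Doc d : grouped) {
            ids.add(d.uniqueIdentifier());
        }
        return ids;
    }

    // The four documents from db.foodle.find() above
    static List<Doc> foodle() {
        return List.of(
                new Doc("www.example-domain-0.com", Instant.parse("2013-09-12T22:12:49.933Z"),
                        "375d7219-828c-4f81-a1fc-3692aa68d110"),
                new Doc("www.example-domain-1.com", Instant.parse("2013-09-12T22:12:52.877Z"),
                        "f96bb647-5dcb-4cc1-8a66-105177a45474"),
                new Doc("www.example-domain-0.com", Instant.parse("2013-09-12T22:12:55.550Z"),
                        "14f6yu43-20eb-42c6-bb06-26b77c0bf0cb"),
                new Doc("www.example-domain-2.com", Instant.parse("2013-09-12T22:12:58.390Z"),
                        "b2a6ae10-20eb-42c6-bb06-26b77c0bf0cb"));
    }

    // As written above: sort by domainName, uniqueIdentifier; output sorted by uniqueIdentifier
    static List<String> asWritten() {
        return firstPerDomain(foodle(),
                Comparator.comparing(Doc::domainName).thenComparing(Doc::uniqueIdentifier),
                Comparator.comparing(Doc::uniqueIdentifier));
    }

    // Keyed on updateDate instead: newest document per domain, returned newest first
    static List<String> byUpdateDate() {
        Comparator<Doc> newestFirst =
                Comparator.comparing(Doc::updateDate, Comparator.reverseOrder());
        return firstPerDomain(foodle(), newestFirst, newestFirst);
    }

    public static void main(String[] args) {
        System.out.println(asWritten());
        System.out.println(byUpdateDate());
    }
}
```

With this data both versions keep "14f6yu43-..." for domain-0 (it is both the lexically smaller identifier and the newer document), but only the `updateDate` version returns the identifiers newest-first, as the question asks. In the shell, the corresponding change would presumably be to start with `{ $sort: { updateDate: -1 } }`, carry `updateDate: { $first: '$updateDate' }` through the `$group`, sort the grouped output by `{ updateDate: -1 }`, and only then `$project`; treat that variant as an untested suggestion.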
To say my Java is limited would be generous, but I think it would look something like this:
import com.mongodb.AggregationOutput;
import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
// mongoClient is an existing com.mongodb.MongoClient (legacy driver API)
DB db = mongoClient.getDB("test");
DBCollection testCollection = db.getCollection("foodle");
// Stage 1: $sort by domainName, then uniqueIdentifier
DBObject primarySortFields = new BasicDBObject("domainName", 1);
primarySortFields.put("uniqueIdentifier", 1);
DBObject firstSort = new BasicDBObject("$sort", primarySortFields);
// Stage 2: $group by domainName, keeping the first uniqueIdentifier and a document count
DBObject groupFields = new BasicDBObject("_id", "$domainName");
groupFields.put("uniqueIdentifier", new BasicDBObject("$first", "$uniqueIdentifier"));
groupFields.put("thecount", new BasicDBObject("$sum", 1));
DBObject group = new BasicDBObject("$group", groupFields);
// Stage 3: $project only uniqueIdentifier (drop _id)
DBObject fields = new BasicDBObject("_id", 0);
fields.put("uniqueIdentifier", 1);
DBObject project = new BasicDBObject("$project", fields);
// Stage 4: $sort the results by uniqueIdentifier
DBObject secondSort = new BasicDBObject("$sort", new BasicDBObject("uniqueIdentifier", 1));
AggregationOutput output = testCollection.aggregate(firstSort, group, project, secondSort);
System.out.println(output);
which prints:
{ "serverUsed" : "/127.0.0.1:27017" , "result" : [ { "uniqueIdentifier" : "14f6yu43-20eb-42c6-bb06-26b77c0bf0cb"} , { "uniqueIdentifier" : "b2a6ae10-20eb-42c6-bb06-26b77c0bf0cb"} , { "uniqueIdentifier" : "f96bb647-5dcb-4cc1-8a66-105177a45474"}] , "ok" : 1.0}