Fetching records in Python by matching key-value pairs and comparing dates



I have the following collection in MongoDB:

{
"_id" : ObjectId("5bbc86e5c16a27f1e1bd39f8"),
"name" : "swetha",
"nameId" : 123,
"source" : "Blore",
"sourceId" : 10,
"LastUpdate" : "10-Oct-2018"
}
{
"_id" : ObjectId("5bbc86e5c16a27f1e1bd39f9"),
"name" : "swetha",
"nameId" : 123,
"source" : "Mlore",
"sourceId" : "11",
"LastUpdate" : "11-Oct-2018"
}
{
"_id" : ObjectId("5bbc86e5c16a27f1e1bd39fa"),
"name" : "swathi",
"nameId" : 124,
"source" : "Mlore",
"sourceId" : "11",
"LastUpdate" : "9-Oct-2018"
}

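I am a beginner in Python. I want to compare "LastUpdate" between the above records based on a matching "name" or "nameId", and push the record with the latest date to another collection. For example, name: "Swetha" is the same in the first two records, so "LastUpdate" should be compared between them and the record with the latest date should be output.

I wrote the code below to read the records from MongoDB and print them. I don't understand how to compare records that share the same key and compare their timestamps, although I have referred to a few resources on Google.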

import json
import pandas as pd
from pymongo import MongoClient

try:
    client = MongoClient()
    print("Connected successfully!!!")
except:
    print("Could not connect to MongoDB")

# database
db = client.conn
collection = db.contactReg
df = collection.find()
for row in df:
    print(row)

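Links referred:

Is there a better way to compare dictionary values

https://gis.stackexchange.com/questions/87276/how-to-compare-values-from-a-column-in-attribute-table-with-values-in-dictionary

Comparing two dictionaries and printing key value pairs in python, and so on.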

I think what you need is an aggregation. It may look big, but once you get the hang of Mongo aggregations you will be comfortable with it.

df = collection.aggregate([
    # Stage 1: split "10-Oct-2018" into ["10", "Oct", "2018"].
    {
        "$project": {
            "_id": 0,
            "name": 1,
            "nameId": 1,
            "source": 1,
            "sourceId": 1,
            "LastUpdate": 1,
            "LastUpdateArray": {"$split": ["$LastUpdate", "-"]},
        }
    },
    # Stage 2: pull out the month abbreviation.
    {
        "$project": {
            "name": 1,
            "nameId": 1,
            "source": 1,
            "sourceId": 1,
            "LastUpdate": 1,
            "LastUpdateArray": 1,
            "LastUpdateMonth": {"$arrayElemAt": ["$LastUpdateArray", 1]},
        }
    },
    # Stage 3: extract year and day, and map the month abbreviation to a
    # two-digit number (unrecognized values fall back to "01").
    {
        "$project": {
            "name": 1,
            "nameId": 1,
            "source": 1,
            "sourceId": 1,
            "LastUpdate": 1,
            "Year": {"$arrayElemAt": ["$LastUpdateArray", 2]},
            "Date": {"$arrayElemAt": ["$LastUpdateArray", 0]},
            "Month": {
                "$switch": {
                    "branches": [
                        {"case": {"$eq": ["$LastUpdateMonth", "Jan"]}, "then": "01"},
                        {"case": {"$eq": ["$LastUpdateMonth", "Feb"]}, "then": "02"},
                        {"case": {"$eq": ["$LastUpdateMonth", "Mar"]}, "then": "03"},
                        {"case": {"$eq": ["$LastUpdateMonth", "Apr"]}, "then": "04"},
                        {"case": {"$eq": ["$LastUpdateMonth", "May"]}, "then": "05"},
                        {"case": {"$eq": ["$LastUpdateMonth", "Jun"]}, "then": "06"},
                        {"case": {"$eq": ["$LastUpdateMonth", "Jul"]}, "then": "07"},
                        {"case": {"$eq": ["$LastUpdateMonth", "Aug"]}, "then": "08"},
                        {"case": {"$eq": ["$LastUpdateMonth", "Sep"]}, "then": "09"},
                        {"case": {"$eq": ["$LastUpdateMonth", "Oct"]}, "then": "10"},
                        {"case": {"$eq": ["$LastUpdateMonth", "Nov"]}, "then": "11"},
                        {"case": {"$eq": ["$LastUpdateMonth", "Dec"]}, "then": "12"},
                    ],
                    "default": "01",
                }
            },
        }
    },
    # Stage 4: build a "YYYY-MM-DD" style string.
    {
        "$project": {
            "name": 1,
            "nameId": 1,
            "source": 1,
            "sourceId": 1,
            "LastUpdate": 1,
            "Year": 1,
            "Date": 1,
            "Month": 1,
            "DateString": {"$concat": ["$Year", "-", "$Month", "-", "$Date"]},
        }
    },
    # Stage 5: convert the string into a real date.
    {
        "$project": {
            "name": 1,
            "nameId": 1,
            "source": 1,
            "sourceId": 1,
            "LastUpdate": 1,
            "Date": {"$dateFromString": {"dateString": "$DateString"}},
        }
    },
    # Sort newest first so that $first picks the latest record per name.
    {"$sort": {"Date": -1}},
    {
        "$group": {
            "_id": "$name",
            "name": {"$first": "$name"},
            "nameId": {"$first": "$nameId"},
            "source": {"$first": "$source"},
            "sourceId": {"$first": "$sourceId"},
            "LastUpdate": {"$first": "$LastUpdate"},
            "Date": {"$first": "$Date"},
        }
    },
    # Drop the helper Date field from the output.
    {
        "$project": {
            "name": 1,
            "nameId": 1,
            "source": 1,
            "sourceId": 1,
            "LastUpdate": 1,
        }
    },
])

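In the first 5 stages of the aggregation, I convert "LastUpdate" to a date; then I sort by that date in descending order. In the group stage, I group by name and take the first record that comes with each name.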
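
For comparison, the same idea (parse the date, then keep the newest record per name) can be sketched in plain Python once the documents have been read. This is only a minimal sketch under some assumptions: the `records` list stands in for the result of `collection.find()`, the helper names `parse_last_update` and `latest_per_name` are made up for illustration, and `%b` is assumed to match English month abbreviations (it depends on the locale).

```python
from datetime import datetime

# Sample documents mirroring the collection in the question (without _id).
# In real code this list would come from something like list(collection.find()).
records = [
    {"name": "swetha", "nameId": 123, "source": "Blore", "sourceId": 10, "LastUpdate": "10-Oct-2018"},
    {"name": "swetha", "nameId": 123, "source": "Mlore", "sourceId": "11", "LastUpdate": "11-Oct-2018"},
    {"name": "swathi", "nameId": 124, "source": "Mlore", "sourceId": "11", "LastUpdate": "9-Oct-2018"},
]


def parse_last_update(doc):
    """Turn a string such as '9-Oct-2018' into a datetime so it can be compared."""
    return datetime.strptime(doc["LastUpdate"], "%d-%b-%Y")


def latest_per_name(docs):
    """Return one document per name: the one with the most recent LastUpdate."""
    latest = {}
    for doc in docs:
        current = latest.get(doc["name"])
        if current is None or parse_last_update(doc) > parse_last_update(current):
            latest[doc["name"]] = doc
    return list(latest.values())


latest_records = latest_per_name(records)
for doc in latest_records:
    print(doc["name"], doc["source"], doc["LastUpdate"])
```

With the three sample documents this keeps the "Mlore" record for swetha (11-Oct-2018) and the single swathi record. To push the result to another collection, something like `db.latestContacts.insert_many(latest_records)` should work, where `latestContacts` is a hypothetical collection name.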

Hope this helps.

I am assuming that what you need is only the duplicate records, and I am taking the first one. Reference: https://stackoverflow.com/a/26985011/7630071

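df = collection.aggregate([
    # Collect all records per name and count them.
    {
        "$group": {
            "_id": "$name",
            "count": {"$sum": 1},
            "data": {
                "$push": {
                    "nameId": "$nameId",
                    "source": "$source",
                    "sourceId": "$sourceId",
                    "LastUpdate": "$LastUpdate",
                }
            },
        }
    },
    # Keep only names that occur more than once.
    # Note: Python uses None where the mongo shell uses null.
    {
        "$match": {
            "_id": {"$ne": None},
            "count": {"$gt": 1},
        }
    },
])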
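
If it is easier to follow, the grouping above can also be mirrored on the client side. The following is just a rough sketch, assuming the documents are already in a list (`records` here stands in for the result of `collection.find()`) and using a made-up helper name, `duplicates_by_name`:

```python
from collections import defaultdict

# Sample documents mirroring the collection in the question (without _id).
records = [
    {"name": "swetha", "nameId": 123, "source": "Blore", "sourceId": 10, "LastUpdate": "10-Oct-2018"},
    {"name": "swetha", "nameId": 123, "source": "Mlore", "sourceId": "11", "LastUpdate": "11-Oct-2018"},
    {"name": "swathi", "nameId": 124, "source": "Mlore", "sourceId": "11", "LastUpdate": "9-Oct-2018"},
]


def duplicates_by_name(docs):
    """Group documents by name and keep only names that occur more than once."""
    groups = defaultdict(list)
    for doc in docs:
        # Skip documents without a name, like the {"$ne": None} match above.
        if doc.get("name") is not None:
            groups[doc["name"]].append(doc)
    return {name: group for name, group in groups.items() if len(group) > 1}


duplicates = duplicates_by_name(records)
for name, group in duplicates.items():
    print(name, len(group))
```

For the sample data only "swetha" is reported, with two records. Doing this in Python pulls every document over the network, so for a large collection the aggregation is probably the better choice.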
