MongoDB: How do I insert documents into a collection when they already exist in another collection?



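I have two collections, EN_PR2019 and EN_PR2018. They mostly contain the same things, but from different years. After inserting all the documents into EN_PR2019, I tried to insert documents that may have the same _id as documents in EN_PR2019. I read that I need to create an index on the collection to be able to have records with the same _id in two different collections. Now I am getting pymongo.errors.DuplicateKeyError: E11000 duplicate key error collection: Database.EN_PR2018 index: id_1 dup key: { id: null }.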

How can I insert the same records, with the same _id, into two different collections without raising an error, or otherwise handle the duplicates?

import pymongo


def check_record(collection, record_id):
    """Check if record exists in collection
    Args:
        record_id (str): record _id as in collection
    """
    return collection.find_one({'id': record_id})


def collection_index(collection, index):
    """Checks if index exists for collection,
    and return a new index if not
    Args:
        collection (str): Name of collection in database
        index (str): Dict key to be used as an index
    """
    if index not in collection.index_information():
        return collection.create_index([(index, pymongo.ASCENDING)], unique=True)


def push_upstream(collection, record_id, record):
    """Update record in collection
    Args:
        collection (str): Name of collection in database
        record_id (str): record _id to be put for record in collection
        record (dict): Data to be pushed in collection
    """
    return collection.insert_one({"_id": record_id}, {"$set": record})


def update_upstream(collection, record_id, record):
    """Update record in collection
    Args:
        collection (str): Name of collection in database
        record_id (str): record _id as in collection
        record (dict): Data to be updated in collection
    """
    return collection.update_one({"_id": record_id}, {"$set": record}, upsert=True)


def executePushPlayer(db):
    playerstats = load_file(db.playerfile)
    collection = db.DATABASE[db.league + db.season]
    collection_index(collection, 'id')
    for player in playerstats:
        existingPost = check_record(collection, player['id'])
        if existingPost:
            update_upstream(collection, player['id'], player)
        else:
            push_upstream(collection, player['id'], player)


if __name__ == '__main__':
    test = DB('EN_PR', '2018')
    executePushPlayer(test)

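The _id field of every document inserted into a MongoDB database is special: it is always indexed, and that index is unique. Using an _id value from one collection in another collection is perfectly reasonable, as long as it does not violate the uniqueness constraint in the new collection.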
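Because the _id uniqueness constraint applies per collection, documents can be carried from one collection to another with their _id values unchanged. The following is a minimal sketch of that idea, not a definitive implementation: the helper name `copy_with_same_ids` is hypothetical, and both arguments are assumed to be pymongo `Collection` objects (for example `db["EN_PR2019"]` and `db["EN_PR2018"]`).

```python
def copy_with_same_ids(source, target):
    """Copy every document from source into target, keeping each _id.

    Hypothetical helper: both arguments are assumed to be pymongo
    Collection objects that live in the same or different databases.
    """
    copied = 0
    for doc in source.find():
        # Upsert by _id: insert the document if the _id is new in target,
        # otherwise replace the existing one instead of raising
        # DuplicateKeyError.
        target.replace_one({"_id": doc["_id"]}, doc, upsert=True)
        copied += 1
    return copied
```

Using `replace_one` with `upsert=True` rather than `insert_one` is a design choice here: it makes the copy safe to run more than once.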

From the error, I would guess that several of your player["_id"] values are null. This suggests a problem in your load_file function.
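One way to test that guess is to set aside any record whose `id` is missing or None before writing, and to upsert the rest in a single call per record. The sketch below assumes each loaded record is a dict with an `id` key, as in the question's code; the helper name `upsert_players` is hypothetical. It may also be worth noting that pymongo's `insert_one` takes one document to insert, so a document stored as only `{"_id": record_id}` has no `id` field at all, and a unique index on `id` treats a missing field as null.

```python
def upsert_players(collection, players):
    """Upsert each player by its id and return the records that were skipped.

    Hypothetical helper: `collection` is assumed to be a pymongo Collection
    and `players` an iterable of dicts, each expected to carry an "id" key.
    """
    skipped = []
    for player in players:
        record_id = player.get("id")
        if record_id is None:
            # A missing or null id cannot serve as a unique key; keep the
            # record aside so the loading step can be inspected.
            skipped.append(player)
            continue
        # Store the whole record under _id = id. The "id" field stays in the
        # document, so a unique index on "id" never sees a null value.
        collection.replace_one({"_id": record_id}, player, upsert=True)
    return skipped
```

If the returned list is non-empty, the records in it show which entries from load_file lack an id.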
