MongoDB's $addToSet does not work correctly when storing data fetched from AWS



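I use AWS's Boto3 to fetch security group data (group name, port and protocol) for inbound rules whose source is 0.0.0.0/0, and I store it in MongoDB in the following format: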

{
    "_id" : ObjectId("6102702a7f6ee984a6a11bfc"),
    "sgname" : "launch-wizard-4",
    "socket" : [
        {
            "ports" : 443,
            "portocol" : "tcp"
        },
        {
            "ports" : 22,
            "portocol" : "tcp"
        },
        {
            "ports" : 80,
            "portocol" : "tcp"
        }
    ]
}

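Now I use $addToSet to avoid duplicates when I run the code again. It does not work: the second run adds the same ports and protocols again (running the code more than twice does not add any further duplicates):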

{
    "_id" : ObjectId("6102702a7f6ee984a6a11bfc"),
    "sgname" : "launch-wizard-4",
    "socket" : [
        {
            "ports" : 443,
            "portocol" : "tcp"
        },
        {
            "ports" : 22,
            "portocol" : "tcp"
        },
        {
            "ports" : 80,
            "portocol" : "tcp"
        },
        {
            "ports" : 80,
            "protocol" : "tcp"
        },
        {
            "ports" : 22,
            "protocol" : "tcp"
        },
        {
            "ports" : 443,
            "protocol" : "tcp"
        }
    ]
}

Something similar happens when I run the update from the CLI. The first time I run the command it adds a duplicate; after that it does not.

db.sg.updateOne({"sgname": "launch-wizard-4"}, {$addToSet: {socket: {ports: 443, protocol: "tcp"}}})
{ "acknowledged" : true, "matchedCount" : 1, "modifiedCount" : 1 }

Here is my code:

import boto3
import pymongo
import ast

client = boto3.client('ec2')
response = client.describe_security_groups()
client = pymongo.MongoClient('MONGO_URL')
sgname = []
protocol = []
ports = []
datas = []
data = {}
db = client.sg
col = db.sg
cur = col.find()
result = list(cur)

# Collect the names of security groups that have a rule open to 0.0.0.0/0.
for sg in response['SecurityGroups']:
    for ip in sg['IpPermissions']:
        for cidr in ip['IpRanges']:
            if cidr['CidrIp'] == '0.0.0.0/0':
                sgname.append(sg['GroupName'])
                sgname = list(set(sgname))

# Build one record per open rule: group name plus port/protocol.
for sg in response['SecurityGroups']:
    for ip in sg['IpPermissions']:
        for cidr in ip['IpRanges']:
            if cidr['CidrIp'] == '0.0.0.0/0':
                if sg['GroupName'] in sgname:
                    log = {
                        "sgname": sg['GroupName'],
                        "socket": {
                            "ports": ip.get('FromPort', 'missing'),
                            "portocol": ip['IpProtocol']
                        }
                    }
                    datas.append(log)

# De-duplicate the records, then group the sockets by security group name.
sg_data = [ast.literal_eval(el1) for el1 in set([str(el2) for el2 in datas])]
for element in sg_data:
    sgname_name = element['sgname']
    socket_name = element['socket']
    if sgname_name not in data:
        data[sgname_name] = []
    data[sgname_name].append(socket_name)
new_lst = [{'sgname': key, 'socket': val} for key, val in data.items()]

if len(result) != 0:
    # Collection already has documents: add each socket with $addToSet.
    for sg in response['SecurityGroups']:
        for ip in sg['IpPermissions']:
            for cidr in ip['IpRanges']:
                if cidr['CidrIp'] == '0.0.0.0/0':
                    col.update_many({"sgname": sg['GroupName']}, {"$addToSet": {"socket": {"ports": ip.get('FromPort', 'missing'), "protocol": ip['IpProtocol']}}})

    print("################################")
else:
    # Empty collection: insert the grouped documents.
    try:
        col.insert_many(new_lst)
    except pymongo.errors.BulkWriteError as e:
        print(e)

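Why does the duplication happen?

This is mostly due to field ordering.

When MongoDB compares objects, field order matters, so {port: 80, protocol: "tcp"} and {protocol: "tcp", port: 80} are not equal.

An object with two fields can have its fields in two different orders, so you can end up with two distinct objects that contain the same fields and values.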
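
To make the comparison rule concrete, here is a small plain-Python simulation. It does not talk to MongoDB; `same_subdocument` and `add_to_set` are hypothetical helpers that only mimic an order-sensitive, exact comparison of flat subdocuments. The sample documents in the question also show the key spelled `portocol` in the first three entries and `protocol` in the later ones, and a different key name makes two subdocuments unequal in the same way, so that case is included too.

```python
# Simulation only: mimics order-sensitive subdocument comparison in plain Python.
# `same_subdocument` and `add_to_set` are hypothetical helpers, not PyMongo APIs.

def same_subdocument(a: dict, b: dict) -> bool:
    """Compare two flat dicts as ordered sequences of (key, value) pairs."""
    return list(a.items()) == list(b.items())


def add_to_set(array: list, item: dict) -> bool:
    """Append item unless an identical subdocument (same keys, same order) exists."""
    if any(same_subdocument(existing, item) for existing in array):
        return False  # nothing was modified
    array.append(item)
    return True


socket = [{"ports": 80, "protocol": "tcp"}]

# Same keys, same order: treated as already present.
same_order = add_to_set(socket, {"ports": 80, "protocol": "tcp"})
# Same keys and values, different order: treated as a new element.
swapped_order = add_to_set(socket, {"protocol": "tcp", "ports": 80})
# Different key name ("portocol"): also treated as a new element.
misspelled_key = add_to_set(socket, {"ports": 80, "portocol": "tcp"})

print(same_order, swapped_order, misspelled_key, len(socket))
```

In this simulation only the first call is rejected as a duplicate; the other two grow the array, which matches the pattern of "duplicate once, then never again" described in the question.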

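The mongod server preserves the field order of the documents it receives.

You need to make sure the client sends the fields in the same order every time so that the objects match.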
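
One way to keep the client consistent is to build the embedded document in a single place and reuse it for both the initial insert and the later `$addToSet` update. The sketch below assumes Python 3.7+ (where a `dict` keeps insertion order) and that the driver serializes keys in that order; `make_socket` and `make_update` are hypothetical helper names, and the sample input only imitates the shape of one `IpPermissions` entry.

```python
def make_socket(ip_permission: dict) -> dict:
    """Build the embedded document with fixed key names in a fixed order."""
    return {
        "ports": ip_permission.get("FromPort", "missing"),
        "protocol": ip_permission["IpProtocol"],
    }


def make_update(ip_permission: dict) -> dict:
    """Build the $addToSet update from the same helper used for inserts."""
    return {"$addToSet": {"socket": make_socket(ip_permission)}}


# Sample shaped like one entry of sg['IpPermissions'] (illustrative values).
sample = {
    "FromPort": 443,
    "IpProtocol": "tcp",
    "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
}

insert_entry = make_socket(sample)   # would go into the list passed to insert_many
update_doc = make_update(sample)     # would be the second argument to update_many

print(list(insert_entry.items()))
print(update_doc)
```

In the script from the question, this would mean replacing both the inline `"socket": {...}` dict and the inline dict inside `col.update_many(...)` with `make_socket(ip)`, so the key names and their order cannot drift apart between the two code paths.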
