pymongo: sort combined with $elemMatch does not work



I have some sample data like this:

[
    {
        "_id": 1,
        "host": "host1",
        "type": "type1",
        "data": [
            {"t": 10000, "v": 90},
            {"t": 10001, "v": 94}
        ]
    },
    {
        "_id": 2,
        "host": "host1",
        "type": "type1",
        "data": [
            {"t": 10000, "v": 99},
            {"t": 10001, "v": 93}
        ]
    },
    {
        "_id": 3,
        "host": "host1",
        "type": "type1",
        "data": [
            {"t": 10000, "v": 94},
            {"t": 10001, "v": 100}
        ]
    }
]

My query is:

my_filter = {'host': 'host1', 'type': 'type1', 'data': {'$elemMatch': {'t': 10000}}}
projection = {'host': 1, 'type': 1, 'data': {'$elemMatch': {'t': 10000}}}
sort_key = 'data.0.v'
rs = db.find(my_filter, projection).sort(sort_key, 1)
rs = list(rs)
for v in rs:
    print(v["data"][0]['v'])

But it does not work; the output is not sorted:

98
98
98
96
100
98
98

Notes:

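  • Currently using: Python=3.6.9, pymongo==3.10.1, MongoDB==4.2.6
  • The collection has 10,000 documents, and each nested array has 1,440 elements
  1. I only need the nested-array elements that match the condition, not the whole array, because the array can be large
  2. I need the data sorted, but I cannot change the order in which it is written
  3. I also tried aggregate(), but its performance is poor when the data is large, so I would like to do this with find()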

The aggregation looks like this:

rs = db.aggregate([
    {"$match": {'host': 'host1', 'type': 'type1', 'data': {'$elemMatch': {'t': 10000}}}},
    {"$project": {
        'host': 1,
        'type': 1,
        'data': {"$filter": {
            "input": "$data",
            "as": "data",
            "cond": {"$eq": ["$$data.t", 10000]}
        }}
    }},
    {"$sort": {'data.0.v': 1}}
])

Sorry for my poor English. Is there a good solution for this?
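Given the constraints listed above (only the matching element is needed, and the write order cannot change), one possible workaround is to let find() do the filtering and projection without .sort(), and then sort the projected documents in Python. The sketch below is not tied to a live server: the `projected` list is a stand-in for `list(db.find(my_filter, projection))`, built from the sample data in this question, and `sort_projected` is a hypothetical helper name. It assumes the projected result set fits in memory.

```python
def sort_projected(docs, field="v"):
    """Sort documents by a field of the single projected array element."""
    return sorted(docs, key=lambda d: d["data"][0][field])


# Stand-in for list(db.find(my_filter, projection)), i.e. without .sort().
# After the $elemMatch projection, each "data" array holds only the
# element with t == 10000 (values taken from the sample data).
projected = [
    {"_id": 1, "host": "host1", "type": "type1", "data": [{"t": 10000, "v": 90}]},
    {"_id": 2, "host": "host1", "type": "type1", "data": [{"t": 10000, "v": 99}]},
    {"_id": 3, "host": "host1", "type": "type1", "data": [{"t": 10000, "v": 94}]},
]

ordered = sort_projected(projected)
sorted_values = [d["data"][0]["v"] for d in ordered]
sorted_ids = [d["_id"] for d in ordered]
print(sorted_values)  # [90, 94, 99]
```

Because each projected document carries a single array element, the client-side sort only handles one small record per document, however long the stored arrays are.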

I cannot reproduce your problem with the software versions you mentioned. If you have bash and docker, you can check whether your results differ:

PROJECT_NAME=sort_with_elemmatch
MONGODB_VERSION=4.2.6
PYTHON_VERSION=3.6.9
PYMONGO_VERSION=3.10.1
docker network create local_temp 2> /dev/null
docker run --rm --network local_temp -d --name mongodb_temp mongo:${MONGODB_VERSION}
cd "$(mktemp -d)" || exit
cat << EOF > requirements.txt
pymongo==${PYMONGO_VERSION}
EOF
cat << 'EOF' > ${PROJECT_NAME}.py
from pymongo import MongoClient
from random import randint
db = MongoClient('mongodb://mongodb_temp')['mydatabase'].mycollection
# Insert 20 documents; the element with t == 10000 is always first in "data"
for i in range(20):
    db.insert_one(
        {
            "host": "host1",
            "type": "type1",
            "data": [
                {
                    "t": 10000,
                    "v": randint(0, 100)
                },
                {
                    "t": 10001,
                    "v": randint(0, 100)
                },
            ]
        })
my_filter = {'host': 'host1', 'type': 'type1', 'data': {'$elemMatch': {'t': 10000}}}
projection = {'host': 1, 'type': 1, 'data': {'$elemMatch': {'t': 10000}}}
sort_key = 'data.0.v'
rs = db.find(my_filter, projection).sort(sort_key, 1)
rs = list(rs)
for v in rs:
    print(v["data"][0]['v'])
EOF
cat << EOF > Dockerfile
FROM python:${PYTHON_VERSION}
COPY ./* /
RUN pip install -r /requirements.txt
CMD ["python", "${PROJECT_NAME}.py"]
EOF
docker build --tag ${PROJECT_NAME}:latest .
docker run --rm --network local_temp --name ${PROJECT_NAME} ${PROJECT_NAME}:latest
docker stop "$(docker ps -a -q --filter name=mongodb_temp)" > /dev/null
docker image rm ${PROJECT_NAME}:latest > /dev/null
docker network rm local_temp > /dev/null

This prints:

5
17
18
19
20
28
29
37
59
59
61
63
64
66
68
77
82
82
100
100

I found the problem. It seems to be related to the order of the elements in the embedded array:

import random

for i in range(20):
    data_0, data_1 = {"t": 10000, "v": random.randint(0, 100)}, {"t": 10001, "v": random.randint(0, 100)}
    insert_d = {
        "host": "host1",
        "type": "type1",
        # Document 10 stores the two elements in reverse order
        "data": [data_0, data_1] if i != 10 else [data_1, data_0]
    }
    db.insert_one(insert_d)
my_filter = {'host': 'host1', 'type': 'type1', 'data': {'$elemMatch': {'t': 10000}}}
projection = {'host': 1, 'type': 1, 'data': {'$elemMatch': {'t': 10000}}}
sort_key = 'data.0.v'
rs = db.find(my_filter, projection).sort(sort_key, 1)
rs = list(rs)
for v in rs:
    print(v["data"][0]['v'])

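If you try this, you will see that the sort does not work as expected: most of the values are in order, but one is out of place. So I would like to know how the sort actually works.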
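A likely explanation, as far as I understand the MongoDB documentation for projection operators: find() applies the sort to the matching documents before it applies the $elemMatch projection. The key 'data.0.v' therefore refers to index 0 of the stored array, not of the projected one. The pure-Python simulation below needs no MongoDB server; the helper names and the three documents are made up for illustration, with one document storing its elements in reverse order as in the snippet above.

```python
def first_elem_v(doc):
    # What the sort key 'data.0.v' reads: index 0 of whatever array it is given.
    return doc["data"][0]["v"]


def project_elem_match(doc, t):
    # Mimics the projection {'data': {'$elemMatch': {'t': t}}}:
    # keep only the first array element whose "t" equals t.
    match = next(e for e in doc["data"] if e["t"] == t)
    return {"_id": doc["_id"], "data": [match]}


stored = [
    {"_id": 1, "data": [{"t": 10000, "v": 50}, {"t": 10001, "v": 7}]},
    {"_id": 2, "data": [{"t": 10001, "v": 1}, {"t": 10000, "v": 90}]},  # reversed order
    {"_id": 3, "data": [{"t": 10000, "v": 60}, {"t": 10001, "v": 8}]},
]

# Assumed find().sort() behavior: sort on the stored documents, then project.
sort_then_project = [
    project_elem_match(d, 10000) for d in sorted(stored, key=first_elem_v)
]
observed = [first_elem_v(d) for d in sort_then_project]

# What the question expects: project first, then sort on the projected element.
project_then_sort = sorted(
    (project_elem_match(d, 10000) for d in stored), key=first_elem_v
)
expected = [first_elem_v(d) for d in project_then_sort]

print(observed)  # [90, 50, 60]
print(expected)  # [50, 60, 90]
```

In the simulation, document 2 is ordered by the value 1 (its stored first element, with t == 10001) but prints 90 after projection, which matches the single out-of-place value described above.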
