Transparently encoding/decoding '$' and '.' when inserting/retrieving documents in MongoDB

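I'm working on an API written in Python that accepts JSON payloads from clients, applies some validation, and stores the payloads in MongoDB so that they can be processed asynchronously.

However, I've run into a problem: some payloads (legitimately) include keys that start with '$' and/or contain '.'. According to the MongoDB documentation, my best bet is to escape these characters: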

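In some cases, you may wish to build a BSON object with a user-provided key. In these situations, keys will need to substitute the reserved $ and . characters. Any character is sufficient, but consider using the Unicode full-width equivalents: U+FF04 (i.e. "＄") and U+FF0E (i.e. "．").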
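A minimal sketch of the substitution the documentation describes, assuming string keys and plain dict/list payloads. The names `FULLWIDTH` and `escape_keys` are made up for illustration and are not part of PyMongo:

```python
# Map the reserved ASCII characters to their full-width equivalents.
FULLWIDTH = str.maketrans({'$': '\uff04', '.': '\uff0e'})


def escape_keys(value):
    """Return a copy of `value` with '$' and '.' replaced in every dict key."""
    if isinstance(value, dict):
        return {
            key.translate(FULLWIDTH): escape_keys(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [escape_keys(item) for item in value]
    # Scalars (and string values) are left untouched; only keys are escaped.
    return value


escaped = escape_keys({'$set': {'a.b': 1}, 'tags': [{'x.y': 2}]})
```

Only keys are rewritten here; values may contain '$' and '.' freely.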

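Makes sense, but here's where it gets interesting. I'd like this process to be transparent to the application, so:

Keys should be unescaped when documents are retrieved…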
…

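For example, suppose a (malicious) user sends a JSON payload that includes a key like ＄mixed.chars (note the full-width U+FF04 at the start). When the application gets this document back from the storage backend, that key should be converted back into ＄mixed.chars, not $mixed.chars.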
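To make the concern concrete, here is a small sketch of why a plain character swap is not reversible when a key already contains a full-width character. The helpers `naive_escape` and `naive_unescape` are hypothetical names used only for this illustration:

```python
TO_FULLWIDTH = str.maketrans({'$': '\uff04', '.': '\uff0e'})
TO_ASCII = str.maketrans({'\uff04': '$', '\uff0e': '.'})


def naive_escape(key):
    # Replace reserved ASCII characters with full-width equivalents.
    return key.translate(TO_FULLWIDTH)


def naive_unescape(key):
    # Blindly map every full-width character back to ASCII.
    return key.translate(TO_ASCII)


# The user-supplied key already starts with full-width U+FF04.
original = '\uff04mixed.chars'
stored = naive_escape(original)     # only the '.' is changed
restored = naive_unescape(stored)   # U+FF04 is wrongly turned into '$'

round_trip_ok = restored == original
```

In this sketch `round_trip_ok` ends up false, because the unescape step cannot tell a full-width character the user sent from one the escaper produced.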

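My biggest concern is information leakage; I don't want anybody to discover that the application requires special treatment for '$' and '.' characters. The bad guys probably know how to exploit MongoDB far better than I know how to secure it, and I don't want to take any chances.

Here's the approach that I ended up going with: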

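- Before inserting a document into Mongo, run it through a SONManipulator that searches for and escapes any illegal keys in the document.
  - The original keys are stored as a separate attribute in the document so that we can restore them later.
- After retrieving a document from Mongo, run it through the SONManipulator to extract the original keys and restore them.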

Here's a brief example:

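# KeyEscaper is the SONManipulator subclass from the gist linked below;
#   `collection` is a pymongo Collection.
# Example of a document with naughty keys.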
document = {
    '$foo': 'bar',
    '$baz': 'luhrmann'
}
##
# Before inserting the document, we must first run it through our
#   SONManipulator.
manipulator = KeyEscaper()
escaped = manipulator.transform_incoming(document, collection.name)
# Now we can insert the document.
document_id = collection.insert_one(escaped).inserted_id
##
# Later, we retrieve the document.
raw = collection.find_one({'_id': document_id})
# Run the document through our KeyEscaper to restore the original
#   keys.
unescaped = manipulator.transform_outgoing(raw, collection.name)
assert unescaped == document

The actual document stored in MongoDB looks like this:

{
  "_id": ObjectId('582cebe5cd9b344c814d98e3'),
  "__escaped__1": "luhrmann",
  "__escaped__0": "bar",
  "__escaped__": {
    "__escaped__1": ["$baz", {}],
    "__escaped__0": ["$foo", {}]
  }
}

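Note the __escaped__ attribute, which contains the original keys so that they can be restored when the document is retrieved.

This makes querying against the escaped keys a bit tricky, but that's infinitely preferable to not being able to store the document in the first place.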
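For readers who only want the general idea, here is a simplified sketch of the placeholder-plus-lookup-table scheme. It is not the gist's implementation: it uses plain functions rather than a SONManipulator subclass (that class is no longer available in PyMongo 4), it stores each original key as a plain string rather than the [key, {}] pairs shown above, and the names `escape_document`, `unescape_document`, `_needs_escape`, and `_apply` are invented for this example.

```python
ESCAPED = '__escaped__'  # reserved name, mirroring the stored document above


def _needs_escape(key):
    # Keys that collide with our own bookkeeping names are escaped too,
    # so every stored '__escaped__...' key is guaranteed to be ours.
    return key.startswith('$') or '.' in key or key.startswith(ESCAPED)


def _apply(value, fn):
    # Recurse into nested documents and lists; leave scalars alone.
    if isinstance(value, dict):
        return fn(value)
    if isinstance(value, list):
        return [_apply(item, fn) for item in value]
    return value


def escape_document(doc):
    out, originals = {}, {}
    for key, value in doc.items():
        value = _apply(value, escape_document)
        if _needs_escape(key):
            placeholder = f'{ESCAPED}{len(originals)}'
            originals[placeholder] = key  # remember the real key
            key = placeholder
        out[key] = value
    if originals:
        out[ESCAPED] = originals
    return out


def unescape_document(doc):
    originals = doc.get(ESCAPED, {})
    out = {}
    for key, value in doc.items():
        if key == ESCAPED:
            continue  # drop the lookup table itself
        out[originals.get(key, key)] = _apply(value, unescape_document)
    return out


document = {'$foo': 'bar', '$baz': 'luhrmann'}
escaped = escape_document(document)
restored = unescape_document(escaped)
```

Because the original keys are kept verbatim in the lookup table, a key that already contains a full-width character comes back unchanged, and nothing in the restored document reveals that escaping took place.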

Complete code, including unit tests and example usage:
https://gist.github.com/todofixthis/79a2f213989a3584211e49bfba582b40
