Remove http / https from documents in MongoDB data



How can I remove http:// or https:// from the beginning of tags.Domain, and '/' from the end, in a MongoDB aggregation?

Sample document:

{
"_id" : ObjectId("5d9f074f5833c8cd1f685e05"),
"tags" : [
{
"Domain" : "http://www.google.com",
"rank" : 1
},
{
"Domain" : "https://www.stackoverflow.com/",
"rank" : 2
}
]
}

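Assuming the Domain field in tags contains valid URLs with valid prefixes and suffixes (https, http, //, /, com/, org/, /in):

  • The $trim operator is used to remove https://, http://, and / from tags.Domain.

Note: $trim treats "chars" as a set of individual characters to strip from both ends, not as a literal prefix, so this does not work for URLs that are already formatted and happen to start or end with one of these characters. Examples: 'hello.com' would become 'ello.com', 'xyz.ins' would become 'xyz.in', etc.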
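To make the caveat above concrete, here is a minimal plain-JavaScript sketch that imitates the documented behaviour of $trim. The helper name trimChars is hypothetical and this is only an illustration, not MongoDB's implementation; it assumes ASCII input.

```javascript
// Illustration only: imitates how $trim strips any character found in `chars`
// from both ends of `input`. `chars` is a set of characters, not a prefix.
function trimChars(input, chars) {
  const set = new Set(chars); // "https://" -> {h, t, p, s, :, /}
  let start = 0;
  let end = input.length;
  while (start < end && set.has(input[start])) start++;   // strip from the front
  while (end > start && set.has(input[end - 1])) end--;   // strip from the back
  return input.slice(start, end);
}

console.log(trimChars("http://www.google.com", "https://"));          // www.google.com
console.log(trimChars("https://www.stackoverflow.com/", "https://")); // www.stackoverflow.com
console.log(trimChars("hello.com", "https://"));                      // ello.com
console.log(trimChars("xyz.ins", "https://"));                        // xyz.in
console.log(trimChars("https://stackoverflow.com", "https://"));      // ackoverflow.com
```

The last call shows that even a URL with a proper scheme is damaged when the host itself begins with one of the listed characters.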

Aggregation query

db.collection.aggregate([
    {
        $addFields: {
            "tags": {
                $map: {
                    "input": "$tags",
                    "as": "tag",
                    "in": {
                        $mergeObjects: [
                            "$$tag",
                            {
                                "Domain": {
                                    $trim: {
                                        "input": "$$tag.Domain",
                                        "chars": "https://"
                                    }
                                }
                            }
                        ]
                    }
                }
            }
        }
    }
]).pretty()

Output (demo):

{
"_id" : 2, //ObjectId
"tags" : [
{
"rank" : 1,
"Domain" : "www.google.com"
},
{
"rank" : 2,
"Domain" : "www.stackoverflow.com"
}
]
}
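If the set-of-characters behaviour of $trim is a concern, one possible variant is to match the scheme as a literal prefix with $regexFind. The sketch below assumes MongoDB 4.2 or later (where $regexFind is available) and that every Domain is a single-line string; the names urlParts and pipeline are introduced here for illustration and are not part of the original answer.

```javascript
// Optional "http://" or "https://", then a lazily captured remainder,
// then any trailing slashes. Group 1 is the cleaned value.
const urlParts = /^(?:https?:\/\/)?(.*?)\/*$/;

const pipeline = [
  {
    $addFields: {
      tags: {
        $map: {
          input: "$tags",
          as: "tag",
          in: {
            $mergeObjects: [
              "$$tag",
              {
                Domain: {
                  $let: {
                    // $regexFind returns { match, idx, captures } for the first match
                    vars: { m: { $regexFind: { input: "$$tag.Domain", regex: urlParts } } },
                    // captures[0] holds the first (and only) capture group
                    in: { $arrayElemAt: ["$$m.captures", 0] }
                  }
                }
              }
            ]
          }
        }
      }
    }
  }
];

// In the mongo shell: db.collection.aggregate(pipeline).pretty()
```

With this pattern, values such as 'hello.com' or 'xyz.ins' are left unchanged, because only a literal scheme and trailing slashes are excluded from the capture group.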

The solution ended up longer than I expected (I hope someone can find a more concise one), but here you go:

db.test.aggregate([
    { $unwind: "$tags" }, // unwind tags so that http and https can be handled separately
    {
        $facet: {
            "https": [
                {
                    // the first facet keeps only documents whose Domain matches /^https.*/
                    $match: { "tags.Domain": /^https.*/ }
                },
                {
                    // for all matching documents, skip the 8 characters of "https://"
                    // and keep everything up to the last character (-1)
                    $addFields: { "tags.Domain": { "$substr": ["$tags.Domain", 8, -1] } }
                }
            ],
            "http": [
                {
                    // same as above, but with the inverse filter using $not
                    // (this assumes every remaining Domain starts with "http://")
                    $match: { "tags.Domain": { $not: /^https.*/ } }
                },
                {
                    // skip the 7 characters of "http://" and keep the rest (-1)
                    $addFields: { "tags.Domain": { "$substr": ["$tags.Domain", 7, -1] } }
                }
            ]
        }
    },
    // we have two arrays at this point, so concatenate them into one array called "all"
    { $project: { all: { $concatArrays: ["$https", "$http"] } } },
    // unwind and group the array by _id to get the documents back in the original format
    { $unwind: "$all" },
    {
        $group: {
            _id: "$all._id",
            tags: { $push: "$all.tags" }
        }
    }
])

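To remove the / from the end, you can use another facet with a regex that matches such URLs (something like /.*\/$/ should work), and include that facet in the concat.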
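As a possibly simpler alternative to an extra facet, the trailing slash could be stripped in one additional stage with $rtrim. This is only a sketch: the name stripTrailingSlash is hypothetical, $rtrim requires MongoDB 4.0 or later, and the stage is assumed to run right after the {$unwind: "$tags"} stage, where tags is a single sub-document.

```javascript
// Hypothetical extra stage: removes only trailing "/" characters from tags.Domain.
// With chars set to "/" alone, no other characters can be stripped.
const stripTrailingSlash = {
  $addFields: {
    "tags.Domain": { $rtrim: { input: "$tags.Domain", chars: "/" } }
  }
};

// Possible placement in the mongo shell (remaining stages unchanged):
// db.test.aggregate([{ $unwind: "$tags" }, stripTrailingSlash, /* $facet, ... */])
```

Because the later $substr calls count characters from the start of the string, trimming the end first does not change their offsets.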

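With help from: https://stackoverflow.com/a/49660098/5530229 and https://stackoverflow.com/a/44729563/5530229

As dnickless says in the first answer linked above, as always with the aggregation framework, it may help to remove individual stages from the end of the pipeline and run the partial query in order to understand what each stage does.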
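The stage-by-stage inspection described above can be sketched with a small helper that yields the pipeline cut off after 1, 2, ... n stages. The helper name pipelinePrefixes and the shortened stages array are assumptions made for illustration; console.log is used so the snippet runs in Node.js or mongosh.

```javascript
// Returns every leading slice of the pipeline: [stage1], [stage1, stage2], ...
function pipelinePrefixes(pipeline) {
  return pipeline.map((_, i) => pipeline.slice(0, i + 1));
}

// Shortened stand-in for the real pipeline, used only for illustration.
const stages = [
  { $unwind: "$tags" },
  { $match: { "tags.Domain": /^https/ } },
  { $group: { _id: "$_id", tags: { $push: "$tags" } } }
];

for (const prefix of pipelinePrefixes(stages)) {
  const lastStage = Object.keys(prefix[prefix.length - 1])[0];
  console.log(prefix.length + " stage(s), ending with " + lastStage);
  // In the mongo shell, inspect the intermediate result with:
  // printjson(db.test.aggregate(prefix).toArray());
}
```

Running the partial pipelines in order shows what each added stage contributes to the final result.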
