Pattern analyzer does not work with UUIDs in Elasticsearch



I am using Elasticsearch 7.x and created an accounts index with the following mapping.

curl --location --request PUT 'http://localhost:9200/accounts' \
--header 'Content-Type: application/json' \
--data-raw '{
"mappings": {
"properties": {
"type": {"type": "keyword"},
"id": {"type": "keyword"},
"label": {"type": "keyword"},
"lifestate": {"type": "keyword"},
"name": {"type": "keyword"},
"users": {"type": "text"}
}
}
}'

I store the users as an array. In my use case an account can have n users, so I store them in the following format.

curl --location --request PUT 'http://localhost:9200/accounts/_doc/account3' \
--header 'Content-Type: application/json' \
--data-raw '{
"id" : "account_uuid",
"name" : "Account_Description",
"users" : [
"id:6de57db5-8fdb-4a39-ab46-21af623692ea~~status:ACTIVE",
"id:9611e2be-784f-4a07-b5de-564b3820a660~~status:INACTIVE"
]
}'

To search by user id and status, I created a pattern analyzer that splits on the ~~ symbol, as shown below.

curl --location --request PUT 'http://localhost:9200/accounts/_settings' \
--header 'Content-Type: application/json' \
--data-raw '{
"settings": {
"analysis": {
"analyzer": {
"p_analyzer": { 
"type": "pattern",
"pattern" :"~~"
}
}
}
}
}'

The search query call is

curl --location --request GET 'http://localhost:9200/accounts/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"bool": {
"filter": [ 
{ "term": {"id": "account_uuid"} },
{ "match" : {"users" : {
"query" : "id:<user_id>",
"analyzer" : "p_analyzer"
}}}
]   
}
}
}'

This works when the user id is a plain string, that is, when it is stored in a non-UUID format. It does not work for ids in UUID format. How can I make it work?

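Modify your analyzer so that the pattern also includes the hyphen (-). This should solve your issue, because the UUID is then split into separate tokens.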

{
"settings": {
"analysis": {
"analyzer": {
"p_analyzer": {
"type":      "pattern",
"pattern":   "~~|-",
"lowercase": true
}
}
}
}
}
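
The mapping in the question declares users as plain text and only names p_analyzer inside the query, so the field is presumably indexed with the default analyzer. A minimal sketch, assuming the intent is for the same analyzer to run at index time as well: define it when the index is created and reference it from the mapping. The Python below only builds and prints such a request body (meant to be sent with PUT to /accounts on a fresh index). It does not contact a cluster, and trimming the mapping to two fields is my simplification, not part of the original post.

```python
import json

# Hypothetical create-index body: analyzer definition plus a mapping that
# attaches it to "users". Field and analyzer names follow the post.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "p_analyzer": {
                    "type": "pattern",
                    "pattern": "~~|-",   # split on "~~" or on a hyphen
                    "lowercase": True,
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "id": {"type": "keyword"},
            # The analyzer set here is used at index time and, unless
            # overridden, at search time as well.
            "users": {"type": "text", "analyzer": "p_analyzer"},
        }
    },
}

print(json.dumps(index_body, indent=2))
```

With the analyzer attached to the field, the "analyzer" entry in the match query becomes optional.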

The above analyzer generates the following tokens

POST /your-index/_analyze

{
"text" : "6de57db5-8fdb-4a39-ab46-21af623692ea~~status:ACTIVE",
"analyzer" : "p_analyzer"
}

Generated tokens

{
"tokens": [
{
"token": "6de57db5",
"start_offset": 0,
"end_offset": 8,
"type": "word",
"position": 0
},
{
"token": "8fdb",
"start_offset": 9,
"end_offset": 13,
"type": "word",
"position": 1
},
{
"token": "4a39",
"start_offset": 14,
"end_offset": 18,
"type": "word",
"position": 2
},
{
"token": "ab46",
"start_offset": 19,
"end_offset": 23,
"type": "word",
"position": 3
},
{
"token": "21af623692ea",
"start_offset": 24,
"end_offset": 36,
"type": "word",
"position": 4
},
{
"token": "status:active",
"start_offset": 38,
"end_offset": 51,
"type": "word",
"position": 5
}
]
}
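
The token list above can be approximated offline, which is handy for trying out patterns before touching the index. This is only a rough sketch: pattern_analyze is a hypothetical helper, it uses Python's re module rather than the Java regex engine Elasticsearch runs on (the two agree for a pattern as simple as ~~|-), and it ignores offsets and positions.

```python
import re

def pattern_analyze(text, pattern=r"~~|-", lowercase=True):
    """Roughly mimic a pattern analyzer: lowercase, split on the regex, drop empty pieces."""
    if lowercase:
        text = text.lower()
    return [tok for tok in re.split(pattern, text) if tok]

tokens = pattern_analyze("6de57db5-8fdb-4a39-ab46-21af623692ea~~status:ACTIVE")
print(tokens)
# ['6de57db5', '8fdb', '4a39', 'ab46', '21af623692ea', 'status:active']
```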

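Now a search for 6de57db5-8fdb-4a39-ab46-21af623692ea is broken into 6de57db5, 8fdb, 4a39 and so on. These match the tokens generated at index time, so the document appears in the search results.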
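
To make the matching step concrete, here is a small simulation, assuming the same pattern is applied to both the stored values and the query. The analyze and matches helpers are hypothetical stand-ins, not Elasticsearch APIs. One thing it brings out: a match query combines its terms with OR by default, so a different UUID that merely shares one segment would also hit, and setting "operator": "and" on the match query is the stricter option.

```python
import re

SPLIT = re.compile(r"~~|-")

def analyze(text):
    """Lowercase and split on '~~' or '-', returning the set of tokens."""
    return {tok for tok in SPLIT.split(text.lower()) if tok}

# Values of the multi-valued "users" field from the question's document.
indexed = [
    "id:6de57db5-8fdb-4a39-ab46-21af623692ea~~status:ACTIVE",
    "id:9611e2be-784f-4a07-b5de-564b3820a660~~status:INACTIVE",
]
index_tokens = set().union(*(analyze(value) for value in indexed))

def matches(query, operator="or"):
    """'or': any query token is indexed; 'and': every query token is indexed."""
    query_tokens = analyze(query)
    if operator == "and":
        return query_tokens <= index_tokens
    return bool(query_tokens & index_tokens)

print(matches("id:6de57db5-8fdb-4a39-ab46-21af623692ea"))                 # True
print(matches("id:00000000-8fdb-0000-0000-000000000000"))                 # True, shares "8fdb"
print(matches("id:00000000-8fdb-0000-0000-000000000000", operator="and")) # False
```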
