Return the number of documents based on the word count of a string field



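How can I return the number of documents that have more than 2 elements in the "words" list and more than 3 words in "word_combination"? Is there a way to count the number of words in a string?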

Example: return the document if (length of "words" > 2) AND ("words.word_combination" contains more than 3 words)

I have many documents stored. The structure of one document looks like this:

"_source" : {
  "group_words" : [
    {
      "amount" : 1140,
      "words" : [
        {
          "relevance_score" : 56,
          "points" : 66461,
          "bits" : 100,
          "word_combination" : "cat dog"
        },
        {
          "relevance_score" : 84,
          "points" : 45202,
          "bits" : 990,
          "word_combination" : "cat dog elephant"
        },
        {
          "relevance_score" : 99,
          "points" : 30974,
          "bits" : 70,
          "word_combination" : "elephant cat mouse leopard"
        }
      ],
      "group" : "whatever"
    },
    {
      "amount" : 1320,
      "words" : [
        {
          "relevance_score" : 25,
          "points" : 53396,
          "bits" : 70,
          "word_combination" : "lion elephant"
        },
        {
          "relevance_score" : 66,
          "points" : 52166,
          "bits" : 20,
          "word_combination" : "lion mouse fish cat dog"
        },
        {
          "relevance_score" : 82,
          "points" : 49316,
          "bits" : 810,
          "word_combination" : "elephant cat mouse leopard dog lion"
        },
        {
          "relevance_score" : 87,
          "points" : 127705,
          "bits" : 290,
          "word_combination" : "elephant cat mouse leopard tiger lion"
        }
      ],
      "group" : "whatever"
    },
    {
      "amount" : 11260,
      "words" : [
        {
          "relevance_score" : 0,
          "points" : 37909,
          "bits" : 9000,
          "word_combination" : "elephant cat mouse leopard tiger lion monkey"
        },
        {
          "relevance_score" : 3,
          "points" : 35782,
          "bits" : 540,
          "word_combination" : "elephant"
        }
      ],
      "group" : "whatever"
    }
  ]
}

Regarding the number of elements in the words array, my suggestion is to store that number in an extra field, words_count, at indexing time.

{
  "amount" : 1140,
  "words_count": 3,                           <--- add this
  "words" : [
    {
      "relevance_score" : 56,
      "points" : 66461,
      "bits" : 100,
      "word_combination" : "cat dog"
    },
    {
      "relevance_score" : 84,
      "points" : 45202,
      "bits" : 990,
      "word_combination" : "cat dog elephant"
    },
    {
      "relevance_score" : 99,
      "points" : 30974,
      "bits" : 70,
      "word_combination" : "elephant cat mouse leopard"
    }
  ],
  "group" : "whatever"
},
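
One way to fill in words_count before indexing is sketched below in Python. This is only an illustration: the helper name add_words_count is hypothetical, the sample document is a trimmed version of the one in the question, and the indexing call itself is left out.

```python
import copy


def add_words_count(source):
    """Return a copy of the document with words_count set on every group.

    Hypothetical helper: run it on each document just before indexing.
    """
    enriched = copy.deepcopy(source)  # leave the caller's document untouched
    for group in enriched.get("group_words", []):
        # words_count is simply the length of the group's "words" array
        group["words_count"] = len(group.get("words", []))
    return enriched


# Trimmed sample based on the document in the question
doc = {
    "group_words": [
        {
            "amount": 1140,
            "group": "whatever",
            "words": [
                {"word_combination": "cat dog"},
                {"word_combination": "cat dog elephant"},
                {"word_combination": "elephant cat mouse leopard"},
            ],
        },
        {
            "amount": 11260,
            "group": "whatever",
            "words": [
                {"word_combination": "elephant"},
            ],
        },
    ]
}

enriched = add_words_count(doc)
print([g["words_count"] for g in enriched["group_words"]])  # [3, 1]
```

The count has to be recomputed whenever the words array of a group changes, otherwise the stored value drifts from the real length.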

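Regarding the number of words (or tokens) in the word_combination field, there is a data type called token_count that exists precisely for this purpose. Just define the mapping like this: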

...
"word_combination": {
  "type": "text",
  "fields": {
    "count": {
      "type": "token_count",
      "analyzer": "standard"
    }
  }
}

Then, in your queries, you can access word_combination.count, which will contain the number of tokens in the word_combination field (as analyzed by the specified analyzer).
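
Putting the two conditions together, a request body for counting matching documents might look like the Python sketch below. It rests on several assumptions: the full field paths group_words.words_count and group_words.words.word_combination.count (the mapping snippet above does not show where word_combination sits), default object mappings rather than nested ones, and a hypothetical helper name build_count_query. With plain object mappings the two conditions are evaluated per document, so they are not guaranteed to hold for the same group or the same words entry; tying them together would require nested mappings and nested queries.

```python
import json


def build_count_query(min_words=2, min_tokens=3):
    """Build a request body that could be sent to the _count endpoint.

    Hypothetical helper. Both thresholds are exclusive, matching the
    "more than 2 elements" and "more than 3 words" wording of the question.
    """
    return {
        "query": {
            "bool": {
                # filter context: both conditions must match, no scoring needed
                "filter": [
                    {"range": {"group_words.words_count": {"gt": min_words}}},
                    {
                        "range": {
                            "group_words.words.word_combination.count": {
                                "gt": min_tokens
                            }
                        }
                    },
                ]
            }
        }
    }


body = build_count_query()
print(json.dumps(body, indent=2))
```

The printed JSON can be posted to the index's _count endpoint, which returns the number of matching documents instead of the documents themselves.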
