How to do incremental / search-as-you-type full-text search on a 5 million record set with Elasticsearch



I'm using Elasticsearch on a huge dataset of all Wikipedia article names, about 5 million of them. The names are stored in a database field called articlenames.

curl -XPUT "http://localhost:9200/index_wiki_articlenames/" -d'
{
   "settings":{
      "analysis":{
         "filter":{
            "nGram_filter":{
               "type":"edgeNGram",
               "min_gram":1,    
               "max_gram":20,
               "token_chars":[
                  "letter",
                  "digit",
                  "punctuation",
                  "symbol"
               ]
            }
         },
         "tokenizer":{
            "edge_ngram_tokenizer":{
               "type":"edgeNGram",
               "min_gram":"1",
               "max_gram":"20",
               "token_chars":[
                  "letter",
                  "digit"
               ]
            }
         },
         "analyzer":{
            "nGram_analyzer":{
               "type":"custom",
               "tokenizer":"edge_ngram_tokenizer",
               "filter":[
                  "lowercase",
                  "asciifolding"
               ]
            }
         },
         "whitespace_analyzer": {
               "type": "custom",
               "tokenizer": "whitespace",
               "filter": [
                  "lowercase",
                  "asciifolding"
               ]
            }
      }
   },
   "mappings":{                                                                         
      "name":{
         "properties":{
            "articlenames":{
               "type":"text",
               "analyzer":"nGram_analyzer"
            }
         }
      }
   }
}'

I also went through these links to solve my problem, but in vain:

Edge NGram with phrase matching

https://hackernoon.com/elasticsearch-building-autocomplete-functionality-494fcf81a7cf

My goal is to get the following results for the input query "sachin t":

sachin tendulkar
sachin tendulkar centuries
sachin tejas 
sachin top 60 quotes
sachin talwalkar
sachin tawade
sachin taps

and for the query "sachin te":

sachin tendulkar
sachin tendulkar centuries
sachin tejas 

and for the query "sachin ta":

sachin talwalkar
sachin tawade
sachin taps

and for the query "sachin ten":

sachin tendulkar
sachin tendulkar centuries

Keep in mind that the dataset is huge.

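I get the expected output on smaller datasets of up to 10 thousand records, but once the dataset grows to between 0.5 and 5 million records I no longer get it.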

My query is:

http://127.0.0.1:9200/index_wiki_articlenames/_search?&q=articlenames:sachin-t+articlenames:sachin-t.*&filter_path=hits.hits._source.articlenames&size=50

You should try the following settings:

curl -XPUT "http://localhost:9200/index_wiki_articlenames/" -d'
{
   "settings":{
      "analysis":{
         "tokenizer":{
            "edge_ngram_tokenizer":{
               "type":"edgeNGram",
               "min_gram":"1",
               "max_gram":"20",
               "token_chars":[
                  "letter",
                  "digit"
               ]
            }
         },
         "analyzer":{
            "nGram_analyzer":{
               "type":"custom",
               "tokenizer":"edge_ngram_tokenizer",
               "filter":[
                  "lowercase",
                  "asciifolding"
               ]
            }
         }
      }
   },
   "mappings":{                                                                         
      "name":{
         "properties":{
            "articlenames":{
               "type":"text",
               "analyzer":"nGram_analyzer",
               "search_analyzer": "standard"
            }
         }
      }
   }
}'
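
What pairing nGram_analyzer at index time with the standard analyzer at search time does can be sketched in plain Python. This is only an approximation of the analysis chain, not Elasticsearch's implementation: the helper names (fold, edge_ngrams, index_tokens, search_tokens) are made up for illustration, asciifolding is approximated with Unicode decomposition, and the standard analyzer is approximated by splitting on non-word characters and lowercasing.

```python
import re
import unicodedata

MIN_GRAM, MAX_GRAM = 1, 20  # same limits as edge_ngram_tokenizer above


def fold(token):
    """Rough stand-in for the lowercase + asciifolding filters."""
    decomposed = unicodedata.normalize("NFKD", token.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def edge_ngrams(word):
    """All prefixes of word between MIN_GRAM and MAX_GRAM characters."""
    longest = min(len(word), MAX_GRAM)
    return [word[:n] for n in range(MIN_GRAM, longest + 1)]


def index_tokens(text):
    """Approximates nGram_analyzer: split on non letter/digit, then prefixes."""
    tokens = []
    for word in re.findall(r"[^\W_]+", text):
        tokens.extend(fold(gram) for gram in edge_ngrams(word))
    return tokens


def search_tokens(text):
    """Approximates the standard analyzer: whole lowercased words only."""
    return [word.lower() for word in re.findall(r"\w+", text)]


print(index_tokens("Sachin Tendulkar"))
print(search_tokens("Sachin T"))
```

Because the query text goes through the standard analyzer rather than nGram_analyzer, "Sachin T" is looked up as just the two terms sachin and t, both of which were stored as prefixes at index time. If the query were n-grammed as well, it would also search for s, sa, sac and so on, which would match a very large share of the titles in a big index.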

When querying, try this query:

GET index_wiki_articlenames/_search
{
  "query": {
    "match": {
      "articlenames": {
        "query": "Sachin T", 
        "operator": "and"
      }
    }
  }
}
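
To send the same query from a client rather than a console, a minimal Python sketch using only the standard library might look like this. The host, index name, size of 50 and filter_path value are taken from the question; the function name build_search_request is hypothetical, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_search_request(text, base_url="http://localhost:9200",
                         index="index_wiki_articlenames", size=50):
    """Build (but do not send) a _search request for a type-ahead query."""
    body = {
        "size": size,
        "query": {
            "match": {
                "articlenames": {"query": text, "operator": "and"}
            }
        },
    }
    url = (f"{base_url}/{index}/_search"
           "?filter_path=hits.hits._source.articlenames")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # _search accepts POST as well as GET with a body
    )


req = build_search_request("sachin t")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# Against a live cluster the request could then be sent with:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```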

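I know this is late, but anyone looking for a solution can try this query. The mapping and indexing are correct; what seems to be missing in the query part is the "and" operator.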

GET index_wiki_articlenames/_search
{
  "query": {
    "match": {
      "articlenames": {
        "query": "sachin ten", 
        "operator": "and"
      }
    }
  }
}

This results in:

sachin tendulkar
sachin tendulkar centuries
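
Why the "and" operator narrows the results can be illustrated with a small simulation over the titles from the question. It is a sketch under simplifying assumptions, not how Elasticsearch retrieves or scores documents: since every prefix (up to 20 characters) of every word is indexed, a query term is treated as matching a title when some word of the title starts with it. The function names are illustrative, and the last title is a made-up extra entry that shares only one term with the query.

```python
MAX_GRAM = 20  # longest prefix stored by the edge n-gram tokenizer

TITLES = [
    "sachin tendulkar",
    "sachin tendulkar centuries",
    "sachin tejas",
    "sachin top 60 quotes",
    "sachin talwalkar",
    "sachin tawade",
    "sachin taps",
    "ten commandments",  # hypothetical title, not from the question
]


def term_hits(term, title):
    """True if some word in the title has the term as an indexed prefix."""
    if len(term) > MAX_GRAM:
        return False  # no prefix that long was ever indexed
    return any(word.startswith(term) for word in title.lower().split())


def match(query, titles, operator="or"):
    """Mimic a match query: 'and' needs every term, 'or' needs any term."""
    terms = query.lower().split()
    combine = all if operator == "and" else any
    return [t for t in titles if combine(term_hits(term, t) for term in terms)]


print(match("sachin ten", TITLES, operator="and"))
print(len(match("sachin ten", TITLES, operator="or")))
```

A match query uses "or" unless told otherwise, so any title containing a word that starts with sachin or with ten qualifies. On a few thousand titles the wanted entries may still show up near the top, but on millions of titles they can easily fall outside the first 50 hits.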
