How do I query a dense vector field in Elasticsearch with the Python client?



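This is my first time using Elasticsearch and its Python client. I am a bit confused about how to set up the query_body for querying a dense vector field. Below are the steps I have taken so far. Please help me create a query body that I can use in my search function.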

import json

from elasticsearch import Elasticsearch, helpers
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer('bert-base-nli-stsb-mean-tokens')
with open('my_folder/my_docs.json', 'r') as file:
    documents = json.load(file)

# STEP 1: Embedding documents
for d in documents:
    # Store the embedding under the field declared in the mapping ("doc_vector"),
    # as a plain list of floats so it serializes cleanly to JSON.
    d['doc_vector'] = embedder.encode(d['content']).tolist()

# STEP 2: Defining Mapping Dictionary
mapping = {
    "mappings": {
        "properties": {
            "name": {
                "type": "text"
            },
            "content": {
                "type": "text"
            },
            "doc_vector": {
                "type": "dense_vector",
                "dims": 768
            }
        }
    }
}

# STEP 3: Creating the Client
client = Elasticsearch("http://localhost:9200")

# STEP 4: Creating Index
response = client.indices.create(
    index="my_doc_dense_index",
    body=mapping,
    ignore=400  # ignore 400 already exists code
)

# STEP 5: Bulk Uploading docs to Index
resp = helpers.bulk(
    client,
    documents,
    index='my_doc_dense_index')

# STEP 6: Example Query
query = "Who is the tennis champion in women's tennis?"

# STEP 7: Encoding Query
encoded_query = embedder.encode([query])

# STEP 8: Setting up query body with encoded query
query_body = ???????

# STEP 9: Submit a search query to Elasticsearch
docs = client.search(body=query_body, index="my_doc_dense_index", size=10)

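All the code from Step 1 through Step 7 runs fine. I need help building the dense vector query in Step 8 so that I can use it in Step 9. Can anyone help?

Thanks in advance, Kai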

You can use the knn option of the search method (or the knn_search method) to pass a query object that contains the dense vector:

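from elasticsearch import Elasticsearch

# Connect to the cluster (recent clients require an explicit host)
es = Elasticsearch("http://localhost:9200")

# Example query vector; its length must match the "dims" of the field
my_vector = [0.5, 0.3, 0.2]
knn_query = {
    "field": "my_dense_vector_field",
    "query_vector": my_vector,
    "k": 10,
    "num_candidates": 100
}
# run the query
results = es.search(index="my_index", knn=knn_query)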
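Applied to the index from the question, the same idea might look like the sketch below. It is an illustration under assumptions rather than a tested recipe: `KNN_MAPPING`, `to_flat_vector`, and `build_knn` are hypothetical names, and the mapping assumes an 8.x cluster where, on releases before 8.11, a dense_vector field has to set `index` and `similarity` explicitly before it can be searched with kNN. The helper also unwraps the `(1, dims)` array that `embedder.encode([query])` returns, since `query_vector` should be a flat list of floats.

```python
from typing import Any, Dict, List

# Assumed mapping for approximate kNN search on the question's index.
# "index" and "similarity" are set explicitly (an assumption about the
# cluster version; newer releases may default them).
KNN_MAPPING: Dict[str, Any] = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "content": {"type": "text"},
            "doc_vector": {
                "type": "dense_vector",
                "dims": 768,
                "index": True,
                "similarity": "cosine",
            },
        }
    }
}


def to_flat_vector(encoded: Any) -> List[float]:
    """Turn an encoder output (NumPy array or nested list) into a flat list of floats."""
    values = encoded.tolist() if hasattr(encoded, "tolist") else list(encoded)
    # embedder.encode([query]) yields shape (1, dims); unwrap the single row.
    if values and isinstance(values[0], (list, tuple)):
        if len(values) != 1:
            raise ValueError("expected exactly one query vector")
        values = values[0]
    return [float(v) for v in values]


def build_knn(encoded_query: Any, field: str = "doc_vector",
              k: int = 10, num_candidates: int = 100) -> Dict[str, Any]:
    """Build the value for the `knn` argument of `search`."""
    if num_candidates < k:
        raise ValueError("num_candidates must be at least k")
    return {
        "field": field,
        "query_vector": to_flat_vector(encoded_query),
        "k": k,
        "num_candidates": num_candidates,
    }


# Usage with the objects from the question (requires a running cluster):
# knn = build_knn(encoded_query)
# docs = client.search(index="my_doc_dense_index", knn=knn)
```

The hits would then be under `docs["hits"]["hits"]`, ordered by similarity to the query vector.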

See search and knn_search in the official Elasticsearch documentation.
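If the goal is specifically a `query_body` for Step 8 of the question, one alternative that I believe works with a plain dense_vector mapping is an exact (brute-force) `script_score` query that uses the built-in `cosineSimilarity` function. The sketch below only builds the request body; `build_script_score_body` is a hypothetical helper name, and the field name `doc_vector` is taken from the question's mapping. Adding 1.0 keeps the score non-negative, which Elasticsearch requires.

```python
from typing import Any, Dict, Sequence


def build_script_score_body(query_vector: Sequence[float],
                            field: str = "doc_vector") -> Dict[str, Any]:
    """Build a search body that ranks every document by cosine similarity."""
    return {
        "query": {
            "script_score": {
                # Score all documents; swap in a filter query to narrow the set.
                "query": {"match_all": {}},
                "script": {
                    "source": f"cosineSimilarity(params.query_vector, '{field}') + 1.0",
                    # The vector must be a flat list of plain floats.
                    "params": {"query_vector": [float(v) for v in query_vector]},
                },
            }
        }
    }


# Usage with the objects from the question (requires a running cluster).
# embedder.encode([query]) has shape (1, dims), so take the first row:
# query_body = build_script_score_body(encoded_query[0].tolist())
# docs = client.search(body=query_body, index="my_doc_dense_index", size=10)
```

Because this scores every matching document, it tends to be slower than approximate kNN on large indices, but it does not depend on the vector field being indexed for kNN.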
