ES dense_vector field: "dims" must be specified



I have an Elasticsearch (v7.5.1) index that contains a dense_vector field named lda with 150 dimensions. The mapping, as returned by http://localhost:9200/documents/_mapping, looks like this:

"documents": {
    "mappings": {
        [...]
        "lda": {
            "type": "dense_vector",
            "dims": 150
        }
    }
}
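To confirm what the cluster actually stored, you can read the dims straight out of the GET /documents/_mapping response. This is a minimal sketch that assumes the typeless 7.x response shape {index: {"mappings": {"properties": {...}}}}. The helper name dense_vector_dims is hypothetical, and it only inspects top-level fields:

```python
from typing import Any, Dict, Optional


def dense_vector_dims(mapping_response: Dict[str, Any], index: str, field: str) -> Optional[int]:
    """Return the dims of a top-level dense_vector field, or None if it is not mapped.

    Assumes the typeless Elasticsearch 7.x shape returned by GET /<index>/_mapping:
    {index: {"mappings": {"properties": {field: {...}}}}}.
    """
    properties = (
        mapping_response.get(index, {})
        .get("mappings", {})
        .get("properties", {})
    )
    field_mapping = properties.get(field)
    if not field_mapping or field_mapping.get("type") != "dense_vector":
        return None
    return field_mapping.get("dims")


# Example response in the shape described above (other fields omitted).
response = {
    "documents": {
        "mappings": {
            "properties": {"lda": {"type": "dense_vector", "dims": 150}}
        }
    }
}
print(dense_vector_dims(response, "documents", "lda"))  # 150
```

If this returns 150 but indexing still fails, the stored mapping itself is probably fine. The stack trace below passes through DocumentParser.parseDynamicValue, which suggests that lda was being mapped dynamically, i.e. the explicit mapping was not the one applied to the incoming document.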

When I try to index a document with the Elasticsearch Client for Python (v7.1.0), ES throws the following error:

{"type": "server", "timestamp": "2020-01-03T08:40:04,962Z", "level": "DEBUG", "component": "o.e.a.b.TransportShardBulkAction", "cluster.name": "docker-cluster", "node.name": "8d468383f2cf", "message": "[documents][0] failed to execute bulk item
(create) index {[documents][document][S_uPam8BUsDzizMKxpRR], source[{"id":42129,[...],
"lda":[0.031139032915234566,0.02878846414387226,0.026767859235405922,0.025012295693159103,0.02347283624112606,0.022111890837550163,0.02090011164546013,0.019814245402812958,0.0188356414437294,0.01794915273785591,0.01714235544204712,0.01640496961772442,0.015728404745459557,0.
015105433762073517,0.014529934152960777,0.013996675610542297,0.013501172885298729,0.013039554469287395,0.012608458288013935,0.012204954400658607,0.011826476082205772,0.011470765806734562,0.011135827749967575,0.010819895192980766,0.01052139326930046,0.010238921269774437,0.0,0
.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]}]}", "cluster.uuid": "7irLdTC_S7eXwYcVFolppQ", "node.id":
"M_fMZ3KxQnWP3AiguV1_jA" , 
"stacktrace": ["org.elasticsearch.index.mapper.MapperParsingException: The [dims] property must be specified for field [lda].",
"at org.elasticsearch.xpack.vectors.mapper.DenseVectorFieldMapper$TypeParser.parse(DenseVectorFieldMapper.java:104) ~[?:?]",                                                                                                                        
"at org.elasticsearch.index.mapper.DocumentParser.createBuilderFromFieldType(DocumentParser.java:680) ~[elasticsearch-7.5.1.jar:7.5.1]",                                                                                                            
"at org.elasticsearch.index.mapper.DocumentParser.parseDynamicValue(DocumentParser.java:826) ~[elasticsearch-7.5.1.jar:7.5.1]",                                                                                                                     
"at org.elasticsearch.index.mapper.DocumentParser.parseValue(DocumentParser.java:619) ~[elasticsearch-7.5.1.jar:7.5.1]",                                                                                                                            
"at org.elasticsearch.index.mapper.DocumentParser.parseNonDynamicArray(DocumentParser.java:601) ~[elasticsearch-7.5.1.jar:7.5.1]",                                                                                                                  
"at org.elasticsearch.index.mapper.DocumentParser.parseArray(DocumentParser.java:560) ~[elasticsearch-7.5.1.jar:7.5.1]",                                                                                                                            
"at org.elasticsearch.index.mapper.DocumentParser.innerParseObject(DocumentParser.java:420) ~[elasticsearch-7.5.1.jar:7.5.1]",                                                                                                                      
"at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrNested(DocumentParser.java:395) ~[elasticsearch-7.5.1.jar:7.5.1]",                                                                                                                   
"at org.elasticsearch.index.mapper.DocumentParser.internalParseDocument(DocumentParser.java:112) ~[elasticsearch-7.5.1.jar:7.5.1]",                                                                                                                 
"at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:71) ~[elasticsearch-7.5.1.jar:7.5.1]",                                                                                                                          
"at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:267) ~[elasticsearch-7.5.1.jar:7.5.1]",                                                                                                                                 
"at org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:791) ~[elasticsearch-7.5.1.jar:7.5.1]",
"at org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:768) ~[elasticsearch-7.5.1.jar:7.5.1]",
"at org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:740) ~[elasticsearch-7.5.1.jar:7.5.1]",
[...]

This is how I add documents to the index programmatically:

from elasticsearch import Elasticsearch

es = Elasticsearch(hosts="localhost:9200")
es.index(index=self.index, doc_type=doc_type, body=document_data)

where document_data is a dictionary holding the data shown in the error log above, including:

{
[...]
"lda": [0.031139032915234566, ...]
}
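Before indexing, a quick client-side check that lda really has 150 numeric values rules out a malformed vector as the cause. This is a minimal sketch; check_vector is a hypothetical helper, and what it rejects (wrong length, non-numbers, NaN/infinity) is my assumption of what is worth checking, not Elasticsearch's exact server-side validation:

```python
import math
from typing import Sequence


def check_vector(vector: Sequence[float], dims: int) -> None:
    """Raise ValueError if `vector` cannot fill a dense_vector field of size `dims`."""
    if len(vector) != dims:
        raise ValueError(f"expected {dims} values, got {len(vector)}")
    for i, value in enumerate(vector):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"value at index {i} is not a number: {value!r}")
        if not math.isfinite(value):
            raise ValueError(f"value at index {i} is not finite: {value!r}")


lda = [0.031139032915234566] + [0.0] * 149  # 150 values, like the lda field above
check_vector(lda, 150)  # passes silently
```

If the vector comes from NumPy, converting it with .tolist() first yields plain Python floats, which this check accepts.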

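The index is created immediately beforehand, so it does not contain any documents yet. I noticed the following output when creating the index: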

{"type": "server", "timestamp": "2020-01-03T08:40:03,280Z", "level": "INFO", "component": "o.e.c.m.MetaDataCreateIndexService", "cluster.name": "docker-cluster", "node.name": "8d468383f2cf", "message": "[documents] creating index, cause [api], templates [], shards [1]/[1], mappings [_doc]", "cluster.uuid": "7irLdTC_S7eXwYcVFolppQ", "node.id": "M_fMZ3KxQnWP3AiguV1_jA"  }
{"type": "deprecation", "timestamp": "2020-01-03T08:40:04,940Z", "level": "WARN", "component": "o.e.d.r.a.d.RestDeleteAction", "cluster.name": "docker-cluster", "node.name": "8d468383f2cf", "message": "[types removal] Specifying types in document index requests is deprecated, use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).", "cluster.uuid": "7irLdTC_S7eXwYcVFolppQ", "node.id": "M_fMZ3KxQnWP3AiguV1_jA"  }

The index is created like this:

from elasticsearch import Elasticsearch

es = Elasticsearch(hosts="localhost:9200", serializer=BSONEncoder())
es.indices.create(index="documents", body=mapping)

where mapping is a dictionary that defines the mapping shown in the output above:

mapping = {
    "mappings": {
        "properties": {
            [...],
            "lda": {
                "type": "dense_vector",
                "dims": 150
            },
        }
    }
}

Update: I suspect the mapping really is the problem. Indexing a document without the lda field fails as well:

RequestError: RequestError(400, 'illegal_argument_exception', 'Rejecting mapping update to [documents] as the final mapping would have mo

So I edited the mapping to nest it under the name document:

"mappings": {
"document": {    
[...]
"lda": {
"type":"dense_vector",
"dims":150
}
}
}
} 

However, this leaves the mapping empty, and the field types are inferred when documents are indexed.
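The edited mapping nests the fields under "document", which is the pre-7.x typed layout. A small heuristic check can flag that shape before calling es.indices.create. This is a sketch assuming the elided [...] part contains a "properties" object; nested_type_name is a hypothetical helper, and the rule it applies (another key under "mappings" holding its own "properties") is a heuristic, not an Elasticsearch rule:

```python
from typing import Any, Dict, Optional


def nested_type_name(body: Dict[str, Any]) -> Optional[str]:
    """Return the type name if the mapping looks nested under one, else None.

    Heuristic: in a typeless 7.x body, "properties" sits directly under
    "mappings". If instead some other key under "mappings" holds its own
    "properties", that key is treated as a (pre-7.x style) type name.
    """
    mappings = body.get("mappings", {})
    if "properties" in mappings:
        return None
    for key, value in mappings.items():
        if isinstance(value, dict) and "properties" in value:
            return key
    return None


typed = {"mappings": {"document": {"properties": {"lda": {"type": "dense_vector", "dims": 150}}}}}
typeless = {"mappings": {"properties": {"lda": {"type": "dense_vector", "dims": 150}}}}
print(nested_type_name(typed))     # document
print(nested_type_name(typeless))  # None
```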

--- End of update ---

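I'm not sure where to continue debugging. The deprecation warning printed when the index is created looks like it could be related, but I don't know how to resolve it. Also, the error message doesn't really indicate that this is the problem.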

The documentation for the dense_vector type doesn't go into much detail. However, the example shown there does work (using cURL requests).

Is there a functional difference between creating the index from Python and creating it with cURL?

How can I find out what the real error is? The dimensions are explicitly defined via the dims property.
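One way to see the server's actual reason is to catch the client exception and walk the error body. In elasticsearch-py 7.x, a RequestError exposes the parsed response body as e.info. The error body format used below ("error" with "root_cause", "type", "reason", "caused_by") is the standard Elasticsearch REST error shape. root_causes is a hypothetical helper, and the sample body is an illustration built from the message in the log above, not a captured response:

```python
from typing import Any, List, Tuple


def root_causes(error_body: Any) -> List[Tuple[str, str]]:
    """Collect (type, reason) pairs from an Elasticsearch error response body.

    Walks the "root_cause" entries, then the top-level error and its "caused_by" chain,
    skipping pairs that were already collected.
    """
    if not isinstance(error_body, dict):  # e.g. a plain-text body
        return [("error", str(error_body))]
    error = error_body.get("error", {})
    if not isinstance(error, dict):  # some responses carry the error as a string
        return [("error", str(error))]
    found: List[Tuple[str, str]] = []
    for cause in error.get("root_cause", []):
        found.append((cause.get("type", "?"), cause.get("reason", "")))
    node = error
    while isinstance(node, dict) and node:
        pair = (node.get("type", "?"), node.get("reason", ""))
        if pair not in found:
            found.append(pair)
        node = node.get("caused_by")
    return found


# Illustrative body, modelled on the MapperParsingException in the log above.
sample = {
    "error": {
        "root_cause": [{"type": "mapper_parsing_exception",
                        "reason": "The [dims] property must be specified for field [lda]."}],
        "type": "mapper_parsing_exception",
        "reason": "The [dims] property must be specified for field [lda].",
    },
    "status": 400,
}
for error_type, reason in root_causes(sample):
    print(error_type, "->", reason)
```

With the real client this would be used as `except RequestError as e: print(root_causes(e.info))`, with RequestError imported from elasticsearch.exceptions.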

You are using ES 7.x, where custom doc_types are no longer supported. The message returned when the index is created says so as well:

[types removal] Specifying types in document index requests is deprecated, use the typeless endpoints
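For reference, the typeless endpoints named in the full deprecation message (/{index}/_doc/{id}, /{index}/_doc, /{index}/_create/{id}) can be written out as path strings. This is a minimal sketch; typeless_index_path is a hypothetical helper that only formats paths and sends no request:

```python
from typing import Optional
from urllib.parse import quote


def typeless_index_path(index: str, doc_id: Optional[str] = None, create: bool = False) -> str:
    """Build a typeless 7.x document endpoint path, following the deprecation message.

    /{index}/_doc          -> index with an auto-generated id (POST)
    /{index}/_doc/{id}     -> index with an explicit id
    /{index}/_create/{id}  -> create only if the id does not exist yet
    """
    index_part = quote(index, safe="")
    if create:
        if doc_id is None:
            raise ValueError("_create requires an explicit document id")
        return f"/{index_part}/_create/{quote(doc_id, safe='')}"
    if doc_id is None:
        return f"/{index_part}/_doc"
    return f"/{index_part}/_doc/{quote(doc_id, safe='')}"


print(typeless_index_path("documents"))                        # /documents/_doc
print(typeless_index_path("documents", "42129"))               # /documents/_doc/42129
print(typeless_index_path("documents", "42129", create=True))  # /documents/_create/42129
```

A typed call such as es.index(index="documents", doc_type="document", body=...) should instead target /documents/document, which is the kind of request the warning refers to.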

But you still set a doc_type of your own:

es.index(index=self.index, doc_type=doc_type, body=document_data)

Since version 7, the only doc_type you can use is _doc, but you are trying to set your own, document. That causes an error, and Elasticsearch rejects your mapping update:

RequestError: RequestError(400, 'illegal_argument_exception', 'Rejecting mapping update to [documents] as the final mapping would have more ...
(i.e. more than one doc_type: _doc and document)

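To fix this, remove the custom doc_type wherever it appears: the doc_type variable you pass to es.index, and any type level in the mapping variable you use when creating the documents index.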
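The suggested fix can be sketched end to end. This is a minimal sketch that assumes the elasticsearch-py 7.x methods es.indices.create(index=..., body=...) and es.index(index=..., body=...). create_and_index and FakeClient are hypothetical names, and FakeClient only records calls so the example runs without a cluster:

```python
from typing import Any, Dict, Iterable, List


def create_and_index(es: Any, index: str, dims: int, docs: Iterable[Dict[str, Any]]) -> None:
    """Create a typeless 7.x index with a dense_vector "lda" field, then index docs.

    `es` is expected to behave like elasticsearch-py 7.x's Elasticsearch client.
    No doc_type is passed anywhere.
    """
    mapping = {
        "mappings": {
            "properties": {  # fields directly under "mappings", no type level
                "lda": {"type": "dense_vector", "dims": dims}
            }
        }
    }
    es.indices.create(index=index, body=mapping)
    for doc in docs:
        es.index(index=index, body=doc)  # no doc_type argument


class _RecordingIndices:
    def __init__(self, calls: List[tuple]) -> None:
        self.calls = calls

    def create(self, **kwargs: Any) -> None:
        self.calls.append(("indices.create", kwargs))


class FakeClient:
    """Stand-in that records calls instead of talking to a cluster (illustration only)."""

    def __init__(self) -> None:
        self.calls: List[tuple] = []
        self.indices = _RecordingIndices(self.calls)

    def index(self, **kwargs: Any) -> None:
        self.calls.append(("index", kwargs))


fake = FakeClient()
create_and_index(fake, "documents", 150, [{"id": 42129, "lda": [0.0] * 150}])
print([name for name, _ in fake.calls])  # ['indices.create', 'index']
```

Against a real cluster you would pass Elasticsearch(hosts="localhost:9200") in place of FakeClient(). Without a doc_type, the 7.x client should use the typeless _doc endpoint, so the explicit lda mapping is the one that applies.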
