I have an Elasticsearch (v7.5.1) index that contains a dense_vector field named lda with 150 dimensions. The mapping, as returned by http://localhost:9200/documents/_mapping, looks like this:
"documents": {
"mappings": {
[...]
"lda": {
"type":"dense_vector",
"dims":150
}
}
}
When I try to index a document through the Elasticsearch client for Python (v7.1.0), ES throws the following error:
{"type": "server", "timestamp": "2020-01-03T08:40:04,962Z", "level": "DEBUG", "component": "o.e.a.b.TransportShardBulkAction", "cluster.name": "docker-cluster", "node.name": "8d468383f2cf", "message": "[documents][0] failed to execute bulk item
(create) index {[documents][document][S_uPam8BUsDzizMKxpRR], source[{"id":42129,[...],
"lda":[0.031139032915234566,0.02878846414387226,0.026767859235405922,0.025012295693159103,0.02347283624112606,0.022111890837550163,0.02090011164546013,0.019814245402812958,0.0188356414437294,0.01794915273785591,0.01714235544204712,0.01640496961772442,0.015728404745459557,0.
015105433762073517,0.014529934152960777,0.013996675610542297,0.013501172885298729,0.013039554469287395,0.012608458288013935,0.012204954400658607,0.011826476082205772,0.011470765806734562,0.011135827749967575,0.010819895192980766,0.01052139326930046,0.010238921269774437,0.0,0
.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]}]}", "cluster.uuid": "7irLdTC_S7eXwYcVFolppQ", "node.id":
"M_fMZ3KxQnWP3AiguV1_jA" ,
"stacktrace": ["org.elasticsearch.index.mapper.MapperParsingException: The [dims] property must be specified for field [lda].", [22/1876]
"at org.elasticsearch.xpack.vectors.mapper.DenseVectorFieldMapper$TypeParser.parse(DenseVectorFieldMapper.java:104) ~[?:?]",
"at org.elasticsearch.index.mapper.DocumentParser.createBuilderFromFieldType(DocumentParser.java:680) ~[elasticsearch-7.5.1.jar:7.5.1]",
"at org.elasticsearch.index.mapper.DocumentParser.parseDynamicValue(DocumentParser.java:826) ~[elasticsearch-7.5.1.jar:7.5.1]",
"at org.elasticsearch.index.mapper.DocumentParser.parseValue(DocumentParser.java:619) ~[elasticsearch-7.5.1.jar:7.5.1]",
"at org.elasticsearch.index.mapper.DocumentParser.parseNonDynamicArray(DocumentParser.java:601) ~[elasticsearch-7.5.1.jar:7.5.1]",
"at org.elasticsearch.index.mapper.DocumentParser.parseArray(DocumentParser.java:560) ~[elasticsearch-7.5.1.jar:7.5.1]",
"at org.elasticsearch.index.mapper.DocumentParser.innerParseObject(DocumentParser.java:420) ~[elasticsearch-7.5.1.jar:7.5.1]",
"at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrNested(DocumentParser.java:395) ~[elasticsearch-7.5.1.jar:7.5.1]",
"at org.elasticsearch.index.mapper.DocumentParser.internalParseDocument(DocumentParser.java:112) ~[elasticsearch-7.5.1.jar:7.5.1]",
"at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:71) ~[elasticsearch-7.5.1.jar:7.5.1]",
"at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:267) ~[elasticsearch-7.5.1.jar:7.5.1]",
"at org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:791) ~[elasticsearch-7.5.1.jar:7.5.1]",
"at org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:768) ~[elasticsearch-7.5.1.jar:7.5.1]",
"at org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:740) ~[elasticsearch-7.5.1.jar:7.5.1]",
[...]
This is how the document is added to the index programmatically:
es = Elasticsearch(hosts="localhost:9200")
es.index(index=self.index, doc_type=doc_type, body=document_data)
where document_data is a dictionary holding the data shown in the error log above, including the following:
{
[...]
"lda": [0.031139032915234566, ...]
}
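The index is created immediately beforehand, so it does not contain any documents yet. I noticed that the following output appears when I create the index: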
{"type": "server", "timestamp": "2020-01-03T08:40:03,280Z", "level": "INFO", "component": "o.e.c.m.MetaDataCreateIndexService", "cluster.name": "docker-cluster", "node.name": "8d468383f2cf", "message": "[documents] creating index, cause [api],
templates [], shards [1]/[1], mappings [_doc]", "cluster.uuid": "7irLdTC_S7eXwYcVFolppQ", "node.id": "M_fMZ3KxQnWP3AiguV1_jA" }
{"type": "deprecation", "timestamp": "2020-01-03T08:40:04,940Z", "level": "WARN", "component": "o.e.d.r.a.d.RestDeleteAction", "cluster.name": "docker-cluster", "node.name": "8d468383f2cf", "message": "[types removal] Specifying types in document index requests is deprecated, use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).", "cluster.uuid": "7irLdTC_S7eXwYcVFolppQ", "node.id": "M_fMZ3KxQnWP3AiguV1_jA" }
The index is created like this:
es = Elasticsearch(hosts="localhost:9200", serializer=BSONEncoder())
es.indices.create(index="documents", body=mapping)
where mapping is a dictionary defining the mapping, as shown in the output above:
mapping = {
"mappings": {
"properties": {
[...],
"lda": {
"type": "dense_vector",
"dims": 150
},
}
}
}
Update: I suspect the mapping is indeed the problem. Indexing a document that has no lda field fails as well:
RequestError: RequestError(400, 'illegal_argument_exception', 'Rejecting mapping update to [documents] as the final mapping would have mo
So I edited the mapping to include the index name:
"mappings": {
"document": {
[...]
"lda": {
"type":"dense_vector",
"dims":150
}
}
}
}
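However, this results in an empty mapping, and the field types are inferred when documents are indexed.
--- End of update ---
I am not sure where to start debugging. The deprecation warning logged when the index is created looks like it could be related, but I do not know how to resolve it. Besides, the error message does not really indicate that this is the problem.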
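The documentation for the dense_vector type does not reveal many details. However, the example shown there does work (using cURL requests).
Is there a functional difference between creating the index via Python and via cURL?
How can I find out what the real error is? The dimension is clearly defined via the dims property.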
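You are using ES 7.x, which no longer supports custom doc_type values (see the docs). This is also stated in the message you saw when creating the index:
[types removal] Specifying types in document index requests is deprecated, use the typeless endpoints
Yet you are still passing a doc_type when indexing: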
es.index(index=self.index, doc_type=doc_type, body=document_data)
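Since version 7, an index created without a type name only accepts _doc as its doc_type, but you are passing your own, document. This produces an error, and Elasticsearch rejects the mapping update:
RequestError: RequestError(400, 'illegal_argument_exception', 'Rejecting mapping update to [documents] as the final mapping would have more ......
(my completion of the truncated message: "than one doc_type: _doc, document")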
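To check what Elasticsearch actually registered for the index before you index anything, you can look the field up in the mapping response. The following is a minimal sketch, assuming the typeless response shape that ES 7.x returns for GET /documents/_mapping; the helper name field_mapping is hypothetical and not part of the client.

```python
def field_mapping(mapping_response, index_name, field):
    """Return the mapping definition of `field`, or None if it is not mapped.

    Assumes the typeless ES 7.x response shape:
    {index_name: {"mappings": {"properties": {field: {...}}}}}
    """
    mappings = mapping_response.get(index_name, {}).get("mappings", {})
    return mappings.get("properties", {}).get(field)
```

With a client at hand this could be called as field_mapping(es.indices.get_mapping(index="documents"), "documents", "lda"). If it returns None, the lda field was never registered as a dense_vector, and Elasticsearch will try to map it dynamically on the first document.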
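To fix this, try dropping the custom doc_type in both places: remove the type level from the mapping you pass when creating the documents index (your mapping variable), and stop passing your doc_type variable to es.index.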
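As a rough sketch of what that could look like with the Python client (7.x): the mapping body keeps "properties" directly under "mappings", and es.index is called without doc_type. The function names build_index_body and create_and_index are made up for illustration, and the elided fields from the question are left out.

```python
def build_index_body(dims=150):
    # Typeless ES 7.x body: "properties" sits directly under "mappings",
    # with no "document" (or any other type name) level in between.
    return {
        "mappings": {
            "properties": {
                "lda": {"type": "dense_vector", "dims": dims},
            }
        }
    }


def create_and_index(es, index_name, document):
    # `es` is expected to be an elasticsearch.Elasticsearch (7.x) client.
    es.indices.create(index=index_name, body=build_index_body())
    # doc_type is deliberately omitted, so the request uses the _doc endpoint.
    return es.index(index=index_name, body=document)
```

Usage would be along the lines of create_and_index(Elasticsearch(hosts="localhost:9200"), "documents", document_data), with document_data being the dictionary from the question.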