我在 ElasticSearch 中PUT
以下文档:
{
"_rootId": "327d3aba-4f7c-4abb-9ff3-b1608c354c7c",
"_docId": "ID_3",
"_ver": 0,
"val_labels": [
"x1",
"x1",
"x1"
]
}
然后,我GET
以下查询,该查询使用painless
脚本进行排序:
{
"query": {
"bool": {
"must": [
{
"term": {
"_rootId": "77394e08-32be-4611-bbf7-818dfe4bc853"
}
}
]
}
},
"sort": [
{
"_script": {
"order": "desc",
"type": "string",
"script": {
"lang": "painless",
"source": "return doc['val_labels'].toString()"
}
}
}
]
}
这是我收到的回复:
{
"took": 30,
"timed_out": false,
"_shards": {
"total": 12,
"successful": 12,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": null,
"hits": [
{
"_index": "my-index",
"_type": "views",
"_id": "77394e08-32be-4611-bbf7-818dfe4bc853.ID_3",
"_score": null,
"_source": {
"_rootId": "77394e08-32be-4611-bbf7-818dfe4bc853",
"_docId": "ID_3",
"_ver": 0,
"val_labels": [
"x1",
"x1",
"x1"
]
},
"sort": [
"[x1]"
]
}
]
}
}
奇怪的是,响应中的val_labels
字段显示["x1", "x1", "x1"]
(正如预期的那样,请参阅插入的对象(,而sort
字段仅显示单个x1
值。
对此有什么解释吗?
结果
中的字段_source
是原始未修改的文档,而排序脚本正在访问文档值doc['val_labels']
这些字段是已处理的字段。这可以通过显式获取docvalue_fields
来调试:
{
"query": {
"match_all": {}
},
"docvalue_fields": [
"val_labels"
]
}
这会产生以下命中(我只索引了一个文档(
{
"hits": [
{
"_index": "test",
"_type": "_doc",
"_id": "ID_3",
"_score": 1,
"_source": {
"val_labels": [
"x1",
"x1",
"x1"
]
},
"fields": {
"val_labels": [
"x1"
]
}
}
]
}
请注意结果中已消除重复数据的值。这是因为多个相同的值会导致术语频率增加
GET /test/_doc/ID_3/_termvectors?fields=val_labels
{
"term_vectors": {
"val_labels": {
"field_statistics": {
"sum_doc_freq": 1,
"doc_count": 1,
"sum_ttf": -1
},
"terms": {
"x1": {
"term_freq": 3,
"tokens": [
{
"position": 0,
"start_offset": 0,
"end_offset": 2
},
{
"position": 1,
"start_offset": 3,
"end_offset": 5
},
{
"position": 2,
"start_offset": 6,
"end_offset": 8
}
]
}
}
}
}
}