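I added the ingest attachment processor plugin to Elastic.
Then I created a very simple PDF file.
I tried to ingest this file (its content) into Elastic (see the commands below).
But searching for a word from the file fails (see response 3 below).
What is going wrong?
Do I need to add some kind of pipeline?
Is the PUT for the PDF correct, and do I need to put the PDF content into the content field of the PUT command?
Console commands…
Console 1: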
PUT _ingest/pipeline/attachment
{
"description" : "Extract attachment information",
"processors" : [
{
"attachment" : {
"field" : "data",
"indexed_chars" : -1
}
}
]
}
Response 1:
{
"acknowledged" : true
}
Console 2:
PUT my_index/_doc/001?pipeline=attachment
{
"filename": "C:\\ELK-Stack\\Test.pdf",
"data": "VGVzdA0KVGVzdCBEb2t1bWVudCB1bWdld2FuZGVsdCB2b24gd28NCkhpZXIgd2lyZCBnZXRlc3RldC4gRGFzIGlzdCBkZXIgVGVzdA==",
"attachment": {
"content_type": "application/rtf",
"language": "ro",
"content": "Test Test Dokument umgewandelt von word zu pdf. Hier wird getestet. Das ist der Test."
},
"title": "Quick"
}
Response 2:
{
"_index" : "my_index",
"_id" : "001",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}
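On the question of whether the PUT body is right: the attachment processor reads base64 from the field named in the pipeline (here `data`) and writes what it extracts into `attachment` on its own, so the body should not need a hand-written `attachment` block. Below is a minimal Python sketch of building such a body. `build_attachment_doc` and `sample_bytes` are hypothetical names used only for illustration and are not part of any Elastic client; a real script would read the bytes from the PDF file instead.

```python
import base64
import json


def build_attachment_doc(file_bytes: bytes, filename: str, title: str) -> dict:
    """Build a document body for the `attachment` pipeline (hypothetical helper)."""
    return {
        "filename": filename,
        # The processor expects the raw file bytes, base64-encoded.
        "data": base64.b64encode(file_bytes).decode("ascii"),
        "title": title,
        # No "attachment" key here: the processor fills it in during ingest.
    }


# Placeholder bytes standing in for a real file read with open(path, "rb").
sample_bytes = b"%PDF-1.4 placeholder, not a complete PDF"
doc = build_attachment_doc(sample_bytes, r"C:\ELK-Stack\Test.pdf", "Quick")

# json.dumps escapes the backslashes in the Windows path for us.
print(json.dumps(doc, indent=2))
```

The printed JSON can be pasted as the body of the `PUT my_index/_doc/001?pipeline=attachment` request.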
Console 3:
GET /my_index/_search
{
"query": {
"match": {
"content": "Test"
}
}
}
Response 3:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
Console 4:
GET /_search
{
"query": {
"match_all": {}
}
}
Response 4:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_index",
"_id" : "001",
"_score" : 1.0,
"_source" : {
"filename" : """C:\ELK-Stack\Test.pdf""",
"data" : "VGVzdA0KVGVzdCBEb2t1bWVudCB1bWdld2FuZGVsdCB2b24gd28NCkhpZXIgd2lyZCBnZXRlc3RldC4gRGFzIGlzdCBkZXIgVGVzdA==",
"attachment" : {
"content_type" : "text/plain; charset=windows-1252",
"language" : "et",
"content" : """Test
Test Dokument umgewandelt von wo
Hier wird getestet. Das ist der Test""",
"content_length" : 77
},
"title" : "Quick"
}
}
]
}
}
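The last response above reports `text/plain` rather than a PDF type, and the `attachment` values written by hand in the PUT request have been replaced by what the processor detected. Decoding the `data` value suggests why. The short Python check below uses only the string from the request and tests for the `%PDF` marker that a PDF file starts with; the variable names are illustrative.

```python
import base64

# The exact "data" value from the PUT request.
DATA = (
    "VGVzdA0KVGVzdCBEb2t1bWVudCB1bWdld2FuZGVsdCB2b24gd28NCkhpZXIgd2lyZCBn"
    "ZXRlc3RldC4gRGFzIGlzdCBkZXIgVGVzdA=="
)

decoded = base64.b64decode(DATA)

# A real PDF begins with the bytes "%PDF"; this payload does not.
is_pdf = decoded.startswith(b"%PDF")
print("looks like a PDF:", is_pdf)

# The payload is plain ASCII text, which matches the detected content type.
print(decoded.decode("ascii"))
```

So the indexed document holds text that was encoded directly, not the bytes of the PDF file. That does not affect the search problem itself, but it means the PDF extraction has not actually been exercised yet.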
Thanks to LeBigCat, I found the solution.
I needed to give the full path to the field,
using "attachment.content": "Test"
(instead of "content": "Test"):
GET /my_index/_search
{
"query": {
"match": {
"attachment.content": "Test"
}
}
}
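The fix works because a field inside an object is addressed by its full dotted path. The document has no top-level `content` field, so a `match` on `content` finds nothing. The Python sketch below only illustrates that naming; it is not how Elasticsearch is implemented. `flatten` is a hypothetical helper, and `source` is an abbreviated copy of the `_source` from the `match_all` response.

```python
def flatten(obj: dict, prefix: str = "") -> dict:
    """Map each leaf value to its full dotted field path."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into nested objects, carrying the parent path along.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Abbreviated copy of the stored document.
source = {
    "title": "Quick",
    "attachment": {
        "content_type": "text/plain; charset=windows-1252",
        "language": "et",
        "content": "Test\nTest Dokument umgewandelt von wo\nHier wird getestet. Das ist der Test",
    },
}

fields = flatten(source)
for name in sorted(fields):
    print(name)

print("content" in fields)             # the name used in the failing query
print("attachment.content" in fields)  # the name used in the working query
```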