Indexing PDF / Word documents in Elasticsearch with the ingest-attachment plugin from Java code



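I am trying to index my Word/PDF documents. To do that, I wrote a Java utility that encodes each file as Base64, and I now want to index the encoded content in Elasticsearch.

The code below encodes my file as Base64 successfully, but I am not sure how to index the result in Elasticsearch.

My Java code: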

public static void main(String args[]) throws IOException {
    String filePath = "D:\\1SearchEngine\\testing.pdf";
    String encodedfile = null;
    RestHighLevelClient restHighLevelClient = null;
    File file = new File(filePath);
    try {
        FileInputStream fileInputStreamReader = new FileInputStream(file);
        byte[] bytes = new byte[(int) file.length()];
        fileInputStreamReader.read(bytes);
        encodedfile = new String(Base64.getEncoder().encodeToString(bytes));
        //System.out.println(encodedfile);
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
    try {
        if (restHighLevelClient != null) {
            restHighLevelClient.close();
        }
    } catch (final Exception e) {
        System.out.println("Error closing ElasticSearch client: ");
    }
    try {
        restHighLevelClient = new RestHighLevelClient(RestClient.builder(new HttpHost("localhost", 9200, "http"),
                new HttpHost("localhost", 9201, "http")));
    } catch (Exception e) {
        System.out.println(e.getMessage());
    }
    IndexRequest request = new IndexRequest("attach_local", "doc", "103");
    Map<String, Object> jsonMap = new HashMap<>();
    jsonMap.put("resume", "Karthikeyan");
    jsonMap.put("postDate", new Date());
    jsonMap.put("resume", encodedfile);
    try {
        IndexResponse response = restHighLevelClient.index(request);
    } catch (ElasticsearchException e) {
        if (e.status() == RestStatus.CONFLICT) {
        }
    }
}
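One detail of the encoding step is worth a note: `InputStream.read(byte[])` is not guaranteed to fill the whole buffer in a single call, and the stream above is never closed. A minimal sketch of the same step using `Files.readAllBytes`, which reads the complete file and closes it, is shown here. The class and method names (`FileBase64Sketch`, `encode`, `encodeFile`) are made up for illustration and are not part of the original code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;

// Hypothetical helper, for illustration only.
final class FileBase64Sketch {

    // Encodes raw bytes with the standard Base64 alphabet.
    static String encode(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    // Reads the whole file into memory, then encodes it.
    static String encodeFile(Path path) throws IOException {
        return encode(Files.readAllBytes(path));
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            // Encode the file given on the command line.
            System.out.println(encodeFile(Paths.get(args[0])));
        } else {
            // No file given: encode the four bytes every PDF starts with.
            System.out.println(encode(new byte[] {'%', 'P', 'D', 'F'}));
        }
    }
}
```

Reading the whole file into memory is fine for typical resumes; very large documents would need a different approach.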

I am using Elasticsearch 6.2.3, and I have installed version 6.3.0 of the ingest-attachment plugin.

I am using the following dependency for the Elasticsearch client:

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>6.1.2</version>
</dependency>

My mapping and pipeline definitions:

PUT attach_local
{
  "mappings" : {
    "doc" : {
      "properties" : {
        "attachment" : {
          "properties" : {
            "content" : {
              "type" : "binary"
            },
            "content_length" : {
              "type" : "long"
            },
            "content_type" : {
              "type" : "text"
            },
            "language" : {
              "type" : "text"
            }
          }
        },
        "resume" : {
          "type" : "text"
        }
      }
    }
  }
}

PUT _ingest/pipeline/attach_local
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "resume"
      }
    }
  ]
}

Now I get the following error from Java when I try to index the document:

Exception in thread "main" org.elasticsearch.action.ActionRequestValidationException: Validation Failed: 1: source is missing;2: content type is missing;
at org.elasticsearch.action.ValidateActions.addValidationError(ValidateActions.java:26)
at org.elasticsearch.action.index.IndexRequest.validate(IndexRequest.java:153)
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:436)
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:429)
at org.elasticsearch.client.RestHighLevelClient.index(RestHighLevelClient.java:312)
at com.es.utility.DocumentIndex.main(DocumentIndex.java:82)
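
The trace shows `IndexRequest.validate` being called from `RestHighLevelClient.performRequest`, so the failure is raised on the client before anything is sent to the cluster. The request was created with an index, type, and id only; `jsonMap` was filled in but never attached to it, so the request has neither a source nor a content type. The following is a toy model of that check, written only to illustrate the behaviour. `ToyIndexRequest` is a made-up class, not the Elasticsearch `IndexRequest`, and the assumption that setting the source also sets the content type is a simplification.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Toy stand-in for illustration; NOT the real org.elasticsearch IndexRequest.
final class ToyIndexRequest {
    private Map<String, ?> source;   // stays null until source(...) is called
    private String contentType;      // assumed to be set together with the source

    ToyIndexRequest source(Map<String, ?> doc) {
        this.source = doc;
        this.contentType = "application/json";
        return this;
    }

    // Returns null when the request is valid, otherwise a numbered error message.
    String validate() {
        List<String> errors = new ArrayList<>();
        if (source == null) {
            errors.add("source is missing");
        }
        if (contentType == null) {
            errors.add("content type is missing");
        }
        if (errors.isEmpty()) {
            return null;
        }
        StringBuilder sb = new StringBuilder("Validation Failed: ");
        for (int i = 0; i < errors.size(); i++) {
            sb.append(i + 1).append(": ").append(errors.get(i)).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // No source attached: both checks fail.
        System.out.println(new ToyIndexRequest().validate());
        // Source attached: validation passes.
        System.out.println(
                new ToyIndexRequest().source(Collections.singletonMap("resume", "JVBERg==")).validate());
    }
}
```

Running the sketch prints the same two-part validation message for the request without a source, and `null` once `source(...)` has been called.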

I finally found the solution. This is how to index PDF/Word documents in Elasticsearch through the Java API:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

import org.apache.http.HttpHost;
import org.elasticsearch.ElasticsearchException;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.rest.RestStatus;

public class DocumentIndex {

    public static void main(String[] args) throws IOException {
        // Read the whole file and Base64-encode it; the attachment processor
        // expects the field to hold Base64-encoded binary.
        String filePath = "D:\\1SearchEngine\\testing.pdf";
        byte[] bytes = Files.readAllBytes(Paths.get(filePath));
        String encodedfile = Base64.getEncoder().encodeToString(bytes);

        // try-with-resources closes the client once indexing is done.
        try (RestHighLevelClient restHighLevelClient = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http"),
                        new HttpHost("localhost", 9201, "http")))) {

            Map<String, Object> jsonMap = new HashMap<>();
            jsonMap.put("Name", "Karthikeyan");
            jsonMap.put("postDate", new Date());
            jsonMap.put("resume", encodedfile);

            // The two calls missing from the first attempt: attach the document
            // source and route the request through the ingest pipeline.
            IndexRequest request = new IndexRequest("attach_local", "doc", "104")
                    .source(jsonMap)
                    .setPipeline("attach_local");
            try {
                IndexResponse response = restHighLevelClient.index(request);
                System.out.println("Indexed document " + response.getId());
            } catch (ElasticsearchException e) {
                if (e.status() == RestStatus.CONFLICT) {
                    System.out.println("Version conflict: " + e.getMessage());
                } else {
                    throw e;
                }
            }
        }
    }
}
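
In REST terms, `setPipeline("attach_local")` corresponds to the `pipeline` query parameter of the index API, and `source(jsonMap)` supplies the JSON request body. The sketch below builds that request with the JDK only, which can help when checking what the client is expected to send. It is an illustration under assumptions: the names `RawPipelineIndexSketch`, `indexUrl`, `body`, and `send` are made up, the body carries only the `resume` field, and the node address `http://localhost:9200` is taken from the code above.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, for illustration only.
final class RawPipelineIndexSketch {

    // Builds the index URL; the ingest pipeline is named in a query parameter.
    static String indexUrl(String host, String index, String type, String id, String pipeline) {
        return host + "/" + index + "/" + type + "/" + id + "?pipeline=" + pipeline;
    }

    // Builds a one-field JSON body. Base64 output contains only letters,
    // digits, '+', '/' and '=', so it needs no JSON escaping.
    static String body(byte[] fileBytes) {
        return "{\"resume\":\"" + Base64.getEncoder().encodeToString(fileBytes) + "\"}";
    }

    // Sends the document with PUT and returns the HTTP status code.
    static int send(String url, String json) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String url = indexUrl("http://localhost:9200", "attach_local", "doc", "104", "attach_local");
        String json = body("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println("PUT " + url);
        System.out.println(json);
        // Uncomment to send the request to a running node:
        // System.out.println(send(url, json));
    }
}
```

The network call is left commented out in `main` so the sketch runs without a cluster; the pipeline `attach_local` must already exist before a request like this can succeed.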

Mapping and pipeline details:

PUT attach_local
{
  "mappings" : {
    "doc" : {
      "properties" : {
        "attachment" : {
          "properties" : {
            "content" : {
              "type" : "binary"
            },
            "content_length" : {
              "type" : "long"
            },
            "content_type" : {
              "type" : "text"
            },
            "language" : {
              "type" : "text"
            }
          }
        },
        "resume" : {
          "type" : "text"
        }
      }
    }
  }
}

PUT _ingest/pipeline/attach_local
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "resume"
      }
    }
  ]
}
