How to import an entire XML database into Elasticsearch



Suppose I have 20 XML files, and those files are the entire database. Is it possible to ingest all 20 XML files into Elasticsearch? If so, what are the options?

For Python 3, I recommend using xmltodict.

Run pip install xmltodict elasticsearch

I assume the XML files contain records:

<records>
<record>...</record>
...
<record>...</record>
</records>

So they have to be split into individual records.
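To see what that splitting looks like, here is a minimal sketch of how xmltodict parses a file with the shape shown above (the sample content is hypothetical). Note that repeated <record> elements end up as a list under ["records"]["record"]; with only a single <record> you would get a plain dict instead, which the force_list option can normalize.

```python
import xmltodict

# A small sample in the same shape as the files described above (hypothetical content).
sample = """
<records>
  <record><id>1</id><name>alpha</name></record>
  <record><id>2</id><name>beta</name></record>
</records>
"""

doc = xmltodict.parse(sample)
# Repeated <record> elements become a list under ["records"]["record"].
records = doc["records"]["record"]
print(len(records))        # 2
print(records[0]["name"])  # alpha
```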

Create a file named "load.py" with the following content:

import sys
import json

import xmltodict
from elasticsearch import Elasticsearch

INDEX = "xmlfiles"
TYPE = "record"

def xml_to_actions(xmlcontent):
    # xmltodict puts the repeated <record> elements in a list
    # under ["records"]["record"]
    for record in xmlcontent["records"]["record"]:
        # bulk API: one action line followed by one source line per document
        yield '{ "index" : { "_index" : "%s", "_type" : "%s" }}' % (INDEX, TYPE)
        yield json.dumps(record, default=int)

e = Elasticsearch()  # no args: connect to localhost:9200

if not e.indices.exists(INDEX):
    raise RuntimeError('index does not exist, use `curl -X PUT "localhost:9200/%s"` and try again' % INDEX)

for f in sys.argv[1:]:  # skip argv[0], which is the script name itself
    with open(f, "rt") as fin:
        r = e.bulk(xml_to_actions(xmltodict.parse(fin)))  # returns a dict
        print(f, not r["errors"])

Run it with python load.py xml1.xml xml2.xml ... xml20.xml
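For reference, each pair of lines yielded by xml_to_actions forms the newline-delimited body that the Elasticsearch bulk API expects. A standard-library-only sketch of that payload (the record dicts here are hypothetical stand-ins for what xmltodict would produce):

```python
import json

INDEX = "xmlfiles"
TYPE = "record"

# Hypothetical records, shaped like xmltodict's output for the files above.
records = [{"id": "1", "name": "alpha"}, {"id": "2", "name": "beta"}]

lines = []
for record in records:
    # one action line ...
    lines.append(json.dumps({"index": {"_index": INDEX, "_type": TYPE}}))
    # ... followed by one source line
    lines.append(json.dumps(record))

# The bulk body must end with a trailing newline.
body = "\n".join(lines) + "\n"
print(body)
```

If anything goes wrong, the "errors" field of the bulk response is true and the per-item results say which document failed, which is why the script prints not r["errors"] per file.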
