I need to create an Avro file, and for that I need two things:
1) JSON
2) An Avro schema
Of these two requirements, I already have the JSON:
{"web-app": {
"servlet": [
{
"servlet-name": "cofaxCDS",
"servlet-class": "org.cofax.cds.CDSServlet",
"init-param": {
"configGlossary:installationAt": "Philadelphia, PA",
"configGlossary:adminEmail": "ksm@pobox.com",
"configGlossary:poweredBy": "Cofax",
"configGlossary:poweredByIcon": "/images/cofax.gif",
"configGlossary:staticPath": "/content/static",
"templateProcessorClass": "org.cofax.WysiwygTemplate",
"templateLoaderClass": "org.cofax.FilesTemplateLoader",
"templatePath": "templates",
"templateOverridePath": "",
"defaultListTemplate": "listTemplate.htm",
"defaultFileTemplate": "articleTemplate.htm",
"useJSP": false,
"jspListTemplate": "listTemplate.jsp",
"jspFileTemplate": "articleTemplate.jsp",
"cachePackageTagsTrack": 200,
"cachePackageTagsStore": 200,
"cachePackageTagsRefresh": 60,
"cacheTemplatesTrack": 100,
"cacheTemplatesStore": 50,
"cacheTemplatesRefresh": 15,
"cachePagesTrack": 200,
"cachePagesStore": 100,
"cachePagesRefresh": 10,
"cachePagesDirtyRead": 10,
"searchEngineListTemplate": "forSearchEnginesList.htm",
"searchEngineFileTemplate": "forSearchEngines.htm",
"searchEngineRobotsDb": "WEB-INF/robots.db",
"useDataStore": true,
"dataStoreClass": "org.cofax.SqlDataStore",
"redirectionClass": "org.cofax.SqlRedirection",
"dataStoreName": "cofax",
"dataStoreDriver": "com.microsoft.jdbc.sqlserver.SQLServerDriver",
"dataStoreUrl": "jdbc:microsoft:sqlserver://LOCALHOST:1433;DatabaseName=goon",
"dataStoreUser": "sa",
"dataStorePassword": "dataStoreTestQuery",
"dataStoreTestQuery": "SET NOCOUNT ON;select test='test';",
"dataStoreLogFile": "/usr/local/tomcat/logs/datastore.log",
"dataStoreInitConns": 10,
"dataStoreMaxConns": 100,
"dataStoreConnUsageLimit": 100,
"dataStoreLogLevel": "debug",
"maxUrlLength": 500}},
{
"servlet-name": "cofaxEmail",
"servlet-class": "org.cofax.cds.EmailServlet",
"init-param": {
"mailHost": "mail1",
"mailHostOverride": "mail2"}},
{
"servlet-name": "cofaxAdmin",
"servlet-class": "org.cofax.cds.AdminServlet"},
{
"servlet-name": "fileServlet",
"servlet-class": "org.cofax.cds.FileServlet"},
{
"servlet-name": "cofaxTools",
"servlet-class": "org.cofax.cms.CofaxToolsServlet",
"init-param": {
"templatePath": "toolstemplates/",
"log": 1,
"logLocation": "/usr/local/tomcat/logs/CofaxTools.log",
"logMaxSize": "",
"dataLog": 1,
"dataLogLocation": "/usr/local/tomcat/logs/dataLog.log",
"dataLogMaxSize": "",
"removePageCache": "/content/admin/remove?cache=pages&id=",
"removeTemplateCache": "/content/admin/remove?cache=templates&id=",
"fileTransferFolder": "/usr/local/tomcat/webapps/content/fileTransferFolder",
"lookInContext": 1,
"adminGroupID": 4,
"betaServer": true}}],
"servlet-mapping": {
"cofaxCDS": "/",
"cofaxEmail": "/cofaxutil/aemail/*",
"cofaxAdmin": "/admin/*",
"fileServlet": "/static/*",
"cofaxTools": "/tools/*"},
"taglib": {
"taglib-uri": "cofax.tld",
"taglib-location": "/WEB-INF/tlds/cofax.tld"}}}
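But how do I create an Avro schema based on it?
I'm looking for a programmatic way to do this, because there will be many schemas and I can't create the Avro schema by hand every time.
I checked 'avro-tools-1.8.1.jar'.
I'm looking for a JAR or Python code that can create an Avro schema from JSON. It's fine if the data types aren't perfect (strings, integers, and floats are enough to start with).
This is easy with a simple copy and paste here:
https://toolslick.com/generation/metadata/avro-schema-from-json
You can use the Kite SDK util to infer an Avro schema from JSON input.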
https://github.com/kite-sdk/kite/blob/master/kite-data/kite-data-core/src/main/java/org/kitesdk/data/spi/JsonUtil.java#L539
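Example:
// Requires kite-data-core on the classpath
// import org.kitesdk.data.spi.JsonUtil;
String json = "{\n" +
    "    \"id\": 1,\n" +
    "    \"name\": \"A green door\",\n" +
    "    \"price\": 12.50,\n" +
    "    \"tags\": [\"home\", \"green\"]\n" +
    "}\n";
// Parse the JSON text, then infer a record schema named "myschema" from it
String avroSchema = JsonUtil.inferSchema(JsonUtil.parse(json), "myschema").toString();
System.out.println(avroSchema);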
Result:
{
"type":"record",
"name":"myschema",
"fields":[
{
"name":"id",
"type":"int",
"doc":"Type inferred from '1'"
},
{
"name":"name",
"type":"string",
"doc":"Type inferred from '\"A green door\"'"
},
{
"name":"price",
"type":"double",
"doc":"Type inferred from '12.5'"
},
{
"name":"tags",
"type":{
"type":"array",
"items":"string"
},
"doc":"Type inferred from '[\"home\",\"green\"]'"
}
]
}
You can find the Maven dependency here.
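If you would rather avoid a Java dependency, the mapping visible in the Kite output above (JSON integer to int, decimal to double, string to string, list to array) can be approximated in plain Python. This is only a rough sketch with the standard library: `infer_type` is a hypothetical helper of mine, not Kite's implementation, and it assumes the first element of an array is representative of the rest.

```python
import json


def infer_type(value, name):
    """Map a parsed JSON value to an Avro type (hypothetical helper, not Kite's code)."""
    # bool must be tested before int, because bool is a subclass of int in Python
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        # Avro int is 32-bit; anything larger needs long
        return "int" if -2**31 <= value < 2**31 else "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    if isinstance(value, list):
        # Assumption: the first element is representative; empty arrays default to string items
        items = infer_type(value[0], name + "_item") if value else "string"
        return {"type": "array", "items": items}
    if isinstance(value, dict):
        # Nested objects become nested records named after their key
        return {
            "type": "record",
            "name": name,
            "fields": [{"name": k, "type": infer_type(v, k)} for k, v in value.items()],
        }
    raise TypeError("unsupported JSON value: %r" % (value,))


sample = json.loads(
    '{"id": 1, "name": "A green door", "price": 12.50, "tags": ["home", "green"]}'
)
schema = infer_type(sample, "myschema")
print(json.dumps(schema, indent=2))
```

The sketch does not add the "doc" entries Kite emits, and it does not rename keys that are not legal Avro names, so treat it as a starting point only.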
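Give this a shot: http://www.dataedu.ca/avro
It basically takes JSON and infers an Avro schema from it.
You can even give it a JSON array; it will generate a single Avro schema that is compatible with all of the elements.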
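To give an idea of what "compatible with all of the elements" can mean, here is a small sketch that merges the top-level fields of several JSON objects: a field seen with more than one type becomes a union, and a field missing from some objects gets "null" added to its union. I don't know how that site does it internally; `json_type` and `merge_fields` are hypothetical names, and nested values are simply treated as strings to keep the example short.

```python
import json


def json_type(value):
    """Avro primitive name for a scalar JSON value (nested values fall back to string)."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"


def merge_fields(records):
    """Build one Avro field list that accepts every record in the list."""
    seen = {}  # field name -> types in first-seen order
    for rec in records:
        for key, value in rec.items():
            types = seen.setdefault(key, [])
            t = json_type(value)
            if t not in types:
                types.append(t)
    fields = []
    for key, types in seen.items():
        # A field that is absent from some records must be nullable
        if any(key not in rec for rec in records) and "null" not in types:
            types = ["null"] + types
        fields.append({"name": key, "type": types[0] if len(types) == 1 else types})
    return fields


records = json.loads('[{"id": 1, "name": "door"}, {"id": 2.5, "price": 3}]')
fields = merge_fields(records)
print(json.dumps(fields, indent=2))
```

Here "id" ends up as a union of long and double, while "name" and "price" become nullable because each is missing from one of the two objects.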
There are other tools that can validate the result.
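One check worth doing yourself on any generated schema: the Avro specification only allows names that start with a letter or underscore and contain letters, digits, and underscores. Keys in the question's JSON such as "servlet-name" or "configGlossary:adminEmail" do not qualify, so they have to be renamed before they can be used as field names. A minimal sketch, where `sanitize_name` is a hypothetical helper and replacing with underscores is just one possible convention:

```python
import re

# Name rule from the Avro specification: [A-Za-z_][A-Za-z0-9_]*
AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def sanitize_name(key):
    """Replace characters Avro does not allow in names with underscores (hypothetical helper)."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", key)
    # Names may not be empty or start with a digit
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


keys = ["servlet-name", "configGlossary:adminEmail", "templatePath"]
invalid = [k for k in keys if not AVRO_NAME.fullmatch(k)]
renamed = {k: sanitize_name(k) for k in invalid}
print(renamed)
```

Note that two different keys can collapse to the same sanitized name, so a real converter would also need to detect collisions.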
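If you want to avoid creating a dedicated Avro schema for each JSON format, you can use the rec-avro package.
It lets you take any Python data structure, including parsed XML or JSON, and store it in Avro without needing a dedicated schema.
I tested it with Python 3.
You can install it with pip3 install rec-avro, or look at the code and documentation at https://github.com/bmizhen/rec-avro
I posted example code for converting JSON to Avro here: https://stackoverflow.com/a/55444481/6654219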