Creating nested arrays of objects with the ES-Spark connector



I have a Spark DataFrame with the following schema:

|-- ROW_ID: string (nullable = true)
|-- SUBJECT_ID: string (nullable = true)
|-- HADM_ID: string (nullable = true)
|-- CHARTDATE: string (nullable = true)
|-- CHARTTIME: string (nullable = true)
|-- STORETIME: string (nullable = true)
|-- CATEGORY: string (nullable = true)
|-- DESCRIPTION: string (nullable = true)
|-- CGID: string (nullable = true)
|-- ISERROR: string (nullable = true)
|-- TEXT: string (nullable = true)
|-- annotations: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- text: string (nullable = true)
|    |    |-- subject: string (nullable = true)
|    |    |-- polarity: integer (nullable = false)
|    |    |-- confidence: float (nullable = false)
|    |    |-- historyOf: integer (nullable = false)
|    |    |-- ontologyMappings: array (nullable = true)
|    |    |    |-- element: struct (containsNull = true)
|    |    |    |    |-- preferredText: string (nullable = true)
|    |    |    |    |-- codingScheme: string (nullable = true)
|    |    |    |    |-- code: string (nullable = true)
|    |    |    |    |-- cui: string (nullable = true)
|    |    |    |    |-- tui: string (nullable = true)

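I am indexing the entire structure into Elasticsearch, but neither the annotations field (an array of structs) nor the ontologyMappings field is mapped as a nested type. For example, the ontologyMappings mapping looks like this: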

"ontologyMappings": {
  "properties": {
    "code": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    "codingScheme": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    "cui": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    "preferredText": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    "tui": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    }
  }
}

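Is there a way to force them to be written as nested types rather than as plain objects with a properties field? I want to be able to run queries that find documents containing an annotation whose code (under ontologyMappings) is a specific string and whose polarity is 1. Without nested mappings, that association between fields of the same element is impossible.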
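To see why the association is lost, here is a small pure-Python simulation (not Elasticsearch itself) of how an object mapping flattens an array of objects into per-field value sets, compared with nested semantics, where each annotation is matched on its own. The codes `CODE_A`/`CODE_B` and all helper names are hypothetical, and the model ignores text analysis and scoring.

```python
from collections import defaultdict


def flatten_object_array(doc):
    """Simulate object-mapping indexing: every leaf path collects the values
    from all array elements, so values from different elements get mixed."""
    flat = defaultdict(set)

    def walk(value, path):
        if isinstance(value, list):
            for item in value:
                walk(item, path)
        elif isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        else:
            flat[path].add(value)

    walk(doc, "")
    return dict(flat)


def object_match(doc, code, polarity):
    # Each condition is checked against the flattened values independently.
    flat = flatten_object_array(doc)
    return (code in flat.get("annotations.ontologyMappings.code", set())
            and polarity in flat.get("annotations.polarity", set()))


def nested_match(doc, code, polarity):
    # Nested semantics: both conditions must hold inside the same annotation.
    for ann in doc.get("annotations", []):
        if ann.get("polarity") == polarity and any(
            m.get("code") == code for m in ann.get("ontologyMappings", [])
        ):
            return True
    return False


doc = {
    "annotations": [
        {"polarity": -1, "ontologyMappings": [{"code": "CODE_A"}]},
        {"polarity": 1, "ontologyMappings": [{"code": "CODE_B"}]},
    ]
}

print(object_match(doc, "CODE_A", 1))  # True: false positive across elements
print(nested_match(doc, "CODE_A", 1))  # False
```

With the object mapping, `CODE_A` and polarity 1 both "exist" in the document even though they belong to different annotations; only nested semantics rule that out.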

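You need to create the index with a PUT request whose mapping explicitly declares which fields are nested. The payload looks like this: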

"""{
  "mappings": {
    "data": {
      "properties": {
        "annotations": {
          "type": "nested",
          "properties": {
            "ontologyMappings": {
              "type": "nested",
              "properties": {
                "code": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword"
                    }
                  }
                },
                "codingScheme": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword"
                    }
                  }
                },
                "cui": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword"
                    }
                  }
                },
                "preferredText": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword"
                    }
                  }
                },
                "tui": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword"
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
"""
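A minimal Python sketch (standard library only) of how this payload could be used, assuming a cluster at `http://localhost:9200` and a hypothetical index name `notes`. It builds the PUT request carrying the same mapping, plus a query body that places one `nested` query inside another so that the polarity and the code must come from the same annotation. The index has to be created this way before the DataFrame is written, because an existing object field cannot be changed to `nested` afterwards. The sketch also assumes that `polarity` (absent from the explicit mapping) gets dynamically mapped as a number, and that the `data` mapping type applies (Elasticsearch 6.x or earlier).

```python
import json
import urllib.request

# Hypothetical host and index name; adjust for your cluster.
ES_HOST = "http://localhost:9200"
INDEX = "notes"


def keyword_text():
    # A text field with a .keyword sub-field for exact matching.
    return {"type": "text", "fields": {"keyword": {"type": "keyword"}}}


def build_mapping():
    # Same structure as the payload: annotations and ontologyMappings are nested.
    fields = ["code", "codingScheme", "cui", "preferredText", "tui"]
    return {
        "mappings": {
            "data": {  # mapping type name, only valid on Elasticsearch 6.x and earlier
                "properties": {
                    "annotations": {
                        "type": "nested",
                        "properties": {
                            "ontologyMappings": {
                                "type": "nested",
                                "properties": {f: keyword_text() for f in fields},
                            }
                        },
                    }
                }
            }
        }
    }


def create_index_request(host, index, mapping):
    """Build (but do not send) the PUT request that creates the index."""
    return urllib.request.Request(
        url=f"{host}/{index}",
        data=json.dumps(mapping).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def code_with_polarity_query(code, polarity=1):
    """Documents where a single annotation has the given polarity and one of
    that same annotation's ontologyMappings has the given code."""
    return {
        "query": {
            "nested": {
                "path": "annotations",
                "query": {
                    "bool": {
                        "must": [
                            {"term": {"annotations.polarity": polarity}},
                            {
                                "nested": {
                                    "path": "annotations.ontologyMappings",
                                    "query": {"term": {
                                        "annotations.ontologyMappings.code.keyword": code
                                    }},
                                }
                            },
                        ]
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    req = create_index_request(ES_HOST, INDEX, build_mapping())
    print(req.get_method(), req.full_url)
    print(json.dumps(code_with_polarity_query("CODE_A"), indent=2))
```

Sending the request is a matter of `urllib.request.urlopen(req)`; the query dict can be sent as the JSON body of a `_search` request against the same index. `CODE_A` is a placeholder value.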
