Common way to change the nullable property of all elements of any Spark SQL StructType in Scala



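Is there a generic way to change the nullable property of all elements of any given StructType? It may be a nested StructType.

I saw that @Eliasah marked this as a duplicate of the question about changing the nullable property of a column in a Spark DataFrame. But the two are different: that question does not deal with hierarchical/nested StructTypes, so its answers only work for a single level.

For example: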

 root
 |-- user_id: string (nullable = false)
 |-- name: string (nullable = false)
 |-- system_process: array (nullable = false)
 |    |-- element: struct (containsNull = false)
 |    |    |-- timestamp: long (nullable = false)
 |    |    |-- process: string (nullable = false)
 |-- type: string (nullable = false)
 |-- user_process: array (nullable = false)
 |    |-- element: struct (containsNull = false)
 |    |    |-- timestamp: long (nullable = false)
 |    |    |-- process: string (nullable = false)

I want to change nullable to true for all elements, so the result should be:

 root
 |-- user_id: string (nullable = true)
 |-- name: string (nullable = true)
 |-- system_process: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- timestamp: long (nullable = true)
 |    |    |-- process: string (nullable = true)
 |-- type: string (nullable = true)
 |-- user_process: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- timestamp: long (nullable = true)
 |    |    |-- process: string (nullable = true)

Attached is a sample JSON schema of a StructType, for convenient testing:

val jsonSchema="""{"type":"struct","fields":[{"name":"user_id","type":"string","nullable":false,"metadata":{}},{"name":"name","type":"string","nullable":false,"metadata":{}},{"name":"system_process","type":{"type":"array","elementType":{"type":"struct","fields":[{"name":"timestamp","type":"long","nullable":false,"metadata":{}},{"name":"process_id","type":"string","nullable":false,"metadata":{}}]},"containsNull":false},"nullable":false,"metadata":{}},{"name":"type","type":"string","nullable":false,"metadata":{}},{"name":"user_process","type":{"type":"array","elementType":{"type":"struct","fields":[{"name":"timestamp","type":"long","nullable":false,"metadata":{}},{"name":"process_id","type":"string","nullable":false,"metadata":{}}]},"containsNull":false},"nullable":false,"metadata":{}}]}"""
import org.apache.spark.sql.types.{DataType, StructType}
DataType.fromJson(jsonSchema).asInstanceOf[StructType].printTreeString()

In the end I found two solutions:

  1. The hacky one: replace the string first, then create the StructType instance from the JSON string

    DataType.fromJson(schema.json.replaceAll("\"nullable\":false", "\"nullable\":true")).asInstanceOf[StructType]
    
  2. The recursive approach

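      import org.apache.spark.sql.types.{ArrayType, StructType}

      // Recursively mark every field as nullable, descending into nested
      // structs and into arrays whose elements are structs.
      def updateFieldsToNullable(structType: StructType): StructType = {
        StructType(structType.map(f => f.dataType match {
          case d: ArrayType =>
            val element = d.elementType match {
              case s: StructType => updateFieldsToNullable(s)
              case _ => d.elementType
            }
            // containsNull is carried over unchanged from the original array type
            f.copy(nullable = true, dataType = ArrayType(element, d.containsNull))
          case s: StructType => f.copy(nullable = true, dataType = updateFieldsToNullable(s))
          case _ => f.copy(nullable = true)
        }))
      }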
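
Note that both solutions above only touch the `nullable` flag of fields: the string replacement rewrites `"nullable":false` only, and the recursive function passes `d.containsNull` through as it was. The desired output in the question also shows `containsNull = true`. A minimal sketch of a variant that recurses over `DataType` rather than `StructType`, so that it also sets `containsNull`, and additionally covers maps and arrays of arrays, might look like the following. The name `makeNullable` and the small test schema are hypothetical, chosen for illustration, and are not part of the original answer.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper: recursively relax every nullability flag in a schema.
def makeNullable(dt: DataType): DataType = dt match {
  case s: StructType =>
    // Every field becomes nullable, and its own type is relaxed recursively.
    StructType(s.fields.map(f =>
      f.copy(dataType = makeNullable(f.dataType), nullable = true)))
  case a: ArrayType =>
    // Arrays may now hold null elements; the element type is relaxed too.
    ArrayType(makeNullable(a.elementType), containsNull = true)
  case m: MapType =>
    // Map keys can never be null in Spark, so only the value flag is set.
    MapType(makeNullable(m.keyType), makeNullable(m.valueType), valueContainsNull = true)
  case other => other // atomic types carry no nullability of their own
}

// Illustrative schema, loosely modelled on the one in the question.
val strict = new StructType()
  .add("user_id", StringType, nullable = false)
  .add("system_process",
    ArrayType(
      new StructType()
        .add("timestamp", LongType, nullable = false)
        .add("process", StringType, nullable = false),
      containsNull = false),
    nullable = false)

val relaxed = makeNullable(strict).asInstanceOf[StructType]
relaxed.printTreeString()
```

Matching on `DataType` keeps each case to a single line of logic, at the cost of a cast back to `StructType` at the call site.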
    
