Change the type of a nested JSON attribute

scala> val df = spark.read.json("data.json")
scala> df.printSchema
root
 |-- a: struct (nullable = true)
 |    |-- b: struct (nullable = true)
 |    |    |-- c: long (nullable = true)
 |-- TimeStamp: string (nullable = true)
 |-- id: string (nullable = true)

scala> val df1 = df.withColumn("TimeStamp", $"TimeStamp".cast(TimestampType))
scala> df1.printSchema
root
 |-- a: struct (nullable = true)
 |    |-- b: struct (nullable = true)
 |    |    |-- c: long (nullable = true)
 |-- TimeStamp: timestamp (nullable = true) // WORKING AS EXPECTED
 |-- id: string (nullable = true)

scala> val df2 = df.withColumn("a.b.c", $"a.b.c".cast(DoubleType))
scala> df2.printSchema
root
 |-- a: struct (nullable = true)
 |    |-- b: struct (nullable = true)
 |    |    |-- c: long (nullable = true)
 |-- TimeStamp: string (nullable = true)
 |-- id: string (nullable = true)
 |-- a.b.c: double (nullable = true) // DUPLICATE COLUMN ADDED

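I'm trying to change the type of a nested JSON attribute inside a DataFrame column. The change to the nested attribute is treated as a new column, which results in a duplicate column. The change works fine for the top-level attribute (TimeStamp) but not for the nested attribute (a.b.c). Any ideas on this issue?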
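The behaviour in the question can be reproduced with a minimal sketch (a local `SparkSession` and a single inline record shaped like the question's data are assumptions for illustration). `withColumn` takes its first argument as a literal column name, so `"a.b.c"` is appended as a new top-level column whose name happens to contain dots, while the nested `a.b.c` keeps its original `long` type:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType

// Local session and inline sample record are assumptions for this sketch.
val spark = SparkSession.builder().master("local[1]").appName("dotted-name-demo").getOrCreate()
import spark.implicits._

val df = spark.read.json(Seq("""{"a": {"b": {"c": 1}}, "TimeStamp": "2017-02-18", "id": "1"}""").toDS())

// The cast expression resolves the nested field a.b.c, but the *name* "a.b.c"
// is not treated as a path: a new top-level column literally called "a.b.c" is added,
// and struct column "a" is left unchanged (its c is still a long).
val df2 = df.withColumn("a.b.c", col("a.b.c").cast(DoubleType))
df2.printSchema()

// A top-level column whose name contains dots must be quoted with backticks to select it.
val dotted = df2.select(col("`a.b.c`"))
```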

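Because your column is a struct type, you need to rebuild it with the same hierarchy. `withColumn` does not interpret the dotted name `a.b.c` as a path into the struct, so instead of rewriting the existing struct it adds a brand-new column. Input: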

{"a": {"b": {"c": "1.31", "d": "1.11"}}, "TimeStamp": "2017-02-18", "id":1}
{"a": {"b": {"c": "2.31", "d": "2.22"}}, "TimeStamp": "2017-02-18", "id":1}
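import org.apache.spark.sql.functions.struct
import org.apache.spark.sql.types.{DateType, DoubleType}

val lines2 = spark.read.json("/home/kiran/km/km_hadoop/data/data_nested_struct_col2.json")
lines2.printSchema()

// Rebuild struct "a" with the same hierarchy (a.b.{c, d}), casting only c to double,
// and overwrite the existing top-level column "a" instead of adding a new one.
// In spark-shell, enter this with :paste so the chained .withColumn is parsed as one expression.
val df2 = lines2.withColumn("a", struct(
                                    struct(
                                        lines2("a.b.c").cast(DoubleType).as("c"),
                                        lines2("a.b.d").as("d")
                                    ).as("b")))
            .withColumn("TimeStamp", lines2("TimeStamp").cast(DateType))
df2.printSchema()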

Here is the output of both schemas, before and after:

root
 |-- TimeStamp: string (nullable = true)
 |-- a: struct (nullable = true)
 |    |-- b: struct (nullable = true)
 |    |    |-- c: string (nullable = true)
 |    |    |-- d: string (nullable = true)
 |-- id: long (nullable = true)
root
 |-- TimeStamp: date (nullable = true)
 |-- a: struct (nullable = false)
 |    |-- b: struct (nullable = false)
 |    |    |-- c: double (nullable = true)
 |    |    |-- d: string (nullable = true)
 |-- id: long (nullable = true)
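If you are on Spark 3.1 or later (an assumption; the question does not say which version is in use), `Column.withField` can replace a single nested field without listing its siblings by hand. A minimal sketch, with the two input records above inlined through `spark.read.json` on a `Dataset[String]` instead of read from the file, and a local `SparkSession` assumed:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DateType, DoubleType}

// Assumes Spark 3.1+, where Column.withField exists; local session is for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("withfield-demo").getOrCreate()
import spark.implicits._

// Same records as the answer's input file, inlined for a self-contained example.
val input = spark.read.json(Seq(
  """{"a": {"b": {"c": "1.31", "d": "1.11"}}, "TimeStamp": "2017-02-18", "id": 1}""",
  """{"a": {"b": {"c": "2.31", "d": "2.22"}}, "TimeStamp": "2017-02-18", "id": 1}"""
).toDS())

// withField replaces only b.c inside struct a; the sibling field a.b.d is carried
// over automatically, and column "a" is overwritten rather than duplicated.
val withDoubleC = input.withColumn("a", col("a").withField("b.c", col("a.b.c").cast(DoubleType)))
val result = withDoubleC.withColumn("TimeStamp", col("TimeStamp").cast(DateType))

result.printSchema()
```

Compared with rebuilding the struct via `struct(...)`, this stays the same size no matter how many other fields `a.b` has.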

I hope this is clear.
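On versions without `withField`, the rebuild-the-hierarchy approach above can be generalized by walking the schema instead of naming every field. `castNestedField` and `rebuild` are hypothetical helper names for this sketch, and the local session and inline record are assumptions for illustration:

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct}
import org.apache.spark.sql.types.{DataType, DoubleType, StructType}

// Hypothetical helper: rebuilds `parent` (whose type is `parentType`) field by field,
// casting only the field addressed by `path` and copying every sibling unchanged.
def rebuild(parent: Column, parentType: StructType, path: List[String], to: DataType): Column =
  struct(parentType.fields.toSeq.map { field =>
    val child = parent.getField(field.name)
    (path, field.dataType) match {
      case (name :: Nil, _) if name == field.name => child.cast(to).as(field.name)
      case (name :: rest, s: StructType) if name == field.name => rebuild(child, s, rest, to).as(field.name)
      case _ => child.as(field.name)
    }
  }: _*)

// Hypothetical entry point: splits "a.b.c" into path segments and overwrites the
// top-level column "a", so no extra column literally named "a.b.c" is created.
def castNestedField(df: DataFrame, path: String, to: DataType): DataFrame =
  path.split('.').toList match {
    case top :: Nil => df.withColumn(top, col(top).cast(to))
    case top :: rest =>
      df.schema(top).dataType match {
        case s: StructType => df.withColumn(top, rebuild(col(top), s, rest, to))
        case other => throw new IllegalArgumentException(s"$top is $other, not a struct")
      }
    case Nil => df
  }

val spark = SparkSession.builder().master("local[1]").appName("nested-cast-demo").getOrCreate()
import spark.implicits._

val input = spark.read.json(Seq(
  """{"a": {"b": {"c": "1.31", "d": "1.11"}}, "TimeStamp": "2017-02-18", "id": 1}"""
).toDS())

val result = castNestedField(input, "a.b.c", DoubleType)
result.printSchema()
```

As in the "after" schema above, the rebuilt structs come back as `nullable = false`, because `struct(...)` always produces a non-null struct; a row whose `a` was null would end up with an `a` whose fields are all null.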
