How to rename columns in WITH and VALUES in Spark SQL



Given a table built this way with Spark SQL (2.4.*):

scala> spark.sql("with some_data (values ('A',1),('B',2)) select * from some_data").show()
+----+----+
|col1|col2|
+----+----+
|   A|   1|
|   B|   2|
+----+----+

I am unable to set the column names (they default to col1 and col2). Is there a way to rename these columns to label and value?

Modify the query as follows:

spark.sql("with some_data (values ('A',1),('B',2) T(label, value)) select * from some_data").show()
    /**
      * +-----+-----+
      * |label|value|
      * +-----+-----+
      * |    A|    1|
      * |    B|    2|
      * +-----+-----+
      */

Or use this example as a reference:

val df = spark.sql(
      """
        |select Class_Name, Customer, Date_Time, Median_Percentage
        |from values
        |   ('ClassA', 'A', '6/13/20', 64550),
        |   ('ClassA', 'B', '6/6/20', 40200),
        |   ('ClassB', 'F', '6/20/20', 26800),
        |   ('ClassB', 'G', '6/20/20', 18100)
        |  T(Class_Name, Customer, Date_Time, Median_Percentage)
      """.stripMargin)
    df.show(false)
    df.printSchema()
    /**
      * +----------+--------+---------+-----------------+
      * |Class_Name|Customer|Date_Time|Median_Percentage|
      * +----------+--------+---------+-----------------+
      * |ClassA    |A       |6/13/20  |64550            |
      * |ClassA    |B       |6/6/20   |40200            |
      * |ClassB    |F       |6/20/20  |26800            |
      * |ClassB    |G       |6/20/20  |18100            |
      * +----------+--------+---------+-----------------+
      *
      * root
      * |-- Class_Name: string (nullable = false)
      * |-- Customer: string (nullable = false)
      * |-- Date_Time: string (nullable = false)
      * |-- Median_Percentage: integer (nullable = false)
      */

Note the T(Class_Name, Customer, Date_Time, Median_Percentage) clause, which assigns the column names as required.
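If changing the SQL text itself isn't convenient, the same renaming can also be done after the fact with the DataFrame API. A minimal sketch, assuming a spark-shell session where `spark` is already in scope:

```scala
// Keep the original VALUES query and rename the default
// col1/col2 positionally with toDF.
val df = spark
  .sql("with some_data (values ('A',1),('B',2)) select * from some_data")
  .toDF("label", "value") // positional rename: col1 -> label, col2 -> value

df.show()
// +-----+-----+
// |label|value|
// +-----+-----+
// |    A|    1|
// |    B|    2|
// +-----+-----+
```

`toDF` renames all columns by position in one call; for renaming a single column, `df.withColumnRenamed("col1", "label")` works as well.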
