Given a table built with Spark SQL (2.4.*) this way:
scala> spark.sql("with some_data (values ('A',1),('B',2)) select * from some_data").show()
+----+----+
|col1|col2|
+----+----+
| A| 1|
| B| 2|
+----+----+
I cannot set the column names (they default to col1 and col2). Is there a way to rename these columns to label and value?
Modify the query to:
spark.sql("with some_data (values ('A',1),('B',2) T(label, value)) select * from some_data").show()
/**
* +-----+-----+
* |label|value|
* +-----+-----+
* | A| 1|
* | B| 2|
* +-----+-----+
*/
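If the SQL text itself cannot be changed, the same renaming can also be done on the resulting DataFrame. This is a minimal sketch using the standard Dataset.toDF method, which renames all columns positionally; it assumes an active SparkSession named spark, as in the question:

```scala
// Run the original query unchanged, then rename col1/col2 positionally.
val df = spark
  .sql("with some_data (values ('A',1),('B',2)) select * from some_data")
  .toDF("label", "value") // must supply exactly one name per column
df.show()
```

toDF is convenient when every column needs a name; to rename a single column while leaving the rest intact, withColumnRenamed("col1", "label") works as well.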
Or use this example for reference:
val df = spark.sql(
"""
|select Class_Name, Customer, Date_Time, Median_Percentage
|from values
| ('ClassA', 'A', '6/13/20', 64550),
| ('ClassA', 'B', '6/6/20', 40200),
| ('ClassB', 'F', '6/20/20', 26800),
| ('ClassB', 'G', '6/20/20', 18100)
| T(Class_Name, Customer, Date_Time, Median_Percentage)
""".stripMargin)
df.show(false)
df.printSchema()
/**
* +----------+--------+---------+-----------------+
* |Class_Name|Customer|Date_Time|Median_Percentage|
* +----------+--------+---------+-----------------+
* |ClassA |A |6/13/20 |64550 |
* |ClassA |B |6/6/20 |40200 |
* |ClassB |F |6/20/20 |26800 |
* |ClassB |G |6/20/20 |18100 |
* +----------+--------+---------+-----------------+
*
* root
* |-- Class_Name: string (nullable = false)
* |-- Customer: string (nullable = false)
* |-- Date_Time: string (nullable = false)
* |-- Median_Percentage: integer (nullable = false)
*/
Note the T(Class_Name, Customer, Date_Time, Median_Percentage) clause, which assigns the required names to the columns.
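The same named inline table can also be built without SQL text at all, via the DataFrame API. A hedged sketch (assuming an active SparkSession named spark; the import brings toDF into scope for local Seq collections):

```scala
import spark.implicits._ // enables Seq(...).toDF

// Equivalent to the VALUES ... T(...) query: each tuple is a row,
// and toDF supplies the column names in order.
val df = Seq(
  ("ClassA", "A", "6/13/20", 64550),
  ("ClassA", "B", "6/6/20", 40200),
  ("ClassB", "F", "6/20/20", 26800),
  ("ClassB", "G", "6/20/20", 18100)
).toDF("Class_Name", "Customer", "Date_Time", "Median_Percentage")

df.show(false)
df.printSchema()
```

Which form to prefer is mostly a matter of context: the SQL alias T(...) keeps everything inside one query string, while Seq(...).toDF is easier to generate programmatically from Scala collections.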