我需要帮助来自动化下面的代码,以便它应该自动从表中选择标题并将所有标题列转换为行,然后需要执行基本聚合(最大值,最小值。等(
任何线索都会有很大帮助。
下面我从表中手动传递列名,从表中选择标题并将其转换为列。
from pyspark.sql import Row
from pyspark.sql.types import *
from pyspark.sql.functions import *
df = sc.parallelize([
Row(name='Alice', age=5, height=80),
Row(name='Mujkesh', age=10, height=90),
Row(name='Ganesh', age=15, height=100)]).toDF()
df.createOrReplaceTempView("Test")
df3 = spark.sql("desc Test" )
df4=df.selectExpr("stack(3,'name',bigint(name),'age',bigint(age),'height',bigint(height)) as (col_name,data)").groupBy(col("col_name")).agg(max(col("data")).alias("Max_val"),min(col("data")).alias("Min_val"))
df5=df3.join(df4,['col_name'],'inner').orderBy("col_name").drop(col("comment"))
df5.show()
Input Data :
+---+------+-------+
|age|height| name|
+---+------+-------+
| 5| 80| Alice|
| 10| 90|Mujkesh|
| 15| 100| Ganesh|
+---+------+-------+
output DATA:
+--------+---------+-------+-------+
|col_name|data_type|Max_val|Min_val|
+--------+---------+-------+-------+
| age| bigint| 15| 5|
| height| bigint| 100| 80|
| name| string| null| null|
+--------+---------+-------+-------+
您可以使用 -获取列的名称 - spark_df.schema.names
您也可以尝试在该特定列上使用透视。
下面的代码是用scala编写的。
scala> df.printSchema
root
|-- name: string (nullable = true)
|-- age: integer (nullable = false)
|-- height: integer (nullable = false)
scala> df.show(false)
+-------+---+------+
|name |age|height|
+-------+---+------+
|Alice |5 |80 |
|Mujkesh|10 |90 |
|Ganesh |15 |100 |
+-------+---+------+
scala>
val expr = array(
df
.schema
.map(c => (c.name,c.dataType.typeName)) // column name & its data type
.map(c =>
struct(
lit(c._1).as("column_name"),
if(c._2 != "string") max(col(c._1)).as("max") else lit(null).cast("int").as("max"),
if(c._2 != "string") min(col(c._1)).as("min") else lit(null).cast("int").as("min"),
lit(c._2).as("datatype")
)
):_*
)
scala> df
.select(expr.as("summary"))
.select(explode($"summary").as("summary"))
.select("summary.*")
.show(false)
+-----------+----+----+--------+
|column_name|max |min |datatype|
+-----------+----+----+--------+
|name |null|null|string |
|age |15 |5 |integer |
|height |100 |80 |integer |
+-----------+----+----+--------+