How to add a string value plus another column's value in a PySpark DataFrame



Input DataFrame (the original DataFrame):

class      id  name
System     0   System
Generator  1   Coal_Gen

Expected output: a new column 'Index' whose value is "ST " + class value + "(" + id value + ")":

class      id  name      Index
System     0   System    ST System(0)
Generator  1   Coal_Gen  ST Generator(1)

Try Spark's concat function, wrapping each literal string in lit():

Example:

df.show()
#+---------+---+--------+
#|    class| id|    name|
#+---------+---+--------+
#|   System|  0|  System|
#|Generator|  1|Coal_Gen|
#+---------+---+--------+
from pyspark.sql.functions import col, concat, lit
# lit() turns a Python string into a literal column; col() refers to an existing column
df.withColumn("index", concat(lit("ST "), col("class"), lit("("), col("id"), lit(")"))).show()
#+---------+---+--------+---------------+
#|    class| id|    name|          index|
#+---------+---+--------+---------------+
#|   System|  0|  System|   ST System(0)|
#|Generator|  1|Coal_Gen|ST Generator(1)|
#+---------+---+--------+---------------+
