I have a DataFrame named products that looks like this:
Credit | Savings | Premium
1 0 1
0 1 1
1 1 0
All column values are Strings.
I want to convert it to:
Credit | Savings | Premium
Credit 0 Premium
0 Savings Premium
Credit Savings 0
How can I do this in Spark?
I am assuming that Credit, Savings, and Premium are string columns.
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._ // for `when`
val df: DataFrame = .....
// replace is available on df.na; the map goes from old value to new value
df.na.replace("Credit", Map("1" -> "Credit"))
  .na.replace("Savings", Map("1" -> "Savings"))
  .na.replace("Premium", Map("1" -> "Premium"))
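Alternatively, you can do it like this:
df.withColumn("Credit", udf1(df("Credit")))
  .withColumn("Savings", udf2(df("Savings")))
  .withColumn("Premium", udf3(df("Premium")))
where udf1, udf2, and udf3 are Spark UDFs that convert "1" to the corresponding column name.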
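The UDFs are not spelled out above, so here is a minimal sketch of what they could look like. The names LabelWithUdf, labelUdf, and labelOnes are hypothetical, and the local SparkSession is only there to make the example self-contained. One small factory stands in for udf1, udf2, and udf3:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, udf}

object LabelWithUdf {
  // Hypothetical helper: builds a UDF that maps "1" to the given label
  // and passes every other value (including null) through unchanged.
  def labelUdf(label: String): UserDefinedFunction =
    udf((value: String) => if (value == "1") label else value)

  // Applies one such UDF per column, using the column name as the label.
  def labelOnes(df: DataFrame): DataFrame =
    df.withColumn("Credit", labelUdf("Credit")(col("Credit")))
      .withColumn("Savings", labelUdf("Savings")(col("Savings")))
      .withColumn("Premium", labelUdf("Premium")(col("Premium")))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, just for trying the sketch out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("label-with-udf")
      .getOrCreate()
    import spark.implicits._

    // The sample data from the question, all columns as strings.
    val products = Seq(("1", "0", "1"), ("0", "1", "1"), ("1", "1", "0"))
      .toDF("Credit", "Savings", "Premium")

    labelOnes(products).show()
    spark.stop()
  }
}
```

UDFs are opaque to Spark's optimizer, so for a simple mapping like this the built-in column functions are usually the better choice.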
Instead of a UDF, you can also use the when(cond, val).otherwise(val)
syntax.
df.withColumn("Credit", when(df("Credit") === "1", lit("Credit")).otherwise("0"))
  .withColumn("Savings", when(df("Savings") === "1", lit("Savings")).otherwise("0"))
  .withColumn("Premium", when(df("Premium") === "1", lit("Premium")).otherwise("0"))
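If there are many such columns, the same when/otherwise expression can be applied to every column in a loop instead of being written out by hand. This is a rough sketch under a few assumptions: LabelWithWhen and labelOnes are hypothetical names, every column in the DataFrame should be converted, and values other than "1" are kept as they are rather than forced to "0".

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}

object LabelWithWhen {
  // Folds over all column names, replacing "1" with the column's own name
  // and leaving every other value untouched.
  def labelOnes(df: DataFrame): DataFrame =
    df.columns.foldLeft(df) { (acc, name) =>
      acc.withColumn(name, when(col(name) === "1", lit(name)).otherwise(col(name)))
    }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, just for trying the sketch out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("label-with-when")
      .getOrCreate()
    import spark.implicits._

    // The sample data from the question, all columns as strings.
    val products = Seq(("1", "0", "1"), ("0", "1", "1"), ("1", "1", "0"))
      .toDF("Credit", "Savings", "Premium")

    labelOnes(products).show()
    spark.stop()
  }
}
```

To convert only some of the columns, fold over an explicit list such as Seq("Credit", "Savings", "Premium") instead of df.columns.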
That's it. Good luck :-)