Conditionally replacing DataFrame values with their respective column names in Spark

I have a DataFrame named products that looks like this:

Credit | Savings | Premium
1        0         1
0        1         1
1        1         0

All column values are Strings.

I want to convert it to:

Credit | Savings | Premium
Credit   0         Premium
0        Savings   Premium
Credit   Savings   0

How can I do this in Spark?

I am assuming Credit, Savings, and Premium are string columns.

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._ // for `when`

val df: DataFrame = .....
// In Scala, replace lives on df.na and takes a Map of old value -> new value.
df.na.replace("Credit", Map("1" -> "Credit"))
  .na.replace("Savings", Map("1" -> "Savings"))
  .na.replace("Premium", Map("1" -> "Premium"))

Otherwise, you can also do it this way:

df.withColumn("Credit", udf1(df("Credit")))
  .withColumn("Savings", udf2(df("Savings")))
  .withColumn("Premium", udf3(df("Premium")))

where udf1, udf2, and udf3 are Spark UDFs that convert "1" to the corresponding column name.
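A minimal sketch of what such UDFs could look like. The helper `labelUdf` and the object name `UdfSketch` are hypothetical names introduced here for illustration, and the small `products` DataFrame is only sample data mirroring the question:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.udf

object UdfSketch {
  // Hypothetical helper: builds a UDF that maps "1" to the given column name
  // and returns every other value unchanged.
  def labelUdf(name: String) = udf((v: String) => if (v == "1") name else v)

  // Applies one UDF per column, as in the withColumn chain above.
  def relabel(df: DataFrame): DataFrame =
    df.withColumn("Credit", labelUdf("Credit")(df("Credit")))
      .withColumn("Savings", labelUdf("Savings")(df("Savings")))
      .withColumn("Premium", labelUdf("Premium")(df("Premium")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("udf-sketch")
      .getOrCreate()
    import spark.implicits._

    // Sample data from the question; all values are strings.
    val products = Seq(("1", "0", "1"), ("0", "1", "1"), ("1", "1", "0"))
      .toDF("Credit", "Savings", "Premium")

    relabel(products).show()
    spark.stop()
  }
}
```

UDFs are opaque to Spark's optimizer, so this is probably only worth it when the mapping logic cannot be expressed with built-in column functions.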

Instead of UDFs, you can also use the when(cond, val).otherwise(val) syntax.

// The columns are strings, so the fallback value is the string "0".
df.withColumn("Credit", when(df("Credit") === "1", lit("Credit")).otherwise("0"))
  .withColumn("Savings", when(df("Savings") === "1", lit("Savings")).otherwise("0"))
  .withColumn("Premium", when(df("Premium") === "1", lit("Premium")).otherwise("0"))
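If there are many columns, the same when/otherwise idea can be applied to every column in a loop instead of writing one withColumn per column. A rough sketch, assuming every column should be treated the same way; `relabelAll` and `WhenSketch` are hypothetical names, not part of the Spark API:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}

object WhenSketch {
  // For every column, replace "1" with the column's own name
  // and keep any other value as it is.
  def relabelAll(df: DataFrame): DataFrame =
    df.columns.foldLeft(df) { (acc, name) =>
      acc.withColumn(
        name,
        when(col(name) === "1", lit(name)).otherwise(col(name))
      )
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("when-sketch")
      .getOrCreate()
    import spark.implicits._

    // Sample data from the question; all values are strings.
    val products = Seq(("1", "0", "1"), ("0", "1", "1"), ("1", "1", "0"))
      .toDF("Credit", "Savings", "Premium")

    relabelAll(products).show()
    spark.stop()
  }
}
```

Note that this variant falls back to the original value rather than a hard-coded "0", so values other than "0" and "1" pass through unchanged.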

That's it. Good luck :-)
