Add a prefix to all Spark DataFrame columns except the primary key columns



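Below is my code for adding a prefix to column names. I want to exclude one or more primary key columns. My primary keys are held in an array of strings that may contain one or more primary key fields.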

val primaryKeys = args(2).split("-")
val prefix = "w1."
val renamedColumns = df.columns.map(c=> df(c).as(s"$prefix$c"))
val dfNew = df.select(renamedColumns: _*)
val prefix2 = "w2."
val renamedColumns2 = df2.columns.map(c2=> df2(c2).as(s"$prefix2$c2"))
val df2New = df2.select(renamedColumns2: _*)

If there is just one primary key column, I can rename it back using withColumnRenamed, but I am unable to do that when there are multiple primary key columns.

I can't get something like this to work:

for (primaryKey <- primaryKeys) {
dfNew.withColumnRenamed("$PREFIX1"+s"${primaryKey}",s"$primaryKey").toDF()
}

Can anyone help?

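If I understand your question correctly, you can build renamedColumns conditionally so that only the non-primary-key columns get the prefix, as follows: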

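import spark.implicits._ // needed for toDF on a Seq; assumes a SparkSession named spark, as in spark-shell

val df = Seq(
  ("1", "a", "c1", "d1"),
  ("2", "b", "c2", "d2"),
  ("3", "c", "c3", "d3")
).toDF("pk1", "pk2", "col1", "col2")

val primaryKeys = Array("pk1", "pk2")
val prefix = "w1."

// Keep primary key columns under their original names; prefix every other column.
val renamedColumns = df.columns.map(
  c => if (primaryKeys.contains(c)) df(c).as(c) else df(c).as(s"$prefix$c")
)
val dfNew = df.select(renamedColumns: _*)
dfNew.show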
+---+---+-------+-------+
|pk1|pk2|w1.col1|w1.col2|
+---+---+-------+-------+
|  1|  a|     c1|     d1|
|  2|  b|     c2|     d2|
|  3|  c|     c3|     d3|
+---+---+-------+-------+
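
Since the same renaming is needed for both df and df2 in the question, the name logic could also be pulled out into a small helper. The following is only a sketch: prefixedNames and PrefixDemo are hypothetical names that do not appear in the question or the answer, and the helper works on plain strings so it can be tried without a Spark session.

```scala
object PrefixDemo {
  // Hypothetical helper: returns the new column names, leaving primary key
  // columns unchanged and prepending the prefix to every other column.
  def prefixedNames(columns: Seq[String], primaryKeys: Set[String], prefix: String): Seq[String] =
    columns.map(c => if (primaryKeys.contains(c)) c else s"$prefix$c")

  def main(args: Array[String]): Unit = {
    val names = prefixedNames(Seq("pk1", "pk2", "col1", "col2"), Set("pk1", "pk2"), "w1.")
    println(names.mkString(", ")) // prints: pk1, pk2, w1.col1, w1.col2

    // With Spark DataFrames in scope, the names could be applied positionally:
    //   val dfNew  = df.toDF(prefixedNames(df.columns.toSeq, primaryKeys.toSet, "w1."): _*)
    //   val df2New = df2.toDF(prefixedNames(df2.columns.toSeq, primaryKeys.toSet, "w2."): _*)
  }
}
```

Note that the resulting names contain a dot, so later references to them need backticks, for example dfNew.select("`w1.col1`"); otherwise Spark tries to interpret the dot as a qualifier or a nested field access.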
