I have a code snippet like this:
case class Purchase(cid: Int, pid: String, num: String)
val x = sc.parallelize(Array(
Purchase(123, "234", "1"),
Purchase(123, "247", "2"),
Purchase(189, "254", "3"),
Purchase(187, "299", "4")
))
// I have a dataframe structure: [cid: int, pid: string, num: string]
val df = sqlContext.createDataFrame(x)
// Defining a column name which I need to transform. Its value can change, like pid
val colName = "num"
// Defining a UDF. The definition of the UDF can change
val toIntUdf = udf((myString: String) => myString.toInt )
// This works
df.select( toIntUdf($"num") ).collect
I am looking for a way to avoid using the literal "num" here. Any ideas?
If you mean using colName instead of the literal $"num", here is how:
import org.apache.spark.sql.functions._
df.select(toIntUdf(col(colName))).collect
You can select the column this way (more documentation is available in the Spark DataFrame API docs):
df.select(toIntUdf(df(colName)))
Or:
df.select(toIntUdf(df.col(colName)))
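
Since both the column name and the UDF may change, the lookups above can be wrapped in a small helper. The following is only a sketch under some assumptions: it uses SparkSession (Spark 2.x or later) rather than the sqlContext from the question, and the names ColumnByName and applyTo are hypothetical, not part of Spark. It uses withColumn instead of select so that the other columns are kept.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, udf}

case class Purchase(cid: Int, pid: String, num: String)

object ColumnByName {
  // Hypothetical helper: apply the UDF `f` to the column called `name`
  // and replace that column with the result, leaving the others untouched.
  def applyTo(df: DataFrame, name: String, f: UserDefinedFunction): DataFrame =
    df.withColumn(name, f(col(name)))

  def main(args: Array[String]): Unit = {
    // Assumption: a local SparkSession stands in for sc/sqlContext.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("column-by-name")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      Purchase(123, "234", "1"),
      Purchase(189, "254", "3")
    ).toDF()

    val toIntUdf = udf((myString: String) => myString.toInt)
    val colName = "num" // could just as well be "pid"

    // The column is resolved from the string at runtime.
    applyTo(df, colName, toIntUdf).printSchema()

    spark.stop()
  }
}
```

Passing the name as a plain String keeps the caller free of $"..." literals, so switching from "num" to "pid" only means changing colName.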