I'm using Scala on Databricks. Suppose I have a DataFrame like this:
val df = Seq(
("Alex", 4.0, 3.2, 3.0),
("John", 2.0, 4.2, 1.2),
("Alice", 1.0, 5.0, 3.5),
("Mark", 3.0, 3.5, 0.5),
).toDF("Name", "Test A", "Test B", "Test C")
which gives me:
Name | Test A | Test B | Test C |
---|---|---|---|
Alex | 4.0 | 3.2 | 3.0 |
John | 2.0 | 4.2 | 1.2 |
Alice | 1.0 | 5.0 | 3.5 |
Mark | 3.0 | 3.5 | 0.5 |
You can `map` over the DataFrame and access the elements of each `Row` by position:
import org.apache.spark.sql._
import spark.implicits._
val columnNames = Seq("Name", "Test A", "Test B", "Test C")
val df = Seq(
("Alex", 4.0, 3.2, 3.0),
("John", 2.0, 4.2, 1.2),
("Alice", 1.0, 5.0, 3.5),
("Mark", 3.0, 3.5, 0.5)
).toDF(columnNames: _*)
val output = df.map { row =>
  // Divide Test C by Test B, accessing the values by position
  val division = row.getDouble(3) / row.getDouble(2)
  // Build a new tuple with an extra element: division
  (row.getString(0), row.getDouble(1), row.getDouble(2), row.getDouble(3), division)
}.toDF(columnNames :+ "division": _*)
output.show
+-----+------+------+------+-------------------+
| Name|Test A|Test B|Test C| division|
+-----+------+------+------+-------------------+
| Alex| 4.0| 3.2| 3.0| 0.9375|
| John| 2.0| 4.2| 1.2| 0.2857142857142857|
|Alice| 1.0| 5.0| 3.5| 0.7|
| Mark| 3.0| 3.5| 0.5|0.14285714285714285|
+-----+------+------+------+-------------------+
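As a side note, a sketch of an alternative: the same column can be added without a typed `map` by using Spark's column expressions via `withColumn`, which avoids hard-coding positions and keeps the computation in the Catalyst optimizer (this assumes the same `df` as above; `output2` is just an illustrative name):

```scala
import org.apache.spark.sql.functions.col

// Add the division column with a column expression.
// Column names containing spaces must be referenced with col("..."),
// since $"Test C" also works but dot-access (df("Test C")) styles vary.
val output2 = df.withColumn("division", col("Test C") / col("Test B"))
output2.show()
```

Because no deserialization into JVM objects happens, this form is generally preferred over `map` on a `DataFrame` when plain arithmetic on columns is all you need.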
Hope this helps!