DataFrame-to-RDD code snippet does not work



I am trying to read each row of a DataFrame and convert the row data into a custom bean class. The problem is that the code never gets executed. To check this I added several print statements, but not a single print statement inside df.rdd.map { row => } is executed, as if the whole code block were being skipped.

Code snippet:

print("data frame:", df.show()). 
df.rdd.map(row => {
// Debugging
println("Debugging")
if(row.isNullAt(0)) {
println("null data")
} else {
println(row.get(0).toString)
}
val employeeJobData = new EmployeeJobData
if(row.get(0).toString == null || row.get(0).toString.isEmpty){
employeeJobData.setEmployeeId("NULL_KEY_VALUE")
} else {
employeeJobData.setEmployeeId(row.get(0).toString)
}
employeeJobDataList.add(employeeJobData)
} )

Output of df.show():

+-----------+-------------+--------------+--------+-----+-------+
|employee_id|employee_name|employee_email|paygroup|level|dept_id|
+-----------+-------------+--------------+--------+-----+-------+
|13         |         null|          null|    null| null|   null|
|14         |         null|          null|    null| null|   null|
|15         |         null|          null|    null| null|   null|
|16         |         null|          null|    null| null|   null|
|17         |         null|          null|    null| null|   null|
+-----------+-------------+--------------+--------+-----+-------+
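
The body of rdd.map is a lazy transformation: it only runs once an action such as count, collect, or foreach triggers a job, which is why none of the println statements fire. Also note that mutating a driver-side collection like employeeJobDataList inside the lambda does not work, because each executor only updates its own copy of the closure. A minimal sketch of forcing the same map body to run (variable names are illustrative only):

val mapped = df.rdd.map { row =>
  println("Debugging") // printed in the executor logs, not necessarily the driver console
  row.get(0)
}
mapped.count() // an action: only now does the map body actually execute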

You can drop the unnecessary code as shown below and get a java.util.List[EmployeeJobData] like this:

import java.util

object MapToCaseClass {

  def main(args: Array[String]): Unit = {
    val spark = Constant.getSparkSess // returns a SparkSession
    import spark.implicits._

    // employee_id is kept as a String here so that row.getString(0) works
    val df = List(("12", "name", "email@email.com", "paygroup", "level", "dept_id")).toDF()

    val employeeList: util.List[EmployeeJobData] = df
      .map(row => {
        val id = if (null == row.getString(0) || "null".equals(row.getString(0)) || row.getString(0).trim.isEmpty) {
          "NULL_KEY_VALUE"
        } else {
          row.getString(0)
        }
        EmployeeJobData(id, row.getString(1), row.getString(2),
          row.getString(3), row.getString(4), row.getString(5))
      })
      .collectAsList
  }
}
case class EmployeeJobData(employee_id: String, employee_name: String, employee_email: String,
                           paygroup: String, level: String, dept_id: String)

The above can be improved further by giving employee_id and dept_id their proper data types (i.e. Long if they are numeric). For employee_id that also removes the need for the "null".equals and .isEmpty() checks and shrinks the code a bit more, as sketched below.
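
A rough sketch of that improvement, reusing the imports and df from the snippet above and assuming both columns hold numeric strings (EmployeeJobDataTyped and the -1L fallback key are illustrative, not part of the original answer):

case class EmployeeJobDataTyped(employee_id: Long, employee_name: String, employee_email: String,
                                paygroup: String, level: String, dept_id: Long)

val typedList: util.List[EmployeeJobDataTyped] = df
  .map(row => EmployeeJobDataTyped(
    // Option absorbs nulls and the filter drops blank or "null" strings, so no explicit null checks are needed
    Option(row.getString(0)).filter(s => s.trim.nonEmpty && !"null".equals(s)).map(_.toLong).getOrElse(-1L),
    row.getString(1), row.getString(2), row.getString(3), row.getString(4),
    Option(row.getString(5)).map(_.toLong).getOrElse(-1L)))
  .collectAsList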
