学习火花:示例,哪些地方不起作用



我想从书中学习Spark执行示例。

在中使用column的形式表达式:

val fewFireDF = fireDF
.select("IncidentNumber", "AvailableDtTm", "CallType")
.where($"CallType" =!= "Medical Incident")

但是IntelliJ Idea不理解$"CallType"。它看起来像一个字符串。

这些变化很有效:

.where(col("CallType") =!= "Medical Incident")
.where("CallType != 'Medical Incident'")

我好像没有把我的问题解释清楚。

下面是我的代码:
package org.example.chapter3
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.dsl.expressions.{DslExpression, StringToAttributeConversionHelper}
import org.apache.spark.sql.types.{BooleanType, FloatType, IntegerType, StringType, StructField, StructType}
import org.apache.spark.sql.functions._
object DepartmentCalls extends App {
val spark = SparkSession
.builder
.appName("DepartmentCalls")
.getOrCreate()
if (args.length < 1) {
println("usage DepartmentCalls <file path to fire_incidents.csv")
System.exit(1)
}
val schema = StructType(
Array(
StructField("CallNumber", IntegerType),
StructField("UnitID", StringType),
StructField("IncidentNumber", IntegerType),
StructField("CallType", StringType),
StructField("CallDate", StringType),
StructField("WatchDate", StringType),
StructField("CallFinalDisposition", StringType),
StructField("AvailableDtTm", StringType),
StructField("Address", StringType),
StructField("City", StringType),
StructField("Zipcode", IntegerType),
StructField("Battalion", StringType),
StructField("StationArea", StringType),
StructField("Box", StringType),
StructField("OriginalPriority", StringType),
StructField("Priority", StringType),
StructField("FinalPriority", IntegerType),
StructField("ALSUnit", BooleanType),
StructField("CallTypeGroup", StringType),
StructField("NumAlarms", IntegerType),
StructField("UnitType", StringType),
StructField("UnitSequenceInCallDispatch", IntegerType),
StructField("FirePreventionDistrict", StringType),
StructField("SupervisorDistrict", StringType),
StructField("Neighborhood", StringType),
StructField("Location", StringType),
StructField("RowID", StringType),
StructField("Delay", FloatType)
)
)
// Read the file using the CSV DataFrameReader
val sfFireFile= args(0)
val fireDF = spark.read.schema(schema)
.option("header", "true")
.csv(sfFireFile)
println(fireDF.count())
val fewFireDF = fireDF
.select("IncidentNumber", "AvailableDtTm", "CallType")
.where($"CallType" =!= "Medical Incident")
fewFireDF.show(5, false)
}

我有以下错误:

  1. 无法解析重载方法'where'
  2. 类型不匹配。所需表达式,查找字符串-在"医疗事件">
  3. 之后

当我尝试编译我的代码时,我得到下一个错误:

(错误)/用户/xxxxxxx/工作/学习/火花/learning-spark/src/main/scala/org/example/chapter3/DepartmentCalls.scala: 62:28:类型不匹配;[错误]发现:字符串("医疗事故")[错误]org.apache.spark.sql.catalyst.expressions.Expression[错误]其中($"CallType"=!= "医疗事故")[错误]
^[错误]发现一个错误[错误](编译/编译增量)编译失败

您可能错过了调用站点范围内的导入。$<column name>的快捷方式通常通过调用import sparksession.implicits._来引入。如果你启用了"优化导入",Intellij通常会删除这个导入,因为它不会识别它正在使用。

最新更新