Here is my union code:
val dfToSave=dfMainOutput.union(insertdf.select(dfMainOutput).withColumn("FFAction", when($"FFAction" === "O" || $"FFAction" === "I", lit("I|!|")))
When I perform the union, I get the following error:
org.apache.spark.sql.AnalysisException: Union can only be performed on tables with the compatible column types. string <> boolean at the 11th column of the second table;;
'Union
Here are the schemas of the two DataFrames:
insertdf.printSchema()
root
|-- OrganizationID: long (nullable = true)
|-- SourceID: integer (nullable = true)
|-- AuditorID: integer (nullable = true)
|-- AuditorOpinionCode: string (nullable = true)
|-- AuditorOpinionOnInternalControlCode: string (nullable = true)
|-- AuditorOpinionOnGoingConcernCode: string (nullable = true)
|-- IsPlayingAuditorRole: boolean (nullable = true)
|-- IsPlayingTaxAdvisorRole: boolean (nullable = true)
|-- AuditorEnumerationId: integer (nullable = true)
|-- AuditorOpinionId: integer (nullable = true)
|-- AuditorOpinionOnInternalControlsId: string (nullable = true)
|-- AuditorOpinionOnGoingConcernId: string (nullable = true)
|-- IsPlayingCSRAuditorRole: boolean (nullable = true)
|-- FFAction: string (nullable = true)
|-- DataPartition: string (nullable = true)
Here is the schema of the second DataFrame:
dfMainOutput.printSchema()
root
|-- OrganizationID: long (nullable = true)
|-- SourceID: integer (nullable = true)
|-- AuditorID: integer (nullable = true)
|-- AuditorOpinionCode: string (nullable = true)
|-- AuditorOpinionOnInternalControlCode: string (nullable = true)
|-- AuditorOpinionOnGoingConcernCode: string (nullable = true)
|-- IsPlayingAuditorRole: boolean (nullable = true)
|-- IsPlayingTaxAdvisorRole: boolean (nullable = true)
|-- AuditorEnumerationId: integer (nullable = true)
|-- AuditorOpinionId: integer (nullable = true)
|-- AuditorOpinionOnInternalControlsId: integer (nullable = true)
|-- AuditorOpinionOnGoingConcernId: boolean (nullable = true)
|-- IsPlayingCSRAuditorRole: string (nullable = true)
|-- FFAction: string (nullable = true)
|-- DataPartition: string (nullable = true)
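To avoid this problem, I may have to write a select for each column. Is there any Scala syntax to type cast the columns, or otherwise make the two DataFrames have the same types?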
This is what I have tried so far, but I still get the same error:
val columns = dfMainOutput.columns.toSet.intersect(insertdf.columns.toSet).map(col).toSeq
//Perform Union
val dfToSave=dfMainOutput.select(columns: _*).union(insertdf.select(columns: _*)).withColumn("FFAction", when($"FFAction" === "O" || $"FFAction" === "I", lit("I|!|")))
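The data type of each column must match in order to perform a union. Selecting the same column names on both sides does not change their types, which is why your attempt still fails.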
Looking at your schemas, there are three columns that do not satisfy this:
AuditorOpinionOnInternalControlsId
AuditorOpinionOnGoingConcernId
IsPlayingCSRAuditorRole
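Instead of comparing the printed schemas by eye, the mismatches could also be found programmatically. The following is only a minimal sketch: the helper name `mismatchedColumns` is hypothetical (it is not part of Spark), and it assumes both DataFrames have the same columns in the same order, as in the schemas above.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper (not a Spark API): pairs up the fields of both schemas
// by position and keeps those whose data types differ.
// Returns (column name in left, type in left, type in right).
def mismatchedColumns(left: DataFrame, right: DataFrame): Seq[(String, String, String)] =
  left.schema.fields.zip(right.schema.fields).toSeq.collect {
    case (l, r) if l.dataType != r.dataType =>
      (l.name, l.dataType.simpleString, r.dataType.simpleString)
  }

// Usage with the DataFrames from the question:
//   mismatchedColumns(dfMainOutput, insertdf).foreach(println)
```

The comparison is positional because that is also how `union` pairs up columns.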
A simple way to change the data types is to use withColumn and cast. In the code below I assume that the correct types are the ones in the dfMainOutput DataFrame:
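import org.apache.spark.sql.functions.{lit, when}
import org.apache.spark.sql.types.{BooleanType, IntegerType, StringType}
// The $"..." syntax also needs `import spark.implicits._`, where `spark` is your SparkSession.

// Cast the three mismatched columns to the types used in dfMainOutput
val insertDfNew = insertdf
  .withColumn("AuditorOpinionOnInternalControlsId", $"AuditorOpinionOnInternalControlsId".cast(IntegerType))
  .withColumn("AuditorOpinionOnGoingConcernId", $"AuditorOpinionOnGoingConcernId".cast(BooleanType))
  .withColumn("IsPlayingCSRAuditorRole", $"IsPlayingCSRAuditorRole".cast(StringType))
  // Rewrite "O" and "I" to "I|!|"; leave every other FFAction value unchanged
  .withColumn("FFAction", when($"FFAction" === "O" || $"FFAction" === "I", lit("I|!|")).otherwise($"FFAction"))
val dfToSave = dfMainOutput.union(insertDfNew)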
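If you would rather not write one withColumn per column, the same idea can be generalized by reading the target types from the other DataFrame's schema. This is a sketch under assumptions, not a definitive implementation: the helper name `alignTo` is hypothetical, it assumes every column of the target exists by name in the DataFrame being converted, and it assumes every cast is actually meaningful for your data (depending on the Spark version and ANSI settings, values that cannot be converted either become null or raise an error).

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper (not a Spark API): selects the target's columns from `df`,
// in the target's order, casting each one to the target's data type.
def alignTo(target: DataFrame, df: DataFrame): DataFrame =
  df.select(target.schema.fields.map(f => col(f.name).cast(f.dataType)): _*)

// Usage with the DataFrames from the question:
//   val dfToSave = dfMainOutput.union(alignTo(dfMainOutput, insertdf))
```

Because the select follows the column order of the target, the result also lines up positionally for `union`.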