我正在删除实际的列名称,因为我不应该共享这些名称但是她是错误的一瞥
AnalysisException: u"Except can only be performed on tables with the compatible column types.
string <> boolean at the 28th column of the second table;
;n'Except falsen:- Filter (cast(inactive_date#111 as string) = '3001-01-01')n:
+- Project [... 33 more fields]n:+- Project [ ... 33 more fields]n:+- SubqueryAlias n:+-Relation[... 33 more fields] parquetn
+- Project [... 33 more fields]n +- Join Inner, (Key#275 = entry#26)n:- Filter (cast(inactive_date#283 as string) = '3001-01-01')n:
+- Project [... 33 more fields]n:
+- Project [... 33 more fields]n : +- SubqueryAlias +- Relation[,... 33 more fields] parquetn
+- Deduplicate [entry#26]n +- Project [entry#26]n+- Project [... 13 more fields]n
+- Project [... 13 more fields]n +- SubqueryAlias +- Relation[] parquetn"
我的代码看起来像
#old dataframe (consider it as History )
#daily dataframe ( Consider it as daily )
#Filtering the Active records based on condition
Active_old_filtered_records= old_history_dataframe.filter(old_history_dataframe["inactive_date"] == '3001-01-01')
Inactive_old_filtered_records= old_history_dataframe.filter(old_history_dataframe["inactive_date"] != '3001-01-01')
#Joining active old records with the matching active records in daily dataframe based on KeyColumnA
left = Active_old_filtered_records
right = Active_new_daily_dataframe.select("keyColumnA").distinct()
Matching_Active_daily_old_dataframe = left.join(right, ["keyColumnA"])
Non_matching_active_daily_old_dateframe = Active_old_filtered_records.**subtract**(Matching_Active_daily_old_dataframe)
注意:这里每日dataFrame 和旧数据框架具有完全相同的模式,但我得到了分析例外。有人可以在这方面提供帮助吗谢谢。
最后,我能够用以下代码解决此问题
#old dataframe (consider it as History )
#daily dataframe ( Consider it as daily )
cols = Active_old_filtered_records.columns
#Filtering the Active records based on condition
Active_old_filtered_records= old_history_dataframe.filter(old_history_dataframe["inactive_date"] == '3001-01-01')
Inactive_old_filtered_records= old_history_dataframe.filter(old_history_dataframe["inactive_date"] != '3001-01-01')
#Joining active old records with the matching active records in daily dataframe based on KeyColumnA
left = Active_old_filtered_records
right = Active_new_daily_dataframe.select("keyColumnA").distinct()
Matching_Active_daily_old_dataframe = left.join(right, ["keyColumnA"]).select(cols)
Non_matching_active_daily_old_dateframe = Active_old_filtered_records.subtract(Matching_Active_daily_old_dataframe)
除了开始位置以外,从任何地方连接两个数据框,更改了结果框架中的列的顺序。因此,维护一个COLS变量并按正确顺序选择了同一列,以确保所得的步骤正常工作:D
最后我能够解决这个问题。