PySpark self-join fails with "Resolved attribute(s) missing"

When I perform a self-join on a PySpark DataFrame, I get this error message:

Py4JJavaError: An error occurred while calling o1595.join.
: org.apache.spark.sql.AnalysisException: Resolved attribute(s) un_val#5997 missing from day#290,item_listed#281,filename#286 in operator !Project [...]. Attribute(s) with the same name appear in the operation: un_val. Please check if the right attribute(s) are used.;;

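It is a plain DataFrame self-join. The join below works fine on a fresh DataFrame, but after a few operations on it, such as adding a column or joining it with other DataFrames, it raises the error shown above.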

df.join(df,on='item_listed')

Using DataFrame aliases, as below, does not help and raises the same error message:

df.alias('A').join(df.alias('B'), col('A.my_id') == col('B.my_id'))

I found a Java workaround in SPARK-14948. For PySpark, it looks like this:

# Build new column names by appending a "_r" suffix to each one
newcols = [c + '_r' for c in df.columns]
# Clone the DataFrame with the renamed columns
df2 = df.toDF(*newcols)
# Self-join: the two sides no longer share any column name
df.join(df2, df.my_column == df2.my_column_r)
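
The same workaround can be wrapped in a small helper so that it is easy to reuse. This is only a sketch: `with_suffix` and its `suffix` parameter are names made up for illustration and are not part of PySpark. It relies only on `df.columns` and `df.toDF`, the same two calls used above.

```python
def with_suffix(df, suffix="_r"):
    """Return a copy of `df` in which every column name ends with `suffix`.

    Hypothetical helper, not a PySpark API. It only uses `df.columns`
    and `df.toDF(*names)`, as in the workaround above.
    """
    new_cols = [c + suffix for c in df.columns]
    return df.toDF(*new_cols)


# Usage, assuming `df` is an existing PySpark DataFrame with a column `my_column`:
#   right = with_suffix(df)
#   joined = df.join(right, df["my_column"] == right["my_column_r"])
```

Because every column on the right-hand side gets a new name, the join condition refers to two distinct columns and the result has no duplicate column names. The suffixed columns can be dropped or renamed afterwards if they are not needed.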
