TypeError: Column is not iterable when using more than one column in withColumn()

I am trying to find the quarter start date from a date column. When I write it with selectExpr(), I get the expected result:

df.selectExpr("add_months(history_effective_month,-(month(history_effective_month)%3)+1) as history_effective_qtr","history_effective_month").show(5)
Output:
history_effective_qtr   history_effective_month
       2017-07-01                2017-06-01
       2016-04-01                2016-05-01
       2015-10-01                2015-09-01
       2012-01-01                2012-01-01
       2012-01-01                2012-01-01
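
The month offset in that expression can be checked by hand. Here is a rough plain-Python sketch of the same arithmetic, assuming every input date is the first of a month. shift_first_of_month and shifted_month are made-up helper names for illustration, not PySpark functions.

```python
from datetime import date


def shift_first_of_month(d: date, months: int) -> date:
    """Hypothetical helper: move a first-of-month date by a number of months."""
    # Count months since year 0, apply the shift, then split back into year/month.
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def shifted_month(d: date) -> date:
    """Plain-Python mirror of add_months(d, -(month(d) % 3) + 1)."""
    offset = -(d.month % 3) + 1
    return shift_first_of_month(d, offset)


# Dates taken from the history_effective_month column shown above.
for d in [date(2017, 6, 1), date(2016, 5, 1), date(2015, 9, 1), date(2012, 1, 1)]:
    print(shifted_month(d), d)
```

Months divisible by 3 get an offset of +1, months with remainder 1 get 0, and months with remainder 2 get -1, which matches the rows in the output above.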

But when I put the same logic in .withColumn(), I get TypeError: Column is not iterable

df.withColumn("history_effective_quarter",add_months('history_effective_month',-(month('history_effective_month')%3)+1))
TypeError                                 Traceback (most recent call last)
<ipython-input-259-0bb78d27d2a7> in <module>()
      1
~/anaconda3/lib/python3.6/site-packages/pyspark/sql/column.py in __iter__(self)
    248
    249     def __iter__(self):
--> 250         raise TypeError("Column is not iterable")
    251
    252     # string methods
TypeError: Column is not iterable

The workaround I am using is as follows:

df=df.selectExpr('*',"date_sub(history_effective_date,"
   "dayofmonth(history_effective_date)-1) as history_effective_month")

TL;DR: just use select:

select(*cols)

Projects a set of expressions and returns a new DataFrame.

df.select(
   "history_effective_quarter", add_months('history_effective_month',
   -(month('history_effective_month')%3)+1))

Your code doesn't work because withColumn:

withColumn(colName, col)
Returns a new DataFrame by adding a column or replacing the existing column that has the same name.

is meant for adding a single column.
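
For what it's worth, the selectExpr version in the question works because the whole expression is handed to Spark as SQL text. The same text can be passed to withColumn through expr. What follows is only a sketch: it assumes, as the traceback suggests, that add_months in this PySpark version accepts only a plain integer for the month count, and with_quarter is a made-up helper name.

```python
# Same SQL text that already works in selectExpr(); column names come from the question.
QUARTER_SQL = (
    "add_months(history_effective_month, "
    "-(month(history_effective_month) % 3) + 1)"
)


def with_quarter(df):
    """Hypothetical helper: return df plus a history_effective_quarter column."""
    # Imported lazily so that merely defining this helper does not require
    # PySpark to be importable.
    from pyspark.sql.functions import expr

    # expr() lets Spark parse the string as SQL, so no Python Column object is
    # passed as the second argument of the Python add_months() wrapper.
    return df.withColumn("history_effective_quarter", expr(QUARTER_SQL))
```

Usage would then be with_quarter(df).show(5), which keeps all existing columns and adds one new one.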
