"Correlated scalar subqueries must be Aggregated"是什么意思?



我用的是Spark 2.0。

我想执行以下SQL查询:

val sqlText = """
select
  f.ID as TID,
  f.BldgID as TBldgID,
  f.LeaseID as TLeaseID,
  f.Period as TPeriod,
  coalesce(
    (select
       f ChargeAmt
     from
       Fact_CMCharges f
     where
       f.BldgID = Fact_CMCharges.BldgID
     limit 1),
     0) as TChargeAmt1,
  f.ChargeAmt as TChargeAmt2,
  l.EFFDATE as TBreakDate
from
  Fact_CMCharges f
join
  CMRECC l on l.BLDGID = f.BldgID and l.LEASID = f.LeaseID and l.INCCAT = f.IncomeCat and date_format(l.EFFDATE,'D')<>1 and f.Period=EFFDateInt(l.EFFDATE) 
where
  f.ActualProjected = 'Lease'
except(
  select * from TT1 t2 left semi join Fact_CMCharges f2 on t2.TID=f2.ID) 
"""
val query = spark.sql(sqlText)
query.show()

似乎coalesce中的内部语句给出了以下错误:

pyspark.sql.utils.AnalysisException: u'Correlated scalar subqueries must be Aggregated: GlobalLimit 1n+- LocalLimit 1n

这个查询有什么问题?

您必须确保您的子查询按定义(而不是按数据)只返回单行。否则Spark Analyzer在解析SQL语句时报错。

因此,当catalyst不能100%确定仅通过查看SQL语句(不查看您的数据)子查询只返回单行时,将抛出此异常。

如果您确定子查询只给出单行,则可以使用以下聚合标准函数之一,因此Spark Analyzer很高兴:

  • first
  • avg
  • max
  • min

相关内容

  • 没有找到相关文章