I am trying to run a query like this:
SELECT age,COUNT(age)
FROM T
GROUP BY age
HAVING COUNT(age) = (SELECT MIN(cnt) FROM (SELECT COUNT(age) AS cnt FROM T GROUP BY age) AS counts)
ORDER BY COUNT(age)
I tried:
min_size = df.groupBy("age").count().select(f.min("count"))
df.groupBy("age").count().sort("count").filter(f.col("count")==min_size).show()
But I get AttributeError: 'DataFrame' object has no attribute '_get_object_id'
Is there any way to use a subquery in PySpark?
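In your case, min_size is a DataFrame, not an integer, so it cannot be compared against a column inside filter().
Try collecting it into a plain integer like this: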
min_size = df.groupBy("age").count().select(f.min("count")).collect()[0][0]