For
averageCount = (wordCountsDF
.groupBy().mean()).head()
I get Row(avg(count)=1.6666666666666667)
but when I try:
averageCount = (wordCountsDF
.groupBy().mean()).head().getFloat(0)
I get the following error:
AttributeError: getFloat
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
      1 # TODO: Replace with appropriate code
----> 2 averageCount = (wordCountsDF
      3                 .groupBy().mean()).head().getFloat(0)
      4
      5 print averageCount

/databricks/spark/python/pyspark/sql/types.py in __getattr__
         except ValueError:
-> 1272             raise AttributeError(item)
   1273
   1274     def __setattr__(self, key, value):

AttributeError: getFloat
What am I doing wrong?
I figured it out. This returns the value:
averageCount = (wordCountsDF
.groupBy().mean()).head()[0]
This also works:
averageCount = (wordCountsDF
.groupBy().mean('count').collect())[0][0]
print(averageCount)
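DataFrame rows behave like namedtuples (from the collections library): Row is a tuple subclass with named fields. So while you can index a row like a traditional tuple, as you did above, you probably want to access it by the name of its field. That is, after all, the point of named tuples, and it is also more robust to future changes. Like this (the key must match the column name shown in the Row, here avg(count)):
averageCount = wordCountsDF.groupBy().mean().head()['avg(count)']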
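To make the error above a bit more concrete, here is a rough sketch of why row[0] and row['avg(count)'] work while row.getFloat(0) raises AttributeError. ToyRow is a hypothetical stand-in written only for illustration. It is not PySpark's actual Row class, it just mimics the behaviour visible in the traceback (a tuple whose __getattr__ treats unknown attributes as field names), and it runs without Spark.

```python
class ToyRow(tuple):
    """Toy stand-in for a PySpark Row (hypothetical, not the real class)."""

    def __new__(cls, names, values):
        row = super().__new__(cls, values)
        row._names = list(names)  # field names, in column order
        return row

    def __getitem__(self, key):
        # A string key is looked up by field name; anything else is positional.
        if isinstance(key, str):
            return super().__getitem__(self._names.index(key))
        return super().__getitem__(key)

    def __getattr__(self, item):
        # Called only when normal attribute lookup fails.
        if item.startswith("_"):
            raise AttributeError(item)
        try:
            return self[self._names.index(item)]
        except ValueError:
            # No field with that name, so the attribute does not exist.
            raise AttributeError(item)


row = ToyRow(["avg(count)"], [1.6666666666666667])

by_position = row[0]          # positional access, like head()[0]
by_name = row["avg(count)"]   # access by field name

try:
    row.getFloat(0)           # there is no field called "getFloat"
    error_message = None
except AttributeError as exc:
    error_message = str(exc)  # the missing name, as in the traceback
```

As far as I can tell, getFloat is a method of the Scala/Java Row API. On the Python side the values already come back as ordinary Python objects, so indexing by position or by name should be all that is needed.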