spark reduce and map issue

I'm running a small experiment in Spark and I've run into trouble.

wordCounts is: [('rat', 2), ('elephant', 1), ('cat', 2)]

# TODO: Replace <FILL IN> with appropriate code
from operator import add
totalCount = (wordCounts
              .map(lambda x: (x,1))   # <==== something wrong with this line maybe
              .reduce(sum))           # <==== something wrong with this line maybe
average = totalCount / float(wordsRDD.map(lambda x: (x,1)).reduceByKey(add).count())
print totalCount
print round(average, 2)
# TEST Mean using reduce (3b)
Test.assertEquals(round(average, 2), 1.67, 'incorrect value of average')

I found my solution:

from operator import add
totalCount = (wordCounts
              .map(lambda x: x[1])   # keep only the counts: [2, 1, 2]
              .reduce(add))          # add takes two arguments, so reduce accepts it
average = totalCount / float(wordsRDD.map(lambda x: (x,1)).reduceByKey(add).count())
print totalCount
print round(average, 2)
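
For anyone wondering why the original '.reduce(sum)' fails: RDD.reduce expects a function of two arguments, while the built-in sum takes a single iterable, so Spark ends up calling sum with two numbers and raises a TypeError. Here is a minimal plain-Python sketch of the distinction, using the counts from the question (functools.reduce stands in for the RDD's reduce):

from functools import reduce
from operator import add

counts = [2, 1, 2]  # the values in [('rat', 2), ('elephant', 1), ('cat', 2)]

# reduce folds a two-argument function over the values: add(add(2, 1), 2) -> 5
total = reduce(add, counts)
print(total)  # 5

# sum(2, 1) raises "TypeError: 'int' object is not iterable",
# which is why reduce(sum) blows up inside Spark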

I'm not sure myself, but from your code I can see some problems. The 'map' function cannot be used on a list like 'list_name.map(something)'; you need to call map like 'variable = map(function, arguments)', and if you are using Python 3 you need to do 'variable = list(map(function, arguments))'.
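
Here is a minimal local sketch of the distinction this answer is drawing, on a plain Python list (note that a Spark RDD, unlike a list, does expose map as a method, which is why 'wordCounts.map(...)' itself is fine):

word_counts = [('rat', 2), ('elephant', 1), ('cat', 2)]  # a local list, not an RDD

# plain lists have no .map method; the built-in map is a free function
values = list(map(lambda x: x[1], word_counts))  # [2, 1, 2]
print(sum(values) / float(len(word_counts)))     # 1.666..., rounds to 1.67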

Another, similar approach: you can also read the list as key-value pairs and use distinct():

from operator import add
totalCount = (wordCounts
              .map(lambda (k,v): v)  # Python 2 only: tuple unpacking in a lambda
              .reduce(add))
average = totalCount / float(wordCounts.distinct().count())
print totalCount
print round(average, 2)
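
Note that distinct().count() only gives the number of words here because every (word, count) pair happens to be unique already. Also, 'lambda (k,v): v' relies on tuple-parameter unpacking, which was removed in Python 3 (PEP 3113); 'lambda kv: kv[1]' is the portable spelling. A small local sketch of the same arithmetic with the question's data:

word_counts = [('rat', 2), ('elephant', 1), ('cat', 2)]

total = sum(v for _, v in word_counts)   # 2 + 1 + 2 = 5
n_words = len(set(word_counts))          # 3 distinct (word, count) pairs
print(round(total / float(n_words), 2))  # 1.67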
