I'm trying to do a homework assignment on MapReduce. In one terminal I run:
ioannis@ioannis-desktop:~$ python hw3.py
Then, in another terminal:
ioannis@ioannis-desktop:~$ ls
a2.py la.py~ stopwords.py
active_output LTP Crafting Quality Code stopwords.pyc
Desktop mincemeat.py Templates
Documents mincemeat.pyc test.py
Downloads Music test.py~
Dropbox NetBeansProjects test.pyc
examples.desktop NotFor Ubuntu One
Firefox_wallpaper.png Pictures Videos
hw3.py Public vmware
hw3.py~ __pycache__ Web Intelligence and Big Data
ioannis@ioannis-desktop:~$ python mincemeat.py -p changeme localhost
error: uncaptured python exception, closing channel <__main__.Client connected localhost:11235 at 0x27748c0>
(<type 'exceptions.NameError'>:global name 'allStopWords' is not defined
[/usr/lib/python2.7/asyncore.py|read|83]
[/usr/lib/python2.7/asyncore.py|handle_read_event|449]
[/usr/lib/python2.7/asynchat.py|handle_read|140]
[mincemeat.py|found_terminator|96]
[mincemeat.py|process_command|194]
[mincemeat.py|call_mapfn|170]
[hw3.py|mapfn|35])
ioannis@ioannis-desktop:~$
hw3.py:
import mincemeat
import glob
from stopwords import allStopWords

text_files = glob.glob('/home/ioannis/Web Intelligence and Big Data/Week 3: Load - I/hw3data/hw3data/*')

def file_contents(file_name):
    f = open(file_name)
    try:
        return f.read()
    except:
        print "exception!!!!!!"
    finally:
        f.close()

source = dict((file_name, file_contents(file_name))
              for file_name in text_files)

def mapfn(key, value):
    for line in value.splitlines():
        ........................
        ........................
        if word in allStopWords:
            continue
        print(word)
        print(words_title)
        print("nn")

def reducefn(k, vs):
    result = sum(vs)
    return result

s = mincemeat.Server()
s.datasource = source
s.mapfn = mapfn
s.reducefn = reducefn
results = s.run_server(password="changeme")
print results
Why doesn't this work? As you can see, hw3.py and stopwords.py are both in my home directory!
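From the "Imports" section of https://github.com/michaelfairley/mincemeatpy: one potential gotcha when using mincemeat.py is that your mapfn and reducefn functions don't have access to their enclosing environment, including imported modules. If you need to use an imported module in one of these functions, be sure to include import whatever in the function itself.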
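To see why the name goes missing, here is a minimal sketch written for Python 3 (the homework itself uses Python 2.7). It assumes, as the traceback suggests, that the worker receives only the function's code object and rebuilds the function inside its own namespace, so module-level names from hw3.py never arrive. The names mapfn_bad, ship, rebuild and the tiny stopword set are hypothetical stand-ins, not mincemeat.py's actual API.

```python
import builtins
import marshal
import types

# Hypothetical stand-in for "from stopwords import allStopWords" at the top of hw3.py.
allStopWords = {"the", "a", "and"}

def mapfn_bad(key, value):
    # Looks up allStopWords as a global of whatever namespace the function runs in.
    return [w for w in value.split() if w not in allStopWords]

def ship(func):
    """Serialize only the code object (no globals travel with it)."""
    return marshal.dumps(func.__code__)

def rebuild(payload, namespace):
    """Recreate the function on the 'worker' side using the worker's globals."""
    return types.FunctionType(marshal.loads(payload), namespace)

# The worker's namespace has builtins but none of hw3.py's imports.
worker_globals = {"__builtins__": builtins}
remote = rebuild(ship(mapfn_bad), worker_globals)

try:
    remote("doc1", "the cat and the hat")
except NameError as exc:
    print("NameError:", exc)
```

Called locally, mapfn_bad works because allStopWords is in its defining module's globals; the rebuilt copy fails the same way the mincemeat client does.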
So: move the from stopwords import allStopWords statement to the top of the mapfn function.
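A minimal sketch of the fix, again in Python 3 and under the same assumption that the worker rebuilds the function from its code object with its own globals. To stay self-contained it registers a hypothetical in-memory stopwords module; in the homework, the real stopwords.py must be importable on the machine running the client. The loop body and the yield of (word, 1) pairs are illustrative only and are not a reconstruction of the elided lines in hw3.py.

```python
import builtins
import marshal
import sys
import types

# Hypothetical in-memory stand-in for stopwords.py so this sketch runs anywhere.
fake = types.ModuleType("stopwords")
fake.allStopWords = {"the", "a", "and"}
sys.modules["stopwords"] = fake

def mapfn(key, value):
    # The import runs inside the function, so it works in any namespace
    # the function is rebuilt in, as long as the module is importable there.
    from stopwords import allStopWords
    for line in value.splitlines():
        for word in line.split():
            if word in allStopWords:
                continue
            yield word, 1  # illustrative (key, value) pair for a summing reducefn

# Simulate the worker: rebuild from the code object with bare globals.
worker_globals = {"__builtins__": builtins}
remote = types.FunctionType(marshal.loads(marshal.dumps(mapfn.__code__)), worker_globals)
print(list(remote("doc1", "the cat and the hat")))
```

The rebuilt function now resolves allStopWords because the import executes at call time on the worker side rather than relying on hw3.py's module globals.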