I am writing a simple Hadoop program in Python.
mapper.py:
#!/usr/bin/python
# Python 2 (uses the print statement)
import sys
import numpy
from collections import OrderedDict

for line in sys.stdin:
    # The input line itself is ignored; the same two arrays are emitted for every line.
    test = OrderedDict([('1', [11, 5, 5, 5, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]), ('2', [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 3, 4, 0, 0, 0, 0, 1, 0, 0, 0, 29, 28, 18, 12, 11, 11, 10, 9, 9, 9, 8, 8, 8, 6, 6, 6, 5, 5, 4, 4])])
    for f in test:
        # Print each array in numpy's default text form.
        print numpy.asarray(test[f])
reducer.py:
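#!/usr/bin/python
import sys

# Identity reducer: echo every input line unchanged.
for line in sys.stdin:
    # The trailing comma (Python 2) avoids adding a second newline.
    print line,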
Input file:
1
2
Expected output:
[11 5 5 5 4 4 4 3 3 3 3 3 3 3 2 2 2 2 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0]
[ 0 0 0 0 0 0 0 0 1 0 3 4 0 0 0 0 1 0 0 0 29 28 18 12 11 11 10 9 9 9 8 8 8 6 6 6 5 5 4 4]
[11 5 5 5 4 4 4 3 3 3 3 3 3 3 2 2 2 2 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0]
[ 0 0 0 0 0 0 0 0 1 0 3 4 0 0 0 0 1 0 0 0 29 28 18 12 11 11 10 9 9 9 8 8 8 6 6 6 5 5 4 4]
Actual output:
0 0 0 0 0 0 0 0 0 0 0 0 0 1 0]
0 0 0 0 0 0 0 0 0 0 0 0 0 1 0]
11 10 9 9 9 8 8 8 6 6 6 5 5 4 4]
11 10 9 9 9 8 8 8 6 6 6 5 5 4 4]
[ 0 0 0 0 0 0 0 0 1 0 3 4 0 0 0 0 1 0 0 0 29 28 18 12 11
[ 0 0 0 0 0 0 0 0 1 0 3 4 0 0 0 0 1 0 0 0 29 28 18 12 11
[11 5 5 5 4 4 4 3 3 3 3 3 3 3 2 2 2 2 2 2 0 0 0 0 0
[11 5 5 5 4 4 4 3 3 3 3 3 3 3 2 2 2 2 2 2 0 0 0 0 0
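The output is sorted as strings, and those strings include the brackets. When numpy prints a long array it wraps the text over several lines, so each array reaches the sort between the map and reduce steps as separate lines, and the fragments come back reordered. You can fix this by formatting each array as a single line yourself, as follows: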
print ', '.join(str(item) for item in numpy.asarray(test[f]))
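To see why a single-line record survives the sort, here is a minimal local sketch. It assumes the sort between map and reduce can be approximated by Python's `sorted()` on whole lines; `format_row` and `simulate_shuffle` are hypothetical helper names, and the short rows are illustrative stand-ins for the arrays in the question.

```python
def format_row(values):
    """Join the items of one array-like into a single line of text."""
    return ', '.join(str(item) for item in values)


def simulate_shuffle(lines):
    """Rough local stand-in for the map-to-reduce sort: order lines as strings."""
    return sorted(lines)


if __name__ == '__main__':
    # Illustrative rows only (shortened, not the full arrays from the question).
    rows = [[11, 5, 5, 4, 0, 1, 0], [0, 0, 1, 3, 29, 28, 4]]
    mapper_output = [format_row(row) for row in rows]
    # Each record is exactly one line, so sorting can reorder records
    # but cannot split one record into fragments.
    for line in simulate_shuffle(mapper_output):
        print(line)
```

The records still come back in string order rather than in the order the mapper wrote them, but each array stays intact on its own line.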
You can read this question and other SO questions for more details.
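If the bracketed numpy notation from the expected output matters, one alternative (not the approach in the answer above) might be to ask numpy for a one-line rendering. This sketch assumes `numpy.array2string` with a large `max_line_width`; the helper name `one_line` and the width `10**6` are arbitrary choices for illustration.

```python
import numpy as np


def one_line(arr):
    """Render an array in numpy's bracketed form without line wrapping."""
    # max_line_width is set far above any realistic row length (an assumption),
    # so array2string has no reason to insert a line break.
    return np.array2string(np.asarray(arr), max_line_width=10**6)


if __name__ == '__main__':
    values = [11, 5, 5, 5, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2,
              0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
    # The default rendering of this array contains a newline; one_line() does not.
    print('\n' in str(np.asarray(values)))
    print(one_line(values))
```

As with the joined string, the lines are still sorted as text before they reach the reducer, so only the integrity of each line is preserved, not the original order.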