Hadoop: Found 2 unexpected arguments on the command line



I am running Hadoop on Windows and trying to submit an MRJob, but it fails with the error "Found 2 unexpected arguments on the command line":

(cmtle) d:\>python norad_counts.py -r hadoop --hadoop-streaming-jar C:\hadoop-3.3.0\share\hadoop\tools\lib\hadoop-streaming-3.3.0.jar all_files.txt
No configs found; falling back on auto-configuration
No configs specified for hadoop runner
Looking for hadoop binary in C:\hadoop-3.3.0\bin\bin...
Looking for hadoop binary in $PATH...
Found hadoop binary: C:\hadoop-3.3.0\bin\hadoop.CMD
Using Hadoop version 3.3.0
Creating temp directory C:\Users\mille\AppData\Local\Temp\norad_counts.mille.20210318.083636.028559
uploading working dir files to hdfs:///user/mille/tmp/mrjob/norad_counts.mille.20210318.083636.028559/files/wd...
Copying other local files to hdfs:///user/mille/tmp/mrjob/norad_counts.mille.20210318.083636.028559/files/
Running step 1 of 1...
Found 2 unexpected arguments on the command line [hdfs:///user/mille/tmp/mrjob/norad_counts.mille.20210318.083636.028559/files/wd/norad_counts.py#norad_counts.py, hdfs:///user/mille/tmp/mrjob/norad_counts.mille.20210318.083636.028559/files/wd/setup-wrapper.sh#setup-wrapper.sh]
Try -help for more information
Streaming Command Failed!
Attempting to fetch counters from logs...
Can't fetch history log; missing job ID
No counters found
Scanning logs for probable cause of failure...
Can't fetch history log; missing job ID
Can't fetch task logs; missing application ID
Step 1 of 1 failed: Command '['C:\hadoop-3.3.0\bin\hadoop.CMD', 'jar', 'C:\hadoop-3.3.0\share\hadoop\tools\lib\hadoop-streaming-3.3.0.jar', '-files', 'hdfs:///user/mille/tmp/mrjob/norad_counts.mille.20210318.083636.028559/files/wd/mrjob.zip#mrjob.zip,hdfs:///user/mille/tmp/mrjob/norad_counts.mille.20210318.083636.028559/files/wd/norad_counts.py#norad_counts.py,hdfs:///user/mille/tmp/mrjob/norad_counts.mille.20210318.083636.028559/files/wd/setup-wrapper.sh#setup-wrapper.sh', '-input', 'hdfs:///user/mille/tmp/mrjob/norad_counts.mille.20210318.083636.028559/files/all_files.txt', '-output', 'hdfs:///user/mille/tmp/mrjob/norad_counts.mille.20210318.083636.028559/output', '-mapper', '/bin/sh -ex setup-wrapper.sh python3 norad_counts.py --step-num=0 --mapper', '-combiner', '/bin/sh -ex setup-wrapper.sh python3 norad_counts.py --step-num=0 --combiner', '-reducer', '/bin/sh -ex setup-wrapper.sh python3 norad_counts.py --step-num=0 --reducer']' returned non-zero exit status 1.

Here are the contents of norad_counts.py:

from mrjob.job import MRJob, JSONProtocol
import pandas as pd


class MRNoradCounts(MRJob):

    def mapper(self, _, file_path):
        # Each input line is the path to a gzip-compressed CSV file.
        try:
            df = pd.read_csv(file_path, compression='gzip', low_memory=False)
            df = df[(df.MEAN_MOTION > 11.25) & (df.ECCENTRICITY < 0.25)]
        except Exception as exc:
            # Chain the original error so the real cause stays visible.
            raise Exception(f'Failed to open {file_path}') from exc
        #print(f'File: {file_path}')
        for norad in df.NORAD_CAT_ID.to_list():
            yield norad, 1

    def combiner(self, norad, counts):
        yield norad, sum(counts)

    def reducer(self, norad, counts):
        yield norad, sum(counts)


if __name__ == "__main__":
    MRNoradCounts.run()

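I solved the problem by reinstalling the Java JDK. I had originally installed it to C:\Program Files\Java, but moved it to C:\Java after reading some other instructions. I thought updating the environment variables would be enough, but apparently it was not. So I uninstalled Java and reinstalled it, this time directly into C:\Java, and that fixed the problem.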
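
For anyone who wants to sanity-check their Java setup before reinstalling, here is a rough sketch of a check you could run first. It rests on assumptions, not on anything confirmed above: the post does not say exactly what was broken, and the helper name check_java_home is made up for illustration. It only looks at two things that are often reported to trip up Hadoop on Windows: a JAVA_HOME that contains a space (as in C:\Program Files\Java) and a JAVA_HOME that points to a directory that no longer exists (for example after moving the JDK by hand).

```python
import os


def check_java_home(env=None, path_exists=os.path.isdir):
    """Return a list of likely problems with JAVA_HOME (empty if none found).

    Hypothetical helper for illustration; it is not part of mrjob or Hadoop.
    """
    env = os.environ if env is None else env
    problems = []

    java_home = env.get("JAVA_HOME")
    if not java_home:
        problems.append("JAVA_HOME is not set")
        return problems

    # Assumption: spaces in the path can confuse Hadoop's .cmd scripts.
    if " " in java_home:
        problems.append("JAVA_HOME contains a space: " + java_home)

    # A JDK that was moved by hand can leave JAVA_HOME pointing nowhere.
    if not path_exists(java_home):
        problems.append("JAVA_HOME is not an existing directory: " + java_home)

    return problems


if __name__ == "__main__":
    for problem in check_java_home():
        print(problem)
```

If this prints nothing, JAVA_HOME at least looks plausible and the cause is probably elsewhere. It will not catch things such as stale registry entries left behind by moving an installed JDK, which may be why a clean reinstall worked here.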