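With the release of AMI 3.3.0, AWS supports Hue as an installable "application" in EMR, like Hive and Pig. Creating a cluster with Hue through the EMR web UI works fine for me, but when I add the Hue install bootstrap action through Boto, I get a non-deterministic failure (the cluster crashes intermittently). I have tested the same configuration 4 times, and the failure rate is 50%.
In Boto, I add one extra bootstrap action, the same one that is added automatically when a cluster is created from the web UI with Hue enabled: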
BootstrapAction('Install Hue', 's3://elasticmapreduce/libs/hue/install-hue', [])
The cluster then terminates with:
Terminated with errors: On the master instance (i-c6b7582a),
bootstrap action 2 returned a non-zero return code
In the bootstrap action log:
Existing lock /var/run/yum.pid: another copy is running as pid 2007.
Another app is currently holding the yum lock; waiting for it to exit...
The other application is: yum
Memory : 22 M RSS (305 MB VSZ)
Started: Tue Nov 11 21:00:12 2014 - 00:19 ago
State : Sleeping, pid: 2007
Another app is currently holding the yum lock; waiting for it to exit...
These messages repeat many times, and the log ends with a large stack trace:
Trying other mirror.
http://packages.ap-southeast-2.amazonaws.com/2014.09/main/20140901f63e/x86_64/repodata/repomd.xml?instance_id=i-c6b7582a&region=us-east-1: [Errno 12] Timeout on http://packages.ap-southeast-2.amazonaws.com/2014.09/main/20140901f63e/x86_64/repodata/repomd.xml?instance_id=i-c6b7582a&region=us-east-1: (28, 'Connection timed out after 10000 milliseconds')
Trying other mirror.
Traceback (most recent call last):
File "/usr/bin/yum", line 29, in <module>
yummain.user_main(sys.argv[1:], exit_code=True)
File "/usr/share/yum-cli/yummain.py", line 355, in user_main
errcode = main(args)
File "/usr/share/yum-cli/yummain.py", line 174, in main
result, resultmsgs = base.doCommands()
File "/usr/share/yum-cli/cli.py", line 572, in doCommands
return self.yum_cli_commands[self.basecmd].doCommand(self, self.basecmd, self.extcmds)
File "/usr/share/yum-cli/yumcommands.py", line 432, in doCommand
return base.installPkgs(extcmds, basecmd=basecmd)
File "/usr/share/yum-cli/cli.py", line 968, in installPkgs
txmbrs = self.install(pattern=arg)
File "/usr/lib/python2.6/site-packages/yum/__init__.py", line 4721, in install
mypkgs = self.pkgSack.returnPackages(patterns=pats,
File "/usr/lib/python2.6/site-packages/yum/__init__.py", line 1069, in <lambda>
pkgSack = property(fget=lambda self: self._getSacks(),
File "/usr/lib/python2.6/site-packages/yum/__init__.py", line 774, in _getSacks
self.repos.populateSack(which=repos)
File "/usr/lib/python2.6/site-packages/yum/repos.py", line 383, in populateSack
sack.populate(repo, mdtype, callback, cacheonly)
File "/usr/lib/python2.6/site-packages/yum/yumRepo.py", line 250, in populate
if self._check_db_version(repo, mydbtype):
File "/usr/lib/python2.6/site-packages/yum/yumRepo.py", line 342, in _check_db_version
return repo._check_db_version(mdtype)
File "/usr/lib/python2.6/site-packages/yum/yumRepo.py", line 1520, in _check_db_version
repoXML = self.repoXML
File "/usr/lib/python2.6/site-packages/yum/yumRepo.py", line 1706, in <lambda>
repoXML = property(fget=lambda self: self._getRepoXML(),
File "/usr/lib/python2.6/site-packages/yum/yumRepo.py", line 1702, in _getRepoXML
self._loadRepoXML(text=self.ui_id)
File "/usr/lib/python2.6/site-packages/yum/yumRepo.py", line 1693, in _loadRepoXML
return self._groupLoadRepoXML(text, self._mdpolicy2mdtypes())
File "/usr/lib/python2.6/site-packages/yum/yumRepo.py", line 1667, in _groupLoadRepoXML
if self._commonLoadRepoXML(text):
File "/usr/lib/python2.6/site-packages/yum/yumRepo.py", line 1495, in _commonLoadRepoXML
self._revertOldRepoXML()
File "/usr/lib/python2.6/site-packages/yum/yumRepo.py", line 1345, in _revertOldRepoXML
os.rename(old_data['old_local'], old_data['local'])
OSError: [Errno 2] No such file or directory
By contrast, the bootstrap action log of a successful run contains a single line:
Warning: RPMDB altered outside of yum.
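To tell a failed run from a successful one without reading the whole log, the messages quoted above can be counted mechanically. This is only a minimal sketch, assuming the bootstrap action log has already been downloaded as plain text; the function name `summarize_bootstrap_log` and the counter names are hypothetical, and the patterns are copied from the excerpts in this question rather than from any yum documentation.

```python
import re

# Message fragments copied from the log excerpts quoted in the question.
# The dictionary keys are hypothetical names chosen for this sketch.
PATTERNS = {
    "lock_waits": re.compile(r"Another app is currently holding the yum lock"),
    "mirror_timeouts": re.compile(r"\[Errno 12\] Timeout on "),
    "os_errors": re.compile(r"^OSError: \[Errno \d+\]"),
}


def summarize_bootstrap_log(text):
    """Count how many lines of a bootstrap log match each known pattern."""
    counts = dict.fromkeys(PATTERNS, 0)
    for line in text.splitlines():
        stripped = line.strip()
        for key, pattern in PATTERNS.items():
            if pattern.search(stripped):
                counts[key] += 1
    return counts
```

Calling `summarize_bootstrap_log(open(path).read())` on a log like the failing one above should give non-zero counts for all three keys, while the single-line successful log should give all zeros.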
An example of installing and running Hue on EMR AMI 3.3:
import boto.emr
from boto.emr.emrobject import InstanceGroup
from boto.emr.bootstrap_action import BootstrapAction
from boto.emr.step import ScriptRunnerStep

def get_bootstrap_actions():
    # Bootstrap action that installs Hue on the cluster.
    install_hue_action = BootstrapAction("Install Hue",
        "s3n://us-east-1.elasticmapreduce/libs/hue/install-hue",
        bootstrap_action_args=None)
    return [install_hue_action]

def get_startup_steps():
    # Step that starts Hue after the bootstrap actions have finished.
    runHueStep = ScriptRunnerStep(name="Run Hue",
        step_args=["s3n://us-east-1.elasticmapreduce/libs/hue/run-hue"])
    return [runHueStep]

def get_instance_groups():
    # This is just an example. An actual implementation will have core and
    # task instance groups as well. Choose the instance type, count, and bid
    # price carefully, as the cluster can get expensive very quickly.
    spotInstanceGroup = InstanceGroup()
    spotInstanceGroup.name = "Spot Instance Group Master"
    spotInstanceGroup.bidprice = "0.20"
    spotInstanceGroup.num_instances = 1
    spotInstanceGroup.market = "SPOT"
    spotInstanceGroup.type = "c3.2xlarge"
    spotInstanceGroup.role = "MASTER"
    return [spotInstanceGroup]

# The helper functions must be defined before they are called here.
conn = boto.emr.EmrConnection()
jobid = conn.run_jobflow(name="Hue Example", ami_version="3.3.0",
    log_uri="s3n://your-log-path-here",
    instance_groups=get_instance_groups(),
    bootstrap_actions=get_bootstrap_actions(),
    ec2_keyname="your-ec2-key-name",
    steps=get_startup_steps())