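Only two Mesos frameworks support GPU resources: Marathon and Aurora. I want to launch batch jobs on Mesos agents that have GPU resources, and of those two frameworks only Aurora supports this kind of workload. However, Aurora is not yet officially supported by DC/OS. I tried to integrate it, but without success: the DC/OS Mesos master does not register the Aurora framework, although Exhibitor does create records for Aurora. I did not find any Aurora-related entries in the Mesos masters' logs. Here is my aurora-scheduler configuration: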
#!/bin/bash
GLOG_v=0
LIBPROCESS_PORT=8083
#LIBPROCESS_IP=127.0.0.1
JAVA_HOME=/opt/mesosphere/active/java/usr/java
# java.library.path entries are separated by ':' on Linux (';' is the Windows separator)
JAVA_OPTS="-server -Djava.library.path=/opt/mesosphere/lib:/usr/lib:/usr/lib64"
PATH=$PATH:/opt/mesosphere/bin
MESOS_NATIVE_JAVA_LIBRARY=/opt/mesosphere/lib/libmesos.so
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/mesosphere/lib
JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/opt/mesosphere/lib
# Flags control the behavior of the Aurora scheduler.
# For a full list of available flags, run /usr/lib/aurora/bin/aurora-scheduler -help
AURORA_FLAGS=(
# The name of this cluster.
-cluster_name='My Cluster'
# The HTTP port upon which Aurora will listen.
-http_port=8088
# The ZooKeeper URL of the ZNode where the Mesos master has registered.
-mesos_master_address=zk://master_ip1:2181,master_ip2:2181,master_ip3:2181/mesos
# The ZooKeeper quorum to which Aurora will register itself.
-zk_endpoints=master_ip1:2181,master_ip2:2181,master_ip3:2181
# The ZooKeeper ZNode within the specified quorum to which Aurora will register its
# ServerSet, which keeps track of all live Aurora schedulers.
-serverset_path='/aurora/scheduler'
# Allows the scheduling of containers of the provided type.
-allowed_container_types='DOCKER,MESOS'
-allow_docker_parameters=true
-allow_gpu_resource=true
-executor_user=root
### Native Log Settings ###
# The native log serves as a replicated database which stores the state of the
# scheduler, allowing for multi-master operation.
# Size of the quorum of Aurora schedulers which possess a native log. If running in
# multi-master mode, consult the following document to determine appropriate values:
#
# https://aurora.apache.org/documentation/latest/deploying-aurora-scheduler/#replicated-log-configuration
-native_log_quorum_size=2
# The ZooKeeper ZNode to which Aurora will register the locations of its replicated log.
-native_log_zk_group_path='/aurora/replicated-log'
# The local directory in which an Aurora scheduler can find Aurora's replicated log.
-native_log_file_path='/var/lib/aurora/scheduler/db'
# The local directory in which Aurora schedulers will place state backups.
-backup_dir='/var/lib/aurora/scheduler/backups'
### Thermos Settings ###
# The local path of the Thermos executor binary.
-thermos_executor_path='/usr/bin/thermos_executor'
# Flags to pass to the Thermos executor.
-thermos_executor_flags='--announcer-ensemble 127.0.0.1:2181')
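The -native_log_quorum_size flag above has to agree with the number of schedulers that share the replicated log. Assuming the usual majority rule for Mesos' replicated log, the quorum for n schedulers is floor(n/2) + 1, so a value of 2 matches a three-scheduler deployment. The following is a minimal shell sketch of that calculation; quorum_size is a hypothetical helper, not part of Aurora.

```shell
#!/bin/sh
# Smallest strict majority of n schedulers: floor(n/2) + 1.
# Assumption: this is the value -native_log_quorum_size should hold
# when n schedulers share one replicated log.
quorum_size() {
  n=$1
  if [ "$n" -lt 1 ]; then
    echo "need at least one scheduler" >&2
    return 1
  fi
  echo $(( n / 2 + 1 ))
}

# Print the quorum for a few common cluster sizes.
for n in 1 3 5; do
  echo "schedulers=$n quorum=$(quorum_size "$n")"
done
```

Even scheduler counts buy nothing here: 4 schedulers need a quorum of 3 and, like 3 schedulers, survive only one failure.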
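I have now launched the Aurora framework on DC/OS 1.8. Because Mesos and Java are embedded in DC/OS with a custom configuration (the paths in particular), I had to isolate Aurora in Docker. You can therefore find Docker images of the Aurora components in my Docker repository: Aurora scheduler, Aurora executor. This would also allow me or others to create a Universe package.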
Steps to deploy the Aurora scheduler on DC/OS:
1. Create the folder /var/lib/aurora on every DC/OS agent.
2. Launch the Aurora executor on all DC/OS agents with the following JSON:
    {
      "id": "/aurora/aurora-executor",
      "env": { "MESOS_ROOT": "/var/lib/mesos/slave" },
      "instances": 20,
      "cpus": 1,
      "mem": 128,
      "disk": 0,
      "gpus": 0,
      "constraints": [ [ "hostname", "UNIQUE" ] ],
      "container": {
        "docker": { "image": "krot/aurora-executor", "forcePullImage": true, "privileged": false, "network": "HOST" },
        "type": "DOCKER",
        "volumes": [
          { "containerPath": "/var/lib/mesos/slave", "hostPath": "/var/lib/mesos/slave", "mode": "RW" },
          { "containerPath": "/var/lib/aurora", "hostPath": "/var/lib/aurora", "mode": "RW" }
        ]
      }
    }
    Set "instances" to the number of agents.
2a. Alternative way to deploy the Aurora executor (this must be done on every DC/OS agent):
    sudo yum install -y python2 wget
    wget -c https://apache.bintray.com/aurora/centos-7/aurora-executor-0.16.0-1.el7.centos.aurora.x86_64.rpm
    sudo rpm -Uhv --nodeps aurora-executor-0.16.0-1.el7.centos.aurora.x86_64.rpm
    Edit /etc/sysconfig/thermos to add the --mesos-root flag; the result should look like this:
    $ grep -A5 OBSERVER_ARGS /etc/sysconfig/thermos
    OBSERVER_ARGS=(
      --port=1338
      --mesos-root=/var/lib/mesos/slave
      --log_to_disk=NONE
      --log_to_stderr=google:INFO
    )
3. Launch the Aurora scheduler with the following JSON (3 or more instances are recommended for fault tolerance):
    {
      "id": "/aurora/aurora-scheduler",
      "env": {
        "CLUSTER_NAME": "YourCluster",
        "ZK_ENDPOINTS": "master.mesos:2181",
        "MESOS_MASTER": "zk://master.mesos:2181/mesos",
        "QUORUM_SIZE": "2",
        "EXTRA_SCHEDULER_ARGS": "-allow_gpu_resource=true"
      },
      "instances": 3,
      "cpus": 1,
      "mem": 1024,
      "disk": 0,
      "gpus": 0,
      "constraints": [ [ "hostname", "UNIQUE" ] ],
      "container": {
        "docker": { "image": "krot/aurora-scheduler", "forcePullImage": true, "privileged": false, "network": "HOST" },
        "type": "DOCKER",
        "volumes": [
          { "containerPath": "/var/lib/aurora", "hostPath": "/var/lib/aurora", "mode": "RW" }
        ]
      }
    }
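    The flag -allow_gpu_resource=true (passed through EXTRA_SCHEDULER_ARGS) enables GPU support. The Aurora scheduler can be configured with environment variables; see the documentation for details.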
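Step 2 requires "instances" to equal the number of agents, so the hard-coded 20 has to be edited for each cluster. A minimal sketch, assuming you already know the agent count: the hypothetical shell function render_executor_app prints the same executor app definition with "instances" filled in. It only generates JSON and does not contact the cluster.

```shell
#!/bin/sh
# Print the Marathon app definition for the Aurora executor (step 2),
# with "instances" set to the agent count given as $1.
# render_executor_app is a hypothetical helper, not part of DC/OS or Aurora.
render_executor_app() {
  agents=$1
  # Reject empty or non-numeric input before it ends up in the JSON.
  case $agents in
    ''|*[!0-9]*) echo "agent count must be a positive integer" >&2; return 1 ;;
  esac
  if [ "$agents" -lt 1 ]; then
    echo "agent count must be a positive integer" >&2
    return 1
  fi
  cat <<EOF
{
  "id": "/aurora/aurora-executor",
  "env": { "MESOS_ROOT": "/var/lib/mesos/slave" },
  "instances": $agents,
  "cpus": 1,
  "mem": 128,
  "disk": 0,
  "gpus": 0,
  "constraints": [ [ "hostname", "UNIQUE" ] ],
  "container": {
    "docker": { "image": "krot/aurora-executor", "forcePullImage": true, "privileged": false, "network": "HOST" },
    "type": "DOCKER",
    "volumes": [
      { "containerPath": "/var/lib/mesos/slave", "hostPath": "/var/lib/mesos/slave", "mode": "RW" },
      { "containerPath": "/var/lib/aurora", "hostPath": "/var/lib/aurora", "mode": "RW" }
    ]
  }
}
EOF
}
```

For example, render_executor_app 5 > aurora-executor.json writes the definition for a five-agent cluster, which can then be submitted with dcos marathon app add aurora-executor.json. Because the hostname UNIQUE constraint allows at most one executor per agent, the app has to be scaled again whenever agents are added.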
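The manual edit of /etc/sysconfig/thermos in step 2a can also be scripted, which helps when it has to be repeated on every agent. A minimal sketch, assuming the file lists one flag per line as in the grep output shown in that step; add_mesos_root is a hypothetical helper, not something shipped with the RPM. It inserts --mesos-root right after the --port flag and does nothing if --mesos-root is already present.

```shell
#!/bin/sh
# Add --mesos-root to OBSERVER_ARGS in the thermos sysconfig file.
# Assumption: one flag per line, e.g.
#   OBSERVER_ARGS=(
#     --port=1338
#     --log_to_disk=NONE
#   )
# add_mesos_root is a hypothetical helper; safe to run more than once.
add_mesos_root() {
  file=${1:-/etc/sysconfig/thermos}
  root=${2:-/var/lib/mesos/slave}
  # Already configured: leave the file untouched.
  if grep -q -- '--mesos-root=' "$file"; then
    return 0
  fi
  tmp=$(mktemp) || return 1
  # Copy every line; right after the --port flag, emit the --mesos-root flag.
  awk -v root="$root" '
    { print }
    $1 ~ /^--port=/ { print "  --mesos-root=" root }
  ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Overwrite in place so the original owner and mode are kept.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

Run it as root on each agent after installing the RPM (for example add_mesos_root /etc/sysconfig/thermos). If the file has no --port line, nothing is inserted, so check the result with the same grep -A5 command. The observer reads this file only at startup, so restart it afterwards.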