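I have a 3-node setup running Marathon, Mesos master, Mesos slave, and ZooKeeper with the HA configuration enabled. I tested deploying a simple hello app with mesos-execute, and that worked as expected.
Everything was fine up to that point, so I connected to Marathon and deployed a simple app to test it (echo "hello" >> /tmp/output.txt), but the app is stuck in the "Waiting" state.
What is preventing Marathon from using the Mesos resources for the deployment?
Mesos master logs: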
I0904 11:23:27.064332 19769 master.cpp:2813] Received SUBSCRIBE call for framework 'marathon' at scheduler-0340362b-0bb6-4fb8-8501-118d976e2cbd@192.168.40.156:36324
I0904 11:23:27.064623 19769 master.cpp:2890] Subscribing framework marathon with checkpointing enabled and capabilities [ PARTITION_AWARE ]
I0904 11:23:27.064669 19769 master.cpp:6272] Updating info for framework cb16118a-2257-4020-a907-63aa6294e11b-0000
I0904 11:23:27.064697 19769 master.cpp:2994] Framework cb16118a-2257-4020-a907-63aa6294e11b-0000 (marathon) at scheduler-0340362b-0bb6-4fb8-8501-118d976e2cbd@192.168.40.156:36324 failed over
I0904 11:23:27.065032 19770 hierarchical.cpp:342] Activated framework cb16118a-2257-4020-a907-63aa6294e11b-0000
I0904 11:23:27.065465 19770 master.cpp:7305] Sending 3 offers to framework cb16118a-2257-4020-a907-63aa6294e11b-0000 (marathon) at scheduler-0340362b-0bb6-4fb8-8501-118d976e2cbd@192.168.40.156:36324
I0904 11:23:27.907865 19769 http.cpp:1115] HTTP GET for /files/read?_=1504517007920&jsonp=jQuery17109098185077823333_1504516979864&length=50000&offset=352538&path=%2Fmaster%2Flog from 192.168.40.1:53525 with User-Agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'
I0904 11:23:28.916651 19768 http.cpp:1115] HTTP GET for /files/read?_=1504517008930&jsonp=jQuery17109098185077823333_1504516979865&length=50000&offset=353797&path=%2Fmaster%2Flog from 192.168.40.1:53525 with User-Agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'
E0904 11:23:30.071293 19775 process.cpp:2450] Failed to shutdown socket with fd 39, address 192.168.40.159:58072: Transport endpoint is not connected
I0904 11:23:30.073277 19768 master.cpp:1430] Framework cb16118a-2257-4020-a907-63aa6294e11b-0000 (marathon) at scheduler-0340362b-0bb6-4fb8-8501-118d976e2cbd@192.168.40.156:36324 disconnected
I0904 11:23:30.073307 19768 master.cpp:3160] Deactivating framework cb16118a-2257-4020-a907-63aa6294e11b-0000 (marathon) at scheduler-0340362b-0bb6-4fb8-8501-118d976e2cbd@192.168.40.156:36324
I0904 11:23:30.073485 19768 master.cpp:3137] Disconnecting framework cb16118a-2257-4020-a907-63aa6294e11b-0000 (marathon) at scheduler-0340362b-0bb6-4fb8-8501-118d976e2cbd@192.168.40.156:36324
I0904 11:23:30.073496 19768 master.cpp:1445] Giving framework cb16118a-2257-4020-a907-63aa6294e11b-0000 (marathon) at scheduler-0340362b-0bb6-4fb8-8501-118d976e2cbd@192.168.40.156:36324 1weeks to failover
I0904 11:23:30.073519 19768 hierarchical.cpp:374] Deactivated framework cb16118a-2257-4020-a907-63aa6294e11b-0000
curl -XGET 'http://mesosphere2:8098/v2/queue?pretty' | jq
{
"queue": [
{
"count": 1,
"delay": {
"timeLeftSeconds": 0,
"overdue": true
},
"since": "2017-09-04T13:12:42.024Z",
"processedOffersSummary": {
"processedOffersCount": 12,
"unusedOffersCount": 12,
"lastUnusedOfferAt": "2017-09-04T13:14:52.554Z",
"rejectSummaryLastOffers": [
{
"reason": "UnfulfilledRole",
"declined": 3,
"processed": 3
},
{
"reason": "UnfulfilledConstraint",
"declined": 0,
"processed": 0
},
{
"reason": "NoCorrespondingReservationFound",
"declined": 0,
"processed": 0
},
{
"reason": "InsufficientCpus",
"declined": 0,
"processed": 0
},
{
"reason": "InsufficientMemory",
"declined": 0,
"processed": 0
},
{
"reason": "InsufficientDisk",
"declined": 0,
"processed": 0
},
{
"reason": "InsufficientGpus",
"declined": 0,
"processed": 0
},
{
"reason": "InsufficientPorts",
"declined": 0,
"processed": 0
}
],
"rejectSummaryLaunchAttempt": [
{
"reason": "UnfulfilledRole",
"declined": 12,
"processed": 12
},
{
"reason": "UnfulfilledConstraint",
"declined": 0,
"processed": 0
},
{
"reason": "NoCorrespondingReservationFound",
"declined": 0,
"processed": 0
},
{
"reason": "InsufficientCpus",
"declined": 0,
"processed": 0
},
{
"reason": "InsufficientMemory",
"declined": 0,
"processed": 0
},
{
"reason": "InsufficientDisk",
"declined": 0,
"processed": 0
},
{
"reason": "InsufficientGpus",
"declined": 0,
"processed": 0
},
{
"reason": "InsufficientPorts",
"declined": 0,
"processed": 0
}
]
},
"app": {
"id": "/test03",
"acceptedResourceRoles": [
"slave_public"
],
"backoffFactor": 1.15,
"backoffSeconds": 1,
"container": {
"type": "DOCKER",
"docker": {
"forcePullImage": false,
"image": "laghao/hello-marathon",
"network": "BRIDGE",
"parameters": [],
"portMappings": [
{
"containerPort": 80,
"hostPort": 80,
"labels": {},
"protocol": "tcp",
"servicePort": 10003
}
],
"privileged": false
},
"volumes": []
},
"cpus": 0.1,
"disk": 0,
"executor": "",
"instances": 1,
"labels": {},
"maxLaunchDelaySeconds": 3600,
"mem": 64,
"gpus": 0,
"portDefinitions": [
{
"port": 10003,
"name": "default",
"protocol": "tcp"
}
],
"requirePorts": false,
"upgradeStrategy": {
"maximumOverCapacity": 1,
"minimumHealthCapacity": 1
},
"version": "2017-09-04T13:12:41.993Z",
"versionInfo": {
"lastScalingAt": "2017-09-04T13:12:41.993Z",
"lastConfigChangeAt": "2017-09-04T13:12:41.993Z"
},
"killSelection": "YOUNGEST_FIRST",
"unreachableStrategy": {
"inactiveAfterSeconds": 300,
"expungeAfterSeconds": 600
}
}
}
]
}
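From the documentation:
An app stays in "Waiting": this means that Marathon does not receive "Resource Offers" from Mesos that allow it to start tasks of this application. The simplest failure is that there are not sufficient resources available in the cluster, or another framework hoards all these resources. You can check the Mesos UI for available resources. Note that the required resources (such as CPU, Mem, Disk) have to be all available on a single host.
If you do not find the solution yourself and you create a GitHub issue, please append the output of the Mesos /state endpoint to the bug report so that we can inspect the available cluster resources.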
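In your case, the role the app requires does not match the roles of the resources your agents offer. You can infer this from UnfulfilledRole: the app sets acceptedResourceRoles to ["slave_public"], and rejectSummaryLaunchAttempt shows all 12 processed offers declined for that reason, so none of the offered resources carried that role.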
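To make the role mismatch concrete, here is a minimal sketch of the idea, not Marathon's actual code. It assumes a simplified model in which each offered resource carries a single role and an app only uses resources whose role is listed in acceptedResourceRoles. The offer contents and the helper name usable_resources are hypothetical; unreserved Mesos resources use the role "*".

```python
def usable_resources(offer_resources, accepted_roles):
    """Keep only the offered resources whose role the app accepts."""
    return [r for r in offer_resources if r["role"] in accepted_roles]


# Hypothetical offer from an agent that has only unreserved resources (role "*").
offer = [
    {"name": "cpus", "role": "*", "amount": 2.0},
    {"name": "mem", "role": "*", "amount": 1024.0},
]

# With the app's current setting nothing in the offer is usable,
# which corresponds to an UnfulfilledRole rejection in this model.
strict = usable_resources(offer, {"slave_public"})

# If the app also accepted unreserved resources, the same offer would qualify.
relaxed = usable_resources(offer, {"*", "slave_public"})

print(len(strict), len(relaxed))
```

Under this model, either the agents need to offer resources for the slave_public role, or the app needs to accept the role the agents actually offer.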
Marathon 1.4 introduced information about stuck deployments. You can query /v2/queue and get statistics on why offers were declined.
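If you want to pull the relevant numbers out of that response, something like the following could work. It is only a sketch: the helper name declined_reasons is made up for illustration, and the sample document is a trimmed excerpt of the /v2/queue output from the question, not a live response.

```python
import json


def declined_reasons(queue_doc):
    """Map app id -> {reason: declined count}, keeping reasons with declined > 0."""
    result = {}
    for item in queue_doc.get("queue", []):
        summary = item.get("processedOffersSummary", {})
        attempts = summary.get("rejectSummaryLaunchAttempt", [])
        result[item["app"]["id"]] = {
            entry["reason"]: entry["declined"]
            for entry in attempts
            if entry["declined"] > 0
        }
    return result


# Trimmed excerpt of the queue output shown in the question.
sample = json.loads("""
{
  "queue": [
    {
      "processedOffersSummary": {
        "processedOffersCount": 12,
        "unusedOffersCount": 12,
        "rejectSummaryLaunchAttempt": [
          {"reason": "UnfulfilledRole", "declined": 12, "processed": 12},
          {"reason": "InsufficientCpus", "declined": 0, "processed": 0}
        ]
      },
      "app": {"id": "/test03", "acceptedResourceRoles": ["slave_public"]}
    }
  ]
}
""")

print(declined_reasons(sample))
```

For the output in the question, this leaves UnfulfilledRole as the only reason with a non-zero declined count for /test03.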