Apache Mesos slave cannot connect to master



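I've been trying to set up Apache Mesos on two machines: one acting as a slave, and the other acting as both master and slave. I've been using the Mesosphere packages.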

The slave on the master machine (james-pc) connects fine, but the slave on the other machine never seems to connect. The log messages are below.

These are samples, so the timestamps may not match up. I've run the same commands repeatedly and read through a lot of log files :(

Thanks!!

I1015 13:44:40.098458 16485 main.cpp:126] Build: 2014-09-23 05:36:09 by root
I1015 13:44:40.098520 16485 main.cpp:128] Version: 0.20.1
I1015 13:44:40.098530 16485 main.cpp:131] Git tag: 0.20.1
I1015 13:44:40.098537 16485 main.cpp:135] Git SHA: fe0a39112f3304283f970f1b08b322b1e970829d
I1015 13:44:40.098558 16485 containerizer.cpp:89] Using isolation: posix/cpu,posix/mem
I1015 13:44:40.100411 16485 main.cpp:149] Starting Mesos slave
I1015 13:44:40.101066 16485 slave.cpp:167] Slave started on 1)@127.0.1.1:5051
I1015 13:44:40.101238 16485 slave.cpp:278] Slave resources: cpus(*):4; mem(*):6649; disk(*):109050; ports(*):[31000-32000]
I1015 13:44:40.101335 16485 slave.cpp:306] Slave hostname: riri-desktop
I1015 13:44:40.101346 16485 slave.cpp:307] Slave checkpoint: true
I1015 13:44:40.102597 16489 state.cpp:33] Recovering state from '/tmp/mesos/meta'
I1015 13:44:40.102684 16489 state.cpp:62] Failed to find the latest slave from '/tmp/mesos/meta'
I1015 13:44:40.102777 16493 status_update_manager.cpp:193] Recovering status update manager
I1015 13:44:40.102821 16493 containerizer.cpp:252] Recovering containerizer
I1015 13:44:40.102982 16491 slave.cpp:3198] Finished recovery
I1015 13:44:40.103219 16488 slave.cpp:589] New master detected at master@10.117.149.130:5050
I1015 13:44:40.103313 16488 slave.cpp:625] No credentials provided. Attempting to register without authentication
I1015 13:44:40.103317 16491 status_update_manager.cpp:167] New master detected at master@10.117.149.130:5050
I1015 13:44:40.103333 16488 slave.cpp:636] Detecting new master
I1015 13:45:40.109150 16487 slave.cpp:3053] Current usage 27.72%. Max allowed age: 4.359784084743518days
I1015 13:46:40.119501 16489 slave.cpp:3053] Current usage 27.72%. Max allowed age: 4.359794862235926days

I1015 13:47:55.462615  5670 hierarchical_allocator_process.hpp:563] Recovered cpus(*):4; mem(*):6659; disk(*):107176; ports(*):[31000-32000] (total allocatable: cpus(*):4; mem(*):6659; disk(*):107176; ports(*):[31000-32000]) on slave 20141015-130401-16842879-5050-3432-0 from framework 20141015-134423-16842879-5050-5654-0000
I1015 13:47:58.048534  5671 http.cpp:466] HTTP request for '/master/state.json'
I1015 13:48:01.461993  5667 master.cpp:3559] Sending 1 offers to framework 20141015-134423-16842879-5050-5654-0000
I1015 13:48:01.464038  5670 master.cpp:2169] Processing reply for offers: [ 20141015-134423-16842879-5050-5654-36 ] on slave 20141015-130401-16842879-5050-3432-0 at slave(1)@127.0.1.1:5051 (james-pc.syd.local) for framework 20141015-134423-16842879-5050-5654-0000
I1015 13:48:01.464246  5670 hierarchical_allocator_process.hpp:563] Recovered cpus(*):4; mem(*):6659; disk(*):107176; ports(*):[31000-32000] (total allocatable: cpus(*):4; mem(*):6659; disk(*):107176; ports(*):[31000-32000]) on slave 20141015-130401-16842879-5050-3432-0 from framework 20141015-134423-16842879-5050-5654-0000
I1015 13:48:06.464457  5669 master.cpp:3559] Sending 1 offers to framework 20141015-134423-16842879-5050-5654-0000
I1015 13:48:06.466624  5667 master.cpp:2169] Processing reply for offers: [ 20141015-134423-16842879-5050-5654-37 ] on slave 20141015-130401-16842879-5050-3432-0 at slave(1)@127.0.1.1:5051 (james-pc.syd.local) for framework 20141015-134423-16842879-5050-5654-0000
I1015 13:48:06.466841  5671 hierarchical_allocator_process.hpp:563] Recovered cpus(*):4; mem(*):6659; disk(*):107176; ports(*):[31000-32000] (total allocatable: cpus(*):4; mem(*):6659; disk(*):107176; ports(*):[31000-32000]) on slave 20141015-130401-16842879-5050-3432-0 from framework 20141015-134423-16842879-5050-5654-0000
I1015 13:48:08.064483  5673 http.cpp:466] HTTP request for '/master/state.json'
I1015 13:48:12.465992  5674 master.cpp:3559] Sending 1 offers to framework 20141015-134423-16842879-5050-5654-0000
I1015 13:48:12.468195  5670 master.cpp:2169] Processing reply for offers: [ 20141015-134423-16842879-5050-5654-38 ] on slave 20141015-130401-16842879-5050-3432-0 at slave(1)@127.0.1.1:5051 (james-pc.syd.local) for framework 20141015-134423-16842879-5050-5654-0000
I1015 13:48:12.468408  5670 hierarchical_allocator_process.hpp:563] Recovered cpus(*):4; mem(*):6659; disk(*):107176; ports(*):[31000-32000] (total allocatable: cpus(*):4; mem(*):6659; disk(*):107176; ports(*):[31000-32000]) on slave 20141015-130401-16842879-5050-3432-0 from framework 20141015-134423-16842879-5050-5654-0000

james@james-pc:/var/log/meso$ cat mesos-slave.james-pc.invalid-user.log.INFO.20141015-134946.6069

Log file created at: 2014/10/15 13:49:46
Running on machine: james-pc
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I1015 13:49:46.323657  6069 logging.cpp:142] INFO level logging started!
I1015 13:49:46.323825  6069 main.cpp:126] Build: 2014-09-23 05:36:09 by root
I1015 13:49:46.323837  6069 main.cpp:128] Version: 0.20.1
I1015 13:49:46.323842  6069 main.cpp:131] Git tag: 0.20.1
I1015 13:49:46.323846  6069 main.cpp:135] Git SHA: fe0a39112f3304283f970f1b08b322b1e970829d
I1015 13:49:46.323860  6069 containerizer.cpp:89] Using isolation: posix/cpu,posix/mem
I1015 13:49:46.324012  6069 main.cpp:149] Starting Mesos slave
I1015 13:49:46.324472  6084 slave.cpp:167] Slave started on 1)@127.0.1.1:5051
I1015 13:49:46.324604  6084 slave.cpp:278] Slave resources: cpus(*):4; mem(*):6659; disk(*):107176; ports(*):[31000-32000]
I1015 13:49:46.324697  6084 slave.cpp:306] Slave hostname: james-pc.syd.local
I1015 13:49:46.324709  6084 slave.cpp:307] Slave checkpoint: true
I1015 13:49:46.326089  6079 state.cpp:33] Recovering state from '/tmp/mesos/meta'
I1015 13:49:46.326375  6084 status_update_manager.cpp:193] Recovering status update manager
I1015 13:49:46.326452  6079 containerizer.cpp:252] Recovering containerizer
I1015 13:49:46.326608  6083 slave.cpp:3198] Finished recovery
I1015 13:49:46.327335  6084 group.cpp:313] Group process (group(1)@127.0.1.1:5051) connected to ZooKeeper
I1015 13:49:46.327352  6084 group.cpp:787] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I1015 13:49:46.327360  6084 group.cpp:385] Trying to create path '/mesos' in ZooKeeper
I1015 13:49:46.328199  6085 detector.cpp:138] Detected a new leader: (id='5')
I1015 13:49:46.328272  6085 group.cpp:658] Trying to get '/mesos/info_0000000005' in ZooKeeper
I1015 13:49:46.328738  6084 detector.cpp:426] A new leading master (UPID=master@127.0.1.1:5050) is detected
I1015 13:49:46.328806  6085 slave.cpp:589] New master detected at master@127.0.1.1:5050
I1015 13:49:46.328881  6085 slave.cpp:625] No credentials provided. Attempting to register without authentication
I1015 13:49:46.328886  6078 status_update_manager.cpp:167] New master detected at master@127.0.1.1:5050
I1015 13:49:46.328897  6085 slave.cpp:636] Detecting new master
I1015 13:49:46.662595  6085 slave.cpp:816] Re-registered with master master@127.0.1.1:5050
W1015 13:50:19.134799  6078 slave.cpp:791] Already registered with master master@127.0.1.1:5050
I1015 13:50:46.338639  6082 slave.cpp:3053] Current usage 59.91%. Max allowed age: 2.106364690479491days
W1015 13:51:07.704756  6082 slave.cpp:791] Already registered with master master@127.0.1.1:5050
W1015 13:51:15.611064  6078 slave.cpp:791] Already registered with master master@127.0.1.1:5050
W1015 13:51:18.703999  6082 slave.cpp:791] Already registered with master master@127.0.1.1:5050
W1015 13:51:21.911741  6079 slave.cpp:791] Already registered with master master@127.0.1.1:5050

You are using a local (loopback) IP address:

I1015 13:49:46.324472  6084 slave.cpp:167] Slave started on 1)@127.0.1.1:5051

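Try setting them to each machine's proper, externally reachable IP address. As things stand, the machines probably cannot communicate with each other correctly.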
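One quick way to confirm this on either machine is to pull the bind address out of the slave's "Slave started on" log line, like the one quoted above, and check whether it falls in 127.0.0.0/8. This is a sketch assuming a POSIX shell and sed; slave_bind_addr and is_loopback are hypothetical helper names, and the log path in the usage comment is an assumption (glog usually also creates a mesos-slave.INFO symlink in the log directory, but check your install).

```shell
# Hypothetical helper: print the address:port the slave bound to, taken from
# the last "Slave started on" line in the given log file. The line looks like:
#   ... slave.cpp:167] Slave started on 1)@127.0.1.1:5051
slave_bind_addr() {
  sed -n 's/.*Slave started on [^@]*@\([0-9.]*:[0-9]*\).*/\1/p' "$1" | tail -n 1
}

# Hypothetical helper: succeed (exit status 0) if the address is in
# 127.0.0.0/8, i.e. a loopback address that other machines cannot reach.
is_loopback() {
  case "$1" in
    127.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (log path is an assumption; adjust to your install):
#   addr=$(slave_bind_addr /var/log/mesos/mesos-slave.INFO)
#   is_loopback "$addr" && echo "slave is bound to loopback: $addr"
```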

A few places to look (I use Mesosphere's Google deployment):

On the slave (some of these need the master's IP, some need the slave's IP):

/etc/mesos-slave/hostname
/etc/mesos-slave/attributes/host
/etc/mesos/zk
/etc/hadoop/conf/core-site.xml
/etc/hadoop/conf/mapred-site.xml
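For a Mesosphere-packaged slave, the Mesos-related files can be written with a few shell commands. This is a sketch under stated assumptions: it assumes the package convention that each file in /etc/mesos-slave is named after a command-line flag and holds that flag's value (so an ip file would set --ip), that ZooKeeper listens on its default port 2181 on the master, and that the ZooKeeper path is /mesos as in the logs above. Writing the IP into hostname is a choice made here to sidestep DNS, not something this answer prescribes. configure_slave is a hypothetical name; it takes a root directory so it can be tried in a scratch directory before being pointed at /. The Hadoop files are not covered.

```shell
# Hypothetical helper: write Mesosphere-style per-flag config files.
#   $1 = config root (/ on a real machine, a scratch dir for testing)
#   $2 = this slave's externally reachable IP
#   $3 = the master's IP, assumed to also run ZooKeeper on port 2181
configure_slave() {
  root=$1 slave_ip=$2 master_ip=$3
  mkdir -p "$root/etc/mesos-slave" "$root/etc/mesos" || return 1
  # Assumed convention: file name = flag name, file content = flag value.
  printf '%s\n' "$slave_ip" > "$root/etc/mesos-slave/ip"
  printf '%s\n' "$slave_ip" > "$root/etc/mesos-slave/hostname"
  # ZooKeeper URL the slave uses to discover the leading master.
  printf 'zk://%s:2181/mesos\n' "$master_ip" > "$root/etc/mesos/zk"
}

# Example (run as root on the slave, then restart mesos-slave):
#   configure_slave / 10.110.1.123 10.117.149.130
```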

Hope this helps!

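The problem is actually that the slave is using a local (loopback) IP address, 127.0.1.1. The easiest way to fix it is probably to first look up the external IP address of the NIC (assuming IPv4 on eth0):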

ifconfig eth0 | grep "inet addr"

which will print something like:

inet addr:10.110.1.123  Bcast:10.110.1.255  Mask:255.255.255.0
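The lookup can also be scripted. This sketch assumes the net-tools ifconfig; older builds print the "inet addr:" form shown above, while newer builds print "inet 10.110.1.123  netmask ...", so the parser accepts both. parse_inet_addr and ipv4_of are hypothetical names, and only the first IPv4 address on the interface is returned.

```shell
# Hypothetical helper: read ifconfig output on stdin and print the first IPv4
# address. The first expression handles "inet addr:X" (older net-tools), the
# second handles "inet X" (newer net-tools). IPv6 "inet6" lines never match.
parse_inet_addr() {
  sed -n -e 's/.*inet addr:\([0-9.]*\).*/\1/p' \
         -e 's/.*inet \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: print the IPv4 address of the named interface.
ipv4_of() {
  ifconfig "$1" 2>/dev/null | parse_inet_addr
}

# Example:
#   ipv4_of eth0
```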

Then edit the file /etc/default/mesos-slave and add the line:

IP=10.110.1.123
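If the file may already contain an IP= line, a small helper can replace it rather than append a duplicate. This is a sketch assuming a POSIX shell; set_slave_ip is a hypothetical name, and it avoids sed -i because that option's syntax differs between GNU and BSD sed. The new setting only takes effect after the slave restarts; on an Upstart or SysV-init system that is typically sudo service mesos-slave restart, but the exact command depends on your init system.

```shell
# Hypothetical helper: ensure exactly one IP=<addr> line in a defaults file.
#   $1 = file (e.g. /etc/default/mesos-slave), $2 = address
set_slave_ip() {
  file=$1 addr=$2
  touch "$file" || return 1
  if grep -q '^IP=' "$file"; then
    # Replace existing IP= lines via a temp file; copying back with cat
    # (rather than mv) keeps the original file's owner and permissions.
    tmp=$(mktemp) || return 1
    sed "s/^IP=.*/IP=$addr/" "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    printf 'IP=%s\n' "$addr" >> "$file"
  fi
}

# Example (as root):
#   set_slave_ip /etc/default/mesos-slave 10.110.1.123
```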
