My Spark job stays in the ACCEPTED state for a long time on an AWS EMR cluster

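My Spark job on an AWS EMR cluster stays in the ACCEPTED state for a long time. My Spark jobs used to spend less time in ACCEPTED, but that time has now increased. Below are some of the configurations I'm using. Please let me know if any of them need to be looked into. Thanks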

<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>64</value>
<final>false</final>
<source>yarn-site.xml</source>
</property>
<property>
<name>yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb</name>
<value>0</value>
<final>false</final>
<source>yarn-default.xml</source>
</property>
<property>
<name>yarn.resourcemanager.nodemanagers.heartbeat-interval-ms</name>
<value>250</value>
<final>false</final>
<source>yarn-site.xml</source>
</property>
<property>
<name>yarn.client.application-client-protocol.poll-interval-ms</name>
<value>200</value>
<final>false</final>
<source>yarn-default.xml</source>
</property>
<property>
<name>yarn.timeline-service.client.retry-interval-ms</name>
<value>1000</value>
<final>false</final>
<source>yarn-default.xml</source>
</property>
<property>
<name>yarn.timeline-service.client.best-effort</name>
<value>false</value>
<final>false</final>
<source>yarn-default.xml</source>
</property>
<property>
<name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
<value>90.0</value>
<final>false</final>
<source>yarn-default.xml</source>
</property>
<property>
<name>yarn.app.mapreduce.am.resource.cpu-vcores</name>
<value>1</value>
<final>false</final>
<source>mapred-default.xml</source>
</property>
<property>
<name>yarn.sharedcache.store.in-memory.check-period-mins</name>
<value>720</value>
<final>false</final>
<source>yarn-default.xml</source>
</property>

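If your job spends a long time in the ACCEPTED state, that is a good indicator that there are no free resources available for it. YARN has accepted the application, but its ApplicationMaster is not yet running.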
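One way to confirm this is to ask the ResourceManager for its cluster-wide metrics. Below is a minimal sketch. It assumes the standard YARN REST endpoint `/ws/v1/cluster/metrics` is reachable on port 8088 of the master node. `master-public-dns-name` is a placeholder for your own master DNS name, and the helper names are hypothetical. A non-zero `appsPending` (roughly, applications waiting to start) alongside low `availableMB` or `availableVirtualCores` points to the resource shortage described above.

```python
import json
from urllib.request import urlopen

# Placeholder host (assumption): replace with your EMR master node's DNS name.
RM_URL = "http://master-public-dns-name:8088"


def fetch_json(path: str) -> dict:
    """GET a YARN ResourceManager REST endpoint and decode the JSON body."""
    with urlopen(f"{RM_URL}{path}", timeout=10) as resp:
        return json.load(resp)


def summarize_metrics(payload: dict) -> dict:
    """Extract the fields that explain an ACCEPTED backlog from /ws/v1/cluster/metrics."""
    m = payload["clusterMetrics"]
    total = m["totalMB"]
    # Share of cluster memory already handed out to running containers.
    used_pct = round(100.0 * m["allocatedMB"] / total, 1) if total else 0.0
    return {
        "apps_pending": m["appsPending"],
        "available_mb": m["availableMB"],
        "available_vcores": m["availableVirtualCores"],
        "memory_used_pct": used_pct,
    }

# Usage against a live cluster (hypothetical host):
#   print(summarize_metrics(fetch_json("/ws/v1/cluster/metrics")))
```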

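If this is a shared cluster, talk to the administrators about getting more resources, or a better share of the existing ones, allocated to you, and about what might be taking up the space.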

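If this is your own cluster, consider requesting less memory for your job, and you may get accepted faster. You may simply be asking for too much, so YARN has a hard time finding space to allocate to you (or use a bigger cluster). Over-allocating driver/executor memory is a very common problem, so try running with less and see what happens.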
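To see why a smaller request helps, estimate the container that YARN must actually find for each driver or executor. This sketch assumes Spark's default memory overhead of max(384 MiB, 10% of the heap). It also assumes YARN rounds every request up to a multiple of `yarn.scheduler.minimum-allocation-mb`, which is 1024 MiB in yarn-default.xml. EMR may set a different value, so check your yarn-site.xml. The function names are hypothetical, and the sketch only considers memory, not vcores. In cluster mode the driver runs inside the ApplicationMaster container, so the job cannot leave ACCEPTED until a container of `container_mb(driver_heap)` fits on some node.

```python
import math

OVERHEAD_FACTOR = 0.10  # Spark's default memory-overhead factor (assumed not overridden)
OVERHEAD_MIN_MB = 384   # Spark's minimum memory overhead in MiB


def container_mb(heap_mb: int, min_alloc_mb: int = 1024) -> int:
    """Approximate YARN container size for a Spark driver or executor heap.

    Assumes overhead = max(384, 0.10 * heap) and that YARN rounds the request
    up to a multiple of yarn.scheduler.minimum-allocation-mb.
    """
    requested = heap_mb + max(OVERHEAD_MIN_MB, int(heap_mb * OVERHEAD_FACTOR))
    return math.ceil(requested / min_alloc_mb) * min_alloc_mb


def executors_per_node(heap_mb: int, node_mb: int, min_alloc_mb: int = 1024) -> int:
    """How many executor containers of this heap size fit in one NodeManager's memory."""
    return node_mb // container_mb(heap_mb, min_alloc_mb)
```

For example, assume each node has a hypothetical 12288 MiB of NodeManager memory. Then `executors_per_node(4096, 12288)` gives 2, while `executors_per_node(3072, 12288)` gives 3. Trimming `--executor-memory` from 4g to 3g lets each node host one more executor.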

Try checking the ResourceManager to see what else is running that might be taking up space: http://master-public-dns-name:8088/
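The information shown in the web UI is also available from the ResourceManager REST API, which is handy for scripting. Below is a minimal sketch. It assumes the standard `/ws/v1/cluster/apps` endpoint and its `states` filter, and it uses the same `master-public-dns-name` placeholder as the URL above. `fetch_apps` and `biggest_consumers` are hypothetical helper names.

```python
import json
from urllib.request import urlopen

# Placeholder host (assumption): the same master DNS name used for the web UI.
RM_URL = "http://master-public-dns-name:8088"


def fetch_apps(states: str) -> dict:
    """Query /ws/v1/cluster/apps filtered by state(s), e.g. "RUNNING,ACCEPTED"."""
    with urlopen(f"{RM_URL}/ws/v1/cluster/apps?states={states}", timeout=10) as resp:
        return json.load(resp)


def biggest_consumers(payload: dict, top: int = 5) -> list:
    """Return (id, name, queue, allocatedMB) for the apps holding the most memory."""
    # The RM returns {"apps": null} when nothing matches the filter.
    apps = (payload.get("apps") or {}).get("app", [])
    rows = [(a["id"], a["name"], a["queue"], a.get("allocatedMB", 0)) for a in apps]
    return sorted(rows, key=lambda r: r[3], reverse=True)[:top]
```

For instance, `biggest_consumers(fetch_apps("RUNNING"))` would list the running applications holding the most memory, which are the usual suspects when your job sits in ACCEPTED. On the master node, `yarn application -list -appStates ACCEPTED` gives a similar view from the command line.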
