How to programmatically get the YARN "Memory Total" and "VCores Total" metrics in pyspark



I have been looking through: https://docs.actian.com/vectorhadoop/5.0/index.html#page/User/YARN_Configuration_Settings.htm

but none of those configs are what I need.

"yarn.nodemanager.resource.memory-mb"很有前途,但它似乎只针对节点管理器,只获取master的mem和cpu,而不是集群的。

int(hl.spark_context()._jsc.hadoopConfiguration().get('yarn.nodemanager.resource.memory-mb'))
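
For reference, here is the same lookup with a plain pyspark SparkSession instead of hl, also reading the vcores counterpart (yarn.nodemanager.resource.cpu-vcores). These are still per-NodeManager settings, and get() may return None if yarn-site.xml is not visible to the driver, so this is only a sketch of what I have tried:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
hconf = spark.sparkContext._jsc.hadoopConfiguration()

# Per-NodeManager settings, not cluster-wide totals; get() returns None if the key is unset
node_mem_mb = hconf.get('yarn.nodemanager.resource.memory-mb')
node_vcores = hconf.get('yarn.nodemanager.resource.cpu-vcores')
print(node_mem_mb, node_vcores)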

You can get these metrics from the YARN ResourceManager REST API.
URL: http://rm-http-address:port/ws/v1/cluster/metrics
Metrics:

totalMB
totalVirtualCores  

Example response (it can also be returned as XML):

{  "clusterMetrics":   {
"appsSubmitted":0,
"appsCompleted":0,
"appsPending":0,
"appsRunning":0,
"appsFailed":0,
"appsKilled":0,
"reservedMB":0,
"availableMB":17408,
"allocatedMB":0,
"reservedVirtualCores":0,
"availableVirtualCores":7,
"allocatedVirtualCores":1,
"containersAllocated":0,
"containersReserved":0,
"containersPending":0,
"totalMB":17408,
"totalVirtualCores":8,
"totalNodes":1,
"lostNodes":0,
"unhealthyNodes":0,
"decommissioningNodes":0,
"decommissionedNodes":0,
"rebootedNodes":0,
"activeNodes":1,
"shutdownNodes":0   } }

https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Metrics_API

All you need is the ResourceManager's HTTP address and port. Check your configuration files for those; I can't tell you exactly where, since I don't know how your YARN deployment is managed.

Once you have the URL, access it with Python:

import requests

url = 'http://rm-http-address:port/ws/v1/cluster/metrics'
response = requests.get(url)

# Parse the JSON response and pull out the relevant metrics
metrics = response.json()['clusterMetrics']
print(metrics['totalMB'], metrics['totalVirtualCores'])

Of course, no Hadoop or Spark context is needed for this solution.
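
If you would rather resolve the ResourceManager address from within pyspark instead of hard-coding it, a minimal sketch along these lines should work. It assumes yarn.resourcemanager.webapp.address is present in the Hadoop configuration visible to the driver (the localhost:8088 fallback is just the stock YARN web port), and the field names come from the sample response above:

import requests
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
hconf = spark.sparkContext._jsc.hadoopConfiguration()

# ResourceManager web address (host:port); fall back to the default port if it is not set
rm_address = hconf.get('yarn.resourcemanager.webapp.address') or 'localhost:8088'

metrics = requests.get('http://{}/ws/v1/cluster/metrics'.format(rm_address)).json()['clusterMetrics']
print(metrics['totalMB'], metrics['totalVirtualCores'])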
