Spark requests more cores than asked for when calling the POST Livy batch API in Azure Synapse

I have an Azure Synapse Spark pool with 3 nodes, each with 4 vCores and 32 GB of memory. I am trying to submit a Spark job using the Azure Synapse Livy batch API. The request looks like this:

curl --location --request POST 'https://<synapse-workspace>.dev.azuresynapse.net/livyApi/versions/2019-11-01-preview/sparkPools/<pool-name>/batches?detailed=true' `
--header 'cache-control: no-cache' `
--header 'Authorization: Bearer <Token>' `
--header 'Content-Type: application/json' `
--data-raw '{
"name": "T1",
"file": "folder/file.py",
"driverMemory": "1g",
"driverCores": 1,
"executorMemory": "1g",
"executorCores":1,
"numExecutors": 3
}'

The response I get is:

{
"TraceId": "<some-guid>",
"Message": "Your Spark job requested 16 vcores. However, the pool has a 12 core limit. Try reducing the numbers of vcores requested or increasing your pool size."
}

I don't understand why it is requesting 16 cores. Shouldn't it request 4 cores (3 * 1 + 1)?
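The arithmetic behind that expectation can be written out as a minimal sketch. The function name `expected_vcores` is hypothetical and only restates the naive sum of the values in the request body; it is not how Synapse itself computes the figure.

```python
def expected_vcores(driver_cores: int, executor_cores: int, num_executors: int) -> int:
    """Naive total: the driver's cores plus the cores of every executor."""
    return driver_cores + executor_cores * num_executors


# Values taken from the request body: driverCores=1, executorCores=1, numExecutors=3.
print(expected_vcores(driver_cores=1, executor_cores=1, num_executors=3))  # 4
```

The service reported 16 instead, which is the discrepancy the question is about.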

Update: I tried changing the pool size to 3 nodes, each with 8 vCores and 64 GB of memory. With this configuration,

{
"name": "T1",
"file": "folder/file.py",
"driverMemory": "1g",
"driverCores": 1,
"executorMemory": "1g",
"executorCores": 1,
"numExecutors": 6
}

it requests 28 cores (even with executorCores set to 2, 3, or 4). If I change executorCores to 5, 6, 7, or 8, it requests 56 cores.
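One pattern fits all three reported numbers (16, 28, and 56): every container, driver included, appears to be sized at the requested core count rounded up to 4 or 8. The sketch below encodes that guess. The size list `ASSUMED_CONTAINER_SIZES` and both function names are assumptions inferred only from the figures in this post, not documented Synapse behavior.

```python
# Assumption: containers come only in these vCore sizes (inferred from this post).
ASSUMED_CONTAINER_SIZES = (4, 8)


def round_up_to_container(cores: int) -> int:
    """Return the smallest assumed container size that fits `cores`."""
    for size in ASSUMED_CONTAINER_SIZES:
        if cores <= size:
            return size
    raise ValueError(f"{cores} cores exceeds the largest assumed container size")


def inferred_vcores(driver_cores: int, executor_cores: int, num_executors: int) -> int:
    """Guess: driver and executors all get the same rounded-up container size."""
    size = round_up_to_container(max(driver_cores, executor_cores))
    return size * (num_executors + 1)


print(inferred_vcores(1, 1, 3))  # 16, matches the error message
print(inferred_vcores(1, 4, 6))  # 28, matches executorCores 1 to 4
print(inferred_vcores(1, 5, 6))  # 56, matches executorCores 5 to 8
```

This only shows that the guess is consistent with the observations; three data points do not confirm it.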

There is no way to do what you want from the portal.

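However, you can still submit a Spark job by specifying the driver (cores and memory) and the executors (cores and memory). For example: Submitting a Spark job in Azure Synapse from Java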

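Using that code, I was able to submit 9 concurrent jobs (each with 1 driver and 1 executor, both using a single core) on a pool of 3 Medium nodes (each node has 8 cores, but only 7 are usable, because 1 is reserved for the Hadoop daemons).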
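As a rough check of those numbers, the sketch below compares the cores the 9 jobs need with the cores the pool can offer. The function names are hypothetical, and the one-reserved-core-per-node figure is taken from the answer as stated.

```python
def usable_pool_cores(nodes: int, cores_per_node: int, reserved_per_node: int) -> int:
    """Cores left for Spark after the per-node reservation."""
    return nodes * (cores_per_node - reserved_per_node)


def cores_needed(jobs: int, driver_cores: int, executor_cores: int, num_executors: int) -> int:
    """Total cores for `jobs` identical jobs, with no rounding applied."""
    return jobs * (driver_cores + executor_cores * num_executors)


available = usable_pool_cores(nodes=3, cores_per_node=8, reserved_per_node=1)
needed = cores_needed(jobs=9, driver_cores=1, executor_cores=1, num_executors=1)
print(needed, available)  # 18 21
```

The 18 cores needed fit within the 21 usable cores, which is consistent with the 9 concurrent jobs reported.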
