Spark on Kubernetes does not launch executors, does not even try. Why?



Following the instructions, I am trying to deploy my PySpark application on the Azure AKS free tier with spark.executor.instances=5:

spark-submit \
  --master k8s://https://xxxxxxx-xxxxxxx.hcp.westeurope.azmk8s.io:443 \
  --deploy-mode cluster \
  --name sparkbasics \
  --conf spark.executor.instances=5 \
  --conf spark.kubernetes.container.image=aosb06.azurecr.io/sparkbasics:v300 \
  local:///opt/spark/work-dir/main.py

Everything works (including the application itself), except that I do not see any executor pods, only the driver pod:

kubectl get pods                                
NAME                                  READY   STATUS      RESTARTS   AGE
sparkbasics-f374377b3c78ac68-driver   0/1     Completed   0          52m

The Dockerfile is the one from the Spark distribution.

What could be the problem? Is something wrong with the resource allocation?

The driver log does not show any apparent problem either:

kubectl logs <driver-pod>
2021-08-12 22:25:54,332 INFO spark.SparkContext: Running Spark version 3.1.2
2021-08-12 22:25:54,378 INFO resource.ResourceUtils: ==============================================================
2021-08-12 22:25:54,378 INFO resource.ResourceUtils: No custom resources configured for spark.driver.
2021-08-12 22:25:54,379 INFO resource.ResourceUtils: ==============================================================
2021-08-12 22:25:54,379 INFO spark.SparkContext: Submitted application: SimpleApp
2021-08-12 22:25:54,403 INFO resource.ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
2021-08-12 22:25:54,422 INFO resource.ResourceProfile: Limiting resource is cpu
2021-08-12 22:25:54,422 INFO resource.ResourceProfileManager: Added ResourceProfile id: 0
2021-08-12 22:25:54,475 INFO spark.SecurityManager: Changing view acls to: 185,aovsyannikov
2021-08-12 22:25:54,475 INFO spark.SecurityManager: Changing modify acls to: 185,aovsyannikov
2021-08-12 22:25:54,475 INFO spark.SecurityManager: Changing view acls groups to: 
2021-08-12 22:25:54,475 INFO spark.SecurityManager: Changing modify acls groups to: 
2021-08-12 22:25:54,475 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(185, aovsyannikov); groups with view permissions: Set(); users  with modify permissions: Set(185, aovsyannikov); groups with modify permissions: Set()
2021-08-12 22:25:54,717 INFO util.Utils: Successfully started service 'sparkDriver' on port 7078.
2021-08-12 22:25:54,781 INFO spark.SparkEnv: Registering MapOutputTracker
2021-08-12 22:25:54,818 INFO spark.SparkEnv: Registering BlockManagerMaster
2021-08-12 22:25:54,843 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2021-08-12 22:25:54,844 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
2021-08-12 22:25:54,848 INFO spark.SparkEnv: Registering BlockManagerMasterHeartbeat
2021-08-12 22:25:54,862 INFO storage.DiskBlockManager: Created local directory at /var/data/spark-1e9aa64b-e0a1-44ae-a097-ebb3c2f32404/blockmgr-c51b9095-5426-4a00-b17a-461de2b80357
2021-08-12 22:25:54,892 INFO memory.MemoryStore: MemoryStore started with capacity 413.9 MiB
2021-08-12 22:25:54,909 INFO spark.SparkEnv: Registering OutputCommitCoordinator
2021-08-12 22:25:55,023 INFO util.log: Logging initialized @3324ms to org.sparkproject.jetty.util.log.Slf4jLog
2021-08-12 22:25:55,114 INFO server.Server: jetty-9.4.40.v20210413; built: 2021-04-13T20:42:42.668Z; git: b881a572662e1943a14ae12e7e1207989f218b74; jvm 1.8.0_275-b01
2021-08-12 22:25:55,139 INFO server.Server: Started @3442ms
2021-08-12 22:25:55,184 INFO server.AbstractConnector: Started ServerConnector@59b3b32{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
2021-08-12 22:25:55,184 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.

kubectl describe pod <driver-pod>
Name:         sparkbasics-f374377b3c78ac68-driver
Namespace:    default
Priority:     0
Node:         aks-default-31057657-vmss000000/10.240.0.4
Start Time:   Fri, 13 Aug 2021 01:25:47 +0300
Labels:       spark-app-selector=spark-256cc7f64af9451b89e0098397980974
              spark-role=driver
Annotations:  <none>
Status:       Succeeded
IP:           10.244.0.28
IPs:
  IP:  10.244.0.28
Containers:
  spark-kubernetes-driver:
    Container ID:  containerd://b572a4056014cd4b0520b808d64d766254d30c44ba12fc98717aee3b4814f17d
    Image:         aosb06.azurecr.io/sparkbasics:v300
    Image ID:      aosb06.azurecr.io/sparkbasics@sha256:965393784488025fffc7513edcb4a62333ba59a5ee3076346fd8d335e1715213
    Ports:         7078/TCP, 7079/TCP, 4040/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Args:
      driver
      --properties-file
      /opt/spark/conf/spark.properties
      --class
      org.apache.spark.deploy.PythonRunner
      local:///opt/spark/work-dir/main.py
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 13 Aug 2021 01:25:51 +0300
      Finished:     Fri, 13 Aug 2021 01:56:40 +0300
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  1433Mi
    Requests:
      cpu:     1
      memory:  1433Mi
    Environment:
      SPARK_USER:                 aovsyannikov
      SPARK_APPLICATION_ID:       spark-256cc7f64af9451b89e0098397980974
      SPARK_DRIVER_BIND_ADDRESS:   (v1:status.podIP)
      SB_KEY_STORAGE:             <set to the key 'STORAGE' in secret 'sparkbasics'>     Optional: false
      SB_KEY_OPENCAGE:            <set to the key 'OPENCAGE' in secret 'sparkbasics'>    Optional: false
      SB_KEY_STORAGEOUT:          <set to the key 'STORAGEOUT' in secret 'sparkbasics'>  Optional: false
      SPARK_LOCAL_DIRS:           /var/data/spark-1e9aa64b-e0a1-44ae-a097-ebb3c2f32404
      SPARK_CONF_DIR:             /opt/spark/conf
    Mounts:
      /opt/spark/conf from spark-conf-volume-driver (rw)
      /var/data/spark-1e9aa64b-e0a1-44ae-a097-ebb3c2f32404 from spark-local-dir-1 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-wlqjt (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  spark-local-dir-1:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  spark-conf-volume-driver:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      spark-drv-6f83b17b3c78af1f-conf-map
    Optional:  false
  default-token-wlqjt:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-wlqjt
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>

I found it: the PySpark application itself had a bug.

...
SparkSession.builder.master("local")
...

There should be no master there. Hard-coding .master("local") overrides the k8s://... master passed by spark-submit, so the driver runs the whole job in local mode inside its own pod and never asks Kubernetes for executors. It should simply be:

...
SparkSession.builder
...

As simple as that :(
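
For completeness, here is a minimal sketch of what the corrected main.py entry point can look like (the appName is taken from the driver log above; the rest of the script is elided). The key point is that no .master() is set, so the k8s:// master and spark.executor.instances passed by spark-submit take effect:

from pyspark.sql import SparkSession

# No .master() call here: when launched via spark-submit, the driver
# picks up the k8s://... master and spark.executor.instances from the
# generated properties file instead of being forced into local mode.
spark = (
    SparkSession.builder
    .appName("SimpleApp")  # application name as seen in the driver log
    .getOrCreate()
)

# ... application logic ...

spark.stop()

With this change, the same spark-submit command should make the driver request the five executor pods from Kubernetes.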
