I have a Python program that runs locally without any problem. But when I try to run it on a Spark cluster with two nodes, I get an error about libclntsh.so.
To explain further: to run the program on the cluster, I first set the master IP address in spark-env.sh like this:
export SPARK_MASTER_HOST=x.x.x.x
Then I write the worker node's IP into $SPARK_HOME/conf/workers. After that, I start the master with this line:
/opt/spark/sbin/start-master.sh
Then I start the worker:
/opt/spark/sbin/start-worker.sh spark://x.x.x.x:7077
Next, I check in the Spark UI that both nodes are up. Then I run the program on the master node like this:
/opt/spark/bin/spark-submit --master spark://x.x.x.x:7077 --files sparkConfig.json --py-files cst_utils.py,grouping.py,group_state.py,g_utils.py,csts.py,oracle_connection.py,config.py,brn_utils.py,emp_utils.py main.py
When I run the above command, I get the following error:
File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 604, in main
process()
File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 594, in process
out_iter = func(split_index, iterator)
File "/opt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2916, in pipeline_func
File "/opt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2916, in pipeline_func
File "/opt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 418, in func
File "/opt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2144, in combineLocally
File "/opt/spark/python/lib/pyspark.zip/pyspark/shuffle.py", line 240, in mergeValues
for k, v in iterator:
File "/opt/spark/python/lib/pyspark.zip/pyspark/util.py", line 73, in wrapper
return f(*args, **kwargs)
File "/opt/spark/work/app-20220221165611-0005/0/customer_utils.py", line 340, in read_cst
df_group = connection.read_sql(query_cnt)
File "/opt/spark/work/app-20220221165611-0005/0/oracle_connection.py", line 109, in read_sql
self.connect()
File "/opt/spark/work/app-20220221165611-0005/0/oracle_connection.py", line 40, in connect
self.conn = cx_Oracle.connect(db_url)
cx_Oracle.DatabaseError: DPI-1047: Cannot locate a 64-bit Oracle Client library:
"libclntsh.so: cannot open shared object file: No such file or directory".
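The DPI-1047 error above means the dynamic loader on the worker node cannot resolve libclntsh.so. A minimal diagnostic sketch (a hypothetical helper, using only the standard library) that can be run on each node to check whether the loader can find the Oracle client library:

```python
# Hypothetical diagnostic: check whether the dynamic loader can resolve
# a shared library. On Linux, ctypes.util.find_library consults the same
# ldconfig cache (and LD_LIBRARY_PATH in some setups) that the loader uses.
import ctypes.util


def check_shared_library(lib_name: str = "clntsh") -> str:
    """Return the resolved library name, or a hint if it is missing."""
    found = ctypes.util.find_library(lib_name)
    if found is None:
        return (f"lib{lib_name}.so not resolvable: add its directory to "
                "LD_LIBRARY_PATH or register it under /etc/ld.so.conf.d/")
    return found


print(check_shared_library("clntsh"))
```

Running this inside a Spark task (rather than an interactive shell) shows what the executor process actually sees, which can differ from your login environment.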
I set these environment variables in ~/.bashrc:
export ORACLE_HOME=/usr/share/oracle/instantclient_19_8
export LD_LIBRARY_PATH=$ORACLE_HOME:$LD_LIBRARY_PATH
export PATH=$ORACLE_HOME:$PATH
export JAVA_HOME=/usr/lib/jvm/java/jdk1.8.0_271
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
export PATH=$PATH:$JAVA_HOME/bin
export PYSPARK_PYTHON=/usr/bin/python3
export PYSPARK_HOME=/usr/bin/python3.8
export PYSPARK_DRIVER_PYTHON=python3.8
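Note that ~/.bashrc is only sourced by interactive shells; the executor processes launched by start-worker.sh generally do not read it, so LD_LIBRARY_PATH may be unset on the workers even though it is set in your login shell. One hedged option (a sketch, assuming the Instant Client lives at the same path on every node) is to export it in $SPARK_HOME/conf/spark-env.sh, which the Spark daemons do source:

```shell
# Sketch for $SPARK_HOME/conf/spark-env.sh on each node.
# Unlike ~/.bashrc, this file is sourced by the Spark daemons.
export ORACLE_HOME=/usr/share/oracle/instantclient_19_8
export LD_LIBRARY_PATH=$ORACLE_HOME:$LD_LIBRARY_PATH
```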
Can you tell me what is wrong?
Any help would be appreciated.
The problem is solved. Following the Troubleshooting link, first I created an InstantClient.conf file in /etc/ld.so.conf.d/ and wrote the path of the Instant Client directory into it:
# instant client Path
/usr/share/oracle/instantclient_19_8
Finally, I ran this command:
sudo ldconfig
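The steps above can be sketched as a single sequence, with a final check that the loader cache now resolves the library (assumptions: the Instant Client path below, and that you have root; the file name InstantClient.conf is arbitrary, any *.conf under /etc/ld.so.conf.d/ works):

```shell
# Register the Instant Client directory with the dynamic loader (needs root).
echo "/usr/share/oracle/instantclient_19_8" | sudo tee /etc/ld.so.conf.d/InstantClient.conf

# Rebuild the loader cache.
sudo ldconfig

# Verify: the library should now appear in the cache.
ldconfig -p | grep libclntsh
```

This has to be done on every node in the cluster, since each worker resolves the library locally.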
Then I ran spark-submit again, and it completed without the Instant Client error.
Hope this helps someone else.