"libclntsh.so:无法在 ubuntu 中打开共享对象文件以在 Spark Cluster 中运行 python 程序



I have a Python program that runs locally without any problems. But when I try to run it on a Spark cluster with two nodes, I get an error about libclntsh.so.

To explain in more detail: to run the program on the cluster, I first set the master IP address in spark-env.sh like this:

export SPARK_MASTER_HOST=x.x.x.x

Then I simply write the worker nodes' IPs into $SPARK_HOME/conf/workers, one host per line; with placeholder IPs, the file looks like this:
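
y.y.y.1
y.y.y.2

After that, I start the master with this line: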

/opt/spark/sbin/start-master.sh

and then start the worker:

/opt/spark/sbin/start-worker.sh spark://x.x.x.x:7077

Next, I check that the Spark UI is up. Then I run the program from the master node like this:

/opt/spark/bin/spark-submit --master spark://x.x.x.x:7077 --files sparkConfig.json --py-files cst_utils.py,grouping.py,group_state.py,g_utils.py,csts.py,oracle_connection.py,config.py,brn_utils.py,emp_utils.py main.py  

When I run the above command, I get the following error:

File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 604, in main
process()
File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 594, in process
out_iter = func(split_index, iterator)
File "/opt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2916, in pipeline_func
File "/opt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2916, in pipeline_func
File "/opt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 418, in func
File "/opt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2144, in combineLocally
File "/opt/spark/python/lib/pyspark.zip/pyspark/shuffle.py", line 240, in mergeValues
for k, v in iterator:
File "/opt/spark/python/lib/pyspark.zip/pyspark/util.py", line 73, in wrapper
return f(*args, **kwargs)
File "/opt/spark/work/app-20220221165611-0005/0/customer_utils.py", line 340, in read_cst
df_group = connection.read_sql(query_cnt)
File "/opt/spark/work/app-20220221165611-0005/0/oracle_connection.py", line 109, in read_sql
self.connect()
File "/opt/spark/work/app-20220221165611-0005/0/oracle_connection.py", line 40, in connect
self.conn = cx_Oracle.connect(db_url)
cx_Oracle.DatabaseError: DPI-1047: Cannot locate a 64-bit Oracle Client library: 
"libclntsh.so: cannot open shared object file: No such file or directory". 

I have set these environment variables in ~/.bashrc:

export ORACLE_HOME=/usr/share/oracle/instantclient_19_8
export LD_LIBRARY_PATH=$ORACLE_HOME:$LD_LIBRARY_PATH
export PATH=$ORACLE_HOME:$PATH
export JAVA_HOME=/usr/lib/jvm/java/jdk1.8.0_271
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
export PATH=$PATH:$JAVA_HOME/bin
export PYSPARK_PYTHON=/usr/bin/python3
export PYSPARK_HOME=/usr/bin/python3.8
export PYSPARK_DRIVER_PYTHON=python3.8

Can you tell me what is wrong?

Any help would be appreciated.

The problem is solved. Following the troubleshooting link, I first created an InstantClient.conf file under /etc/ld.so.conf.d/ and wrote the path of the Instant Client directory into it:

# instant client Path
/usr/share/oracle/instantclient_19_8

Finally, I ran this command:

sudo ldconfig
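
To double-check that the dynamic linker now finds the library, the ldconfig cache can be queried; libclntsh.so should appear in the output:

ldconfig -p | grep libclntsh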

Then I ran spark-submit again, and it ran without the Instant Client error.
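
In hindsight, the likely reason the ~/.bashrc exports were not enough is that ~/.bashrc is only sourced by interactive shells, so the worker daemon and the executor processes it spawns never see LD_LIBRARY_PATH. Registering the directory with ldconfig makes it visible to every process. An alternative sketch, reusing the paths from my setup, would be to put the exports in $SPARK_HOME/conf/spark-env.sh on every node, since Spark's launch scripts do source that file:

# In $SPARK_HOME/conf/spark-env.sh on every node (sketch, untested here)
export ORACLE_HOME=/usr/share/oracle/instantclient_19_8
export LD_LIBRARY_PATH=$ORACLE_HOME:$LD_LIBRARY_PATH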

Hope this helps someone else.
