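I am trying to upgrade my current EMR cluster from 5.30.0 to 6.6.0, or at least 5.35.0, to run my Python batch scripts. Whenever I run my Python file, even one that contains only a plain print statement, I get the following error on every EMR release from 5.35 to 6.6.0. Does anyone have suggestions or ideas about this problem?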
Traceback (most recent call last):
File "/mnt1/yarn/usercache/hadoop/appcache/application_11111111_0001/container_16111111110_0001_02_000001/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/mnt1/yarn/usercache/hadoop/appcache/application_111111110_0001/container_16111111630_0001_02_000001/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o92.sql.
: org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:127)
at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:237)
at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:116)
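It looks like the problem was with the applications installed on the EMR cluster. Hadoop and Hive were not installed on it, which caused the error above.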
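
For anyone creating the cluster programmatically, here is a minimal sketch of requesting Hadoop and Hive alongside Spark through boto3's `run_job_flow`. The cluster name, instance types, instance count, region, and the default role names are assumptions for illustration and are not taken from my actual setup; adjust them to your account.

```python
# Applications to request at cluster creation. Hadoop and Hive are listed
# explicitly, since their absence is what triggered the error above.
REQUIRED_APPS = ("Hadoop", "Hive", "Spark")


def build_cluster_request(release_label="emr-6.6.0", name="pyspark-batch"):
    """Build the keyword arguments for boto3's EMR run_job_flow call.

    The name, instance settings, and role names are placeholder assumptions.
    """
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": app} for app in REQUIRED_APPS],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",   # assumed instance type
            "SlaveInstanceType": "m5.xlarge",    # assumed instance type
            "InstanceCount": 3,                  # assumed cluster size
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",    # assumed default EC2 role
        "ServiceRole": "EMR_DefaultRole",        # assumed default service role
    }


def launch_cluster(region="us-east-1"):
    """Create the cluster and return its id (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the request can be built without boto3

    emr = boto3.client("emr", region_name=region)
    response = emr.run_job_flow(**build_cluster_request())
    return response["JobFlowId"]
```

Calling `launch_cluster()` would actually start a cluster and incur cost, so it may be worth inspecting the output of `build_cluster_request()` first. If you create clusters from the console instead, the equivalent step is selecting Hadoop and Hive in the application list.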