Following https://georgheiler.com/2019/05/01/head-spark-spark-on-on-yarn/ , i.e.:
# download a current headless version of spark
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
export HADOOP_CONF_DIR=/usr/hdp/current/spark2-client/conf
export SPARK_HOME=<<path/to>>/spark-2.4.3-bin-without-hadoop/
<<path/to>>/spark-2.4.3-bin-without-hadoop/bin/spark-shell --master yarn --deploy-mode client --queue <<my_queue>> --conf spark.driver.extraJavaOptions='-Dhdp.version=2.6.<<version>>' --conf spark.yarn.am.extraJavaOptions='-Dhdp.version=2.6.<<version>>'
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.3
      /_/
However:
spark.sql("show databases").show
only returns:
+------------+
|databaseName|
+------------+
| default|
+------------+
Now I am trying to pass the original HDP configuration (which is apparently not picked up by my custom version of Spark) along, e.g.:
One:
--files /usr/hdp/current/spark2-client/conf/hive-site.xml
Two:
--conf spark.hive.metastore.uris='thrift://master001.my.corp.com:9083,thrift://master002.my.corp.com:9083,thrift://master003.my.corp.com:9083' --conf spark.hive.metastore.sasl.enabled='true' --conf hive.metastore.uris='thrift://master001.my.corp.com:9083,thrift://master002.my.corp.com:9083,thrift://master003.my.corp.com:9083' --conf hive.metastore.sasl.enabled='true'
Three:
--conf spark.yarn.dist.files='/usr/hdp/current/spark2-client/conf/hive-site.xml'
Four:
--conf spark.sql.warehouse.dir='/apps/hive/warehouse'
None of these helps to resolve the problem. How can I get Spark to recognize the Hive databases?
You can copy the hive-site.xml located in /usr/hdp/hdp.version/hive/conf or /opt/hdp/hdp.version/hive/conf (depending on where HDP is installed) into the conf directory of the headless Spark installation. Now, when you restart spark-shell, it should pick up this Hive configuration and load all the schemas present in Apache Hive.
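A minimal sketch of that copy step, assuming the cluster's hive-site.xml is the one under /usr/hdp/current/spark2-client/conf already used in the question and that SPARK_HOME points at the headless installation (adjust both paths to your setup):
# copy the cluster's Hive configuration into the headless Spark conf directory
cp /usr/hdp/current/spark2-client/conf/hive-site.xml "$SPARK_HOME/conf/"
# restart spark-shell afterwards so the new configuration is picked up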
Hive jars need to be on Spark's classpath for Hive support to be enabled. If the Hive jars are not present on the classpath, the catalog implementation used is in-memory.
In spark-shell we can verify this by executing
sc.getConf.get("spark.sql.catalogImplementation")
which will return in-memory.
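As a hedged sketch, the Hive catalog can also be requested explicitly when launching the shell; spark-shell still falls back to the in-memory catalog if the Hive classes are missing from the classpath, so this only helps once the jars are in place (see the classpath note at the end):
# ask for the Hive catalog explicitly at launch (only effective if the Hive classes are on the classpath)
<<path/to>>/spark-2.4.3-bin-without-hadoop/bin/spark-shell --master yarn --conf spark.sql.catalogImplementation=hive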
Why are the Hive classes needed?
def enableHiveSupport(): Builder = synchronized {
  if (hiveClassesArePresent) {
    config(CATALOG_IMPLEMENTATION.key, "hive")
  } else {
    throw new IllegalArgumentException(
      "Unable to instantiate SparkSession with Hive support because " +
        "Hive classes are not found.")
  }
}
SparkSession.scala
private[spark] def hiveClassesArePresent: Boolean = {
  try {
    Utils.classForName(HIVE_SESSION_STATE_BUILDER_CLASS_NAME)
    Utils.classForName("org.apache.hadoop.hive.conf.HiveConf")
    true
  } catch {
    case _: ClassNotFoundException | _: NoClassDefFoundError => false
  }
}
If the classes are not present, Hive support is not enabled. Link to the code where the above check happens as part of the Spark shell initialization.
In the code pasted above, SPARK_DIST_CLASSPATH is populated only with the Hadoop classpath, and the path to the Hive jars is missing.
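A minimal sketch of one way to close that gap, assuming the HDP Hive client libraries live under /usr/hdp/current/hive-client/lib (an assumed location; adjust it to your installation):
# add the Hive jars to the classpath handed to the headless Spark build
export SPARK_DIST_CLASSPATH="$(hadoop classpath):/usr/hdp/current/hive-client/lib/*"
Wildcard entries ending in /* are expanded by the JVM to all jars in that directory, so the Hive classes then satisfy the hiveClassesArePresent check shown above.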