I'm fairly new to this and have very little experience, so I'd appreciate your help. I'm trying to install Hive on top of an existing Spark installation.
I mainly followed the instructions on this page, without any problems.
https://github.com/dryshliak/hadoop/wiki/installing-hive-on-existing-hadoop-cluster
I also created a database called warehouse and a table called test_table.
hive> show tables;
OK
employee
test_table
Time taken: 0.084 seconds, Fetched: 2 row(s)
hive> desc test_table;
OK
col1 int Integer Column
col2 string String Column
Time taken: 0.052 seconds, Fetched: 2 row(s)
hive>
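For reference, the table was created with DDL along these lines (a sketch reconstructed from the desc output above; the exact statement may have differed):
hive> create table test_table (col1 int comment 'Integer Column', col2 string comment 'String Column');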
The problem I'm having is that when I try to insert data into test_table with the command
hive> insert into test_table values(1,'aaa');
I get the following error message:
Query ID = hadoop_20190703135836_4b17eeac-249d-4E54-bd98-1212f3cb5b5b5b5ddd
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session 821e05e7-74a8-4656-b4ed-3a622c9cadcc)'
FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client for Spark session 821e05e7-74a8-4656-b4ed-3a622c9cadcc
I'm using the following SW versions:
RHEL Server release 7.5
Hadoop 3.1.1
Spark 2.4.0
Hive 3.1.1
Below is a cut from the hive.log file from where the error occurred.
2019-07-03T12:56:00,269 INFO [6beaec32-ecac-4dc1-b118-f2c86c385005 main] ql.Driver: Executing command(queryId=hadoop_20190703125557_f48a3966-691d-4c42-aee0-93f81fabef66): insert into test_table values(1,'aaa')
2019-07-03T12:56:00,270 INFO [6beaec32-ecac-4dc1-b118-f2c86c385005 main] ql.Driver: Query ID = hadoop_20190703125557_f48a3966-691d-4c42-aee0-93f81fabef66
2019-07-03T12:56:00,270 INFO [6beaec32-ecac-4dc1-b118-f2c86c385005 main] ql.Driver: Total jobs = 1
2019-07-03T12:56:00,282 INFO [6beaec32-ecac-4dc1-b118-f2c86c385005 main] ql.Driver: Launching Job 1 out of 1
2019-07-03T12:56:00,282 INFO [6beaec32-ecac-4dc1-b118-f2c86c385005 main] ql.Driver: Starting task [Stage-1:MAPRED] in serial mode
2019-07-03T12:56:00,282 INFO [6beaec32-ecac-4dc1-b118-f2c86c385005 main] spark.SparkTask: In order to change the average load for a reducer (in bytes):
2019-07-03T12:56:00,282 INFO [6beaec32-ecac-4dc1-b118-f2c86c385005 main] spark.SparkTask: set hive.exec.reducers.bytes.per.reducer=<number>
2019-07-03T12:56:00,282 INFO [6beaec32-ecac-4dc1-b118-f2c86c385005 main] spark.SparkTask: In order to limit the maximum number of reducers:
2019-07-03T12:56:00,282 INFO [6beaec32-ecac-4dc1-b118-f2c86c385005 main] spark.SparkTask: set hive.exec.reducers.max=<number>
2019-07-03T12:56:00,282 INFO [6beaec32-ecac-4dc1-b118-f2c86c385005 main] spark.SparkTask: In order to set a constant number of reducers:
2019-07-03T12:56:00,282 INFO [6beaec32-ecac-4dc1-b118-f2c86c385005 main] spark.SparkTask: set mapreduce.job.reduces=<number>
2019-07-03T12:56:00,284 INFO [6beaec32-ecac-4dc1-b118-f2c86c385005 main] session.SparkSessionManagerImpl: Setting up the session manager.
2019-07-03T12:56:00,642 INFO [6beaec32-ecac-4dc1-b118-f2c86c385005 main] session.SparkSession: Trying to open Spark session e3b4aa82-29a5-4e82-b63b-742c5d35df3f
2019-07-03T12:56:00,700 ERROR [6beaec32-ecac-4dc1-b118-f2c86c385005 main] spark.SparkTask: Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session e3b4aa82-29a5-4e82-b63b-742c5d35df3f)'
org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create Spark client for Spark session e3b4aa82-29a5-4e82-b63b-742c5d35df3f
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.getHiveException(SparkSessionImpl.java:221)
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:92)
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:115)
at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:136)
at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:115)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2664)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2335)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:218)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:318)
at org.apache.hadoop.util.RunJar.main(RunJar.java:232)
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/SparkConf
at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.generateSparkConf(HiveSparkClientFactory.java:263)
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:98)
at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:76)
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:87)
... 24 more
Caused by: java.lang.ClassNotFoundException: org.apache.spark.SparkConf
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 28 more
2019-07-03T12:56:00,700 ERROR [6beaec32-ecac-4dc1-b118-f2c86c385005 main] spark.SparkTask: Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session e3b4aa82-29a5-4e82-b63b-742c5d35df3f)'
org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create Spark client for Spark session e3b4aa82-29a5-4e82-b63b-742c5d35df3f
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.getHiveException(SparkSessionImpl.java:221) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:92) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:115) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:136) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:115) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2664) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2335) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:218) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239) ~[hive-cli-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188) ~[hive-cli-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402) ~[hive-cli-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821) ~[hive-cli-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759) ~[hive-cli-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683) ~[hive-cli-3.1.1.jar:3.1.1]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_191]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_191]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_191]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_191]
at org.apache.hadoop.util.RunJar.run(RunJar.java:318) ~[hadoop-common-3.1.1.jar:?]
at org.apache.hadoop.util.RunJar.main(RunJar.java:232) ~[hadoop-common-3.1.1.jar:?]
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/SparkConf
at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.generateSparkConf(HiveSparkClientFactory.java:263) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:98) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:76) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:87) ~[hive-exec-3.1.1.jar:3.1.1]
... 24 more
Caused by: java.lang.ClassNotFoundException: org.apache.spark.SparkConf
at java.net.URLClassLoader.findClass(URLClassLoader.java:382) ~[?:1.8.0_191]
at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_191]
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) ~[?:1.8.0_191]
at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_191]
at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.generateSparkConf(HiveSparkClientFactory.java:263) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:98) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:76) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:87) ~[hive-exec-3.1.1.jar:3.1.1]
... 24 more
2019-07-03T12:56:00,701 INFO [6beaec32-ecac-4dc1-b118-f2c86c385005 main] reexec.ReOptimizePlugin: ReOptimization: retryPossible: false
2019-07-03T12:56:00,701 ERROR [6beaec32-ecac-4dc1-b118-f2c86c385005 main] ql.Driver: FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client for Spark session e3b4aa82-29a5-4e82-b63b-742c5d35df3f
2019-07-03T12:56:00,701 INFO [6beaec32-ecac-4dc1-b118-f2c86c385005 main] ql.Driver: Completed executing command(queryId=hadoop_20190703125557_f48a3966-691d-4c42-aee0-93f81fabef66); Time taken: 0.432 seconds
2019-07-03T12:56:00,701 INFO [6beaec32-ecac-4dc1-b118-f2c86c385005 main] ql.Driver: Concurrency mode is disabled, not creating a lock manager
2019-07-03T12:56:00,721 INFO [6beaec32-ecac-4dc1-b118-f2c86c385005 main] conf.HiveConf: Using the default value passed in for log id: 6beaec32-ecac-4dc1-b118-f2c86c385005
2019-07-03T12:56:00,721 INFO [6beaec32-ecac-4dc1-b118-f2c86c385005 main] session.SessionState: Resetting thread name to main
This answer has a slight error.
HIVE_AUX_JARS_PATH takes a comma-separated list, not a colon-separated one. So the correct code would be:
export SPARK_HOME=/home/jp/bigdata/spark/spark-3.1.1-bin-hadoop3.2
export SPARK_JARS=""
for jar in `ls $SPARK_HOME/jars`; do
  # Skip jars that conflict with Hive's own copies (grep -E is needed for the '|' alternation)
  if ! echo $jar | grep -qE 'slf4j|mysql|datanucleus|^hive'; then
    export SPARK_JARS=$SPARK_JARS,$SPARK_HOME/jars/$jar
  fi
done
# Strip the leading comma
VAR=${SPARK_JARS#?}
export HIVE_AUX_JARS_PATH=$VAR
echo $HIVE_AUX_JARS_PATH
Note: some jars are skipped because they conflict with the Hive jars.
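If it helps, here is a quick sanity check (just an illustrative sketch, assuming the loop above has already run) that the resulting list is really comma separated:
# Print the first few entries; each should be an absolute path to a Spark jar
echo "$HIVE_AUX_JARS_PATH" | tr ',' '\n' | head -3
# Warn if any ':' separator slipped into the list
case "$HIVE_AUX_JARS_PATH" in
  *:*) echo "WARNING: colon found in HIVE_AUX_JARS_PATH" ;;
  *)   echo "OK: comma-separated list" ;;
esac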
Like you, I ran into the same problem when deploying Hive on Spark. In the end, after some research, I found it was because Hive could not load the Spark jars, so I made the following change to hive-env.sh.
Add to hive-env.sh:
# Pay attention to your spark path
export SPARK_HOME=/opt/module/spark-2.4.5-bin-without-hive
export SPARK_JARS=""
for jar in `ls $SPARK_HOME/jars`; do
  export SPARK_JARS=$SPARK_JARS:$SPARK_HOME/jars/$jar
done
export HIVE_AUX_JARS_PATH=$SPARK_JARS
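After saving hive-env.sh, restart the Hive CLI so the new classpath is picked up, then retry the statement that failed. For example (the command below and the $HIVE_HOME path are only an assumed illustration, adjust to your install):
# Re-open the CLI and re-run the insert that failed before
$HIVE_HOME/bin/hive -e "insert into test_table values(1,'aaa');"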