...
<property>
<name>hive.metastore.warehouse.dir</name>
<value>hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7</value>
<description>location of default database for the warehouse</description>
</property>
...
The block above is part of /user/spark3/conf/hive-site.xml.
The initial value was
hdfs://spark-master-01:9000/kikang/skybluelee_warehouse_mysql_5.7
and I changed it to
hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7
Below is the code and the result.
println(spark.conf.get("spark.sql.warehouse.dir"))  //--Default : spark-warehouse

spark
  .sql("""
      SELECT
          website,
          avg(age) avg_age,
          max(id)  max_id
      FROM
          people a
      JOIN
          projects b
        ON a.name = b.manager
      WHERE
          a.age > 11
      GROUP BY
          b.website
      """)
  .write
  .mode("overwrite")              //--Overwrite mode....
  .saveAsTable("JoinedPeople")    //--saveAsTable(<warehouse_table_name>)....

sql("SELECT * FROM JoinedPeople").show(1000)
hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7
+--------------------+-------+------+
| website|avg_age|max_id|
+--------------------+-------+------+
|http://hive.apach...| 30.0| 2|
|http://kafka.apac...| 19.0| 3|
|http://storm.apac...| 30.0| 9|
+--------------------+-------+------+
The value of "spark.sql.warehouse.dir" did change: "kikang" became "skybluelee", as the println output above shows.
But the location of the "JoinedPeople" table did not change. Its location is still 'hdfs://spark-master-01:9000/kikang/skybluelee_warehouse_mysql_5.7' - the original value from hive-site.xml.
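For what it's worth, this is how the table's actual location can be confirmed (a minimal sketch, assuming the spark-shell where spark.implicits._ is already in scope):

// Pull the "Location" row out of DESCRIBE FORMATTED to see where the table files actually live
spark.sql("DESCRIBE FORMATTED JoinedPeople")
  .filter($"col_name" === "Location")
  .show(truncate = false)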
I want to change the location of the default database.
How can I change the default location?
I also modified 'spark-defaults.conf' and, of course, shut everything down and brought it back up on Ubuntu, but it had no effect.
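(For reference, the spark-defaults.conf entry would presumably look something like the line below; spark.sql.warehouse.dir is the standard key, and the path is just the one used above.)

spark.sql.warehouse.dir    hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7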
I found what I was missing!
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://spark-worker-01:3306/metastore_db?createDatabaseIfNotExist=true</value>
<description>metadata is stored in a MySQL server</description>
</property>
This is the part of hive-site.xml that keeps the warehouse metadata in MySQL 5.7. The metastore_db was created the first time I ran this, so even though I changed the warehouse location afterwards, the metastore (and the location it had already recorded) did not change.
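A sketch of how this can be checked and, on newer versions, corrected in place; DESCRIBE DATABASE EXTENDED is standard Spark SQL, while ALTER DATABASE ... SET LOCATION needs Spark 3.0.0+ (and a new enough Hive metastore) and only affects tables created afterwards:

// Show the location the existing metastore has recorded for the default database.
// If it still prints the old /kikang/... path, the metastore kept the value it was created with.
spark.sql("DESCRIBE DATABASE EXTENDED default").show(truncate = false)

// Spark 3.0.0+ can update the recorded location without recreating the metastore.
// Note: this only changes where new tables are created; existing tables are not moved.
spark.sql("ALTER DATABASE default SET LOCATION 'hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7'")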
Could you check which Spark version you are using in this case? According to Hive Tables in the official Spark documentation:
Note that the hive.metastore.warehouse.dir property in hive-site.xml is deprecated since Spark 2.0.0. Instead, use spark.sql.warehouse.dir to specify the default location of database in warehouse. You may need to grant write privilege to the user who starts the Spark application.
- Does changing the property in hive-site.xml to spark.sql.warehouse.dir work for you (assuming the Spark version is above 2.0.0)?

...
<property>
<name>spark.sql.warehouse.dir</name>
<value>hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7</value>
<description>location of default database for the warehouse</description>
</property>
...
- Does setting the property before initializing the Spark session work for you?

import org.apache.spark.sql.SparkSession

val warehouseLocation = "hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7"

// Create a SparkSession with the desired warehouse location
val spark = SparkSession
  .builder()
  .appName("Spark Hive Example")
  .config("spark.sql.warehouse.dir", warehouseLocation)
  .enableHiveSupport()
  .getOrCreate()

// Import the necessary Spark functions and implicits
import spark.implicits._
import spark.sql

sql("""
    SELECT
        website,
        avg(age) avg_age,
        max(id)  max_id
    FROM
        people a
    JOIN
        projects b
      ON a.name = b.manager
    WHERE
        a.age > 11
    GROUP BY
        b.website
    """)
  .write
  .mode("overwrite")
  .saveAsTable("JoinedPeople")

// Retrieve the location of the "JoinedPeople" table from the Hive metastore
val tableLocation = spark.sql("DESCRIBE EXTENDED JoinedPeople")
  .filter($"col_name" === "Location")
  .select("data_type")
  .collect()(0)(0)
println(s"Table location: $tableLocation")