How can I change the location of the warehouse's default database?


...
<property>
<name>hive.metastore.warehouse.dir</name>
<value>hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7</value>
<description>location of default database for the warehouse</description>
</property>
...

The snippet above is part of /user/spark3/conf/hive-site.xml.

The initial value was

hdfs://spark-master-01:9000/kikang/skybluelee_warehouse_mysql_5.7

and I changed the value to

hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7

Below is the code and its result:

println(spark.conf.get("spark.sql.warehouse.dir"))  //--Default : spark-warehouse
spark
  .sql("""
    SELECT
      website,
      avg(age) avg_age,
      max(id) max_id
    FROM
      people a
    JOIN
      projects b
      ON a.name = b.manager
    WHERE
      a.age > 11
    GROUP BY
      b.website
    """)
  .write
  .mode("overwrite")            //--Overwrite mode....
  .saveAsTable("JoinedPeople")  //--saveAsTable(<warehouse_table_name>)....

sql("SELECT * FROM JoinedPeople").show(1000)
hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7
+--------------------+-------+------+
|             website|avg_age|max_id|
+--------------------+-------+------+
|http://hive.apach...|   30.0|     2|
|http://kafka.apac...|   19.0|     3|
|http://storm.apac...|   30.0|     9|
+--------------------+-------+------+

The value of "spark.sql.warehouse.dir" did change: "kikang" was replaced by "skybluelee".

But the location of the table "JoinedPeople" did not change. Its location is still 'hdfs://spark-master-01:9000/kikang/skybluelee_warehouse_mysql_5.7', the original value from hive-site.xml.
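
For reference, this is roughly how the table location can be checked from the same session (a sketch using standard Spark SQL DESCRIBE output; only the table name above is assumed):

spark.sql("DESCRIBE FORMATTED JoinedPeople")   // metadata rows, including a "Location" row
  .filter("col_name = 'Location'")             // keep only the Location row
  .show(false)                                 // print the full path without truncation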

I want to change the location of the default database.

How can I change the default location?

I also modified 'spark-defaults.conf' and of course turned everything off & on again on Ubuntu, but it had no effect.
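
For completeness, the relevant entry in spark-defaults.conf would look roughly like this (a sketch; spark.sql.warehouse.dir is the key Spark reads at session startup):

spark.sql.warehouse.dir    hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7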

I found what I was missing!

<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://spark-worker-01:3306/metastore_db?createDatabaseIfNotExist=true</value>
<description>metadata is stored in a MySQL server</description>
</property>

This is also part of hive-site.xml: the warehouse metadata is kept in MySQL 5.7. The metastore_db was created on my very first attempt, so even though I changed the location afterwards, the metastore did not change.
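
In other words, the path that new tables end up under is the one recorded for the default database in that MySQL metastore. A sketch of how to inspect it, and, if your Spark and Hive metastore versions support ALTER DATABASE ... SET LOCATION, re-point it (the new location only applies to tables created afterwards):

// Location currently recorded in the metastore for the default database
spark.sql("DESCRIBE DATABASE EXTENDED default").show(false)

// Re-point the default database at the new warehouse path; existing tables
// keep their old locations, only tables created later use the new one
spark.sql("ALTER DATABASE default SET LOCATION 'hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7'")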

Could you check which Spark version you are on in this case? According to Hive Tables in the official Spark documentation:

Note that the hive.metastore.warehouse.dir property in hive-site.xml is deprecated since Spark 2.0.0. Instead, use spark.sql.warehouse.dir to specify the default location of database in warehouse. You may need to grant write privilege to the user who starts the Spark application.

  1. Does changing the property in hive-site.xml as below work for you (given a Spark version above 2.0.0)?

    ...
    <property>
    <name>spark.sql.warehouse.dir</name>
    <value>hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7</value>
    <description>location of default database for the warehouse</description>
    </property>
    ...
    
  2. Does setting the property before initializing the Spark session work for you?

    import org.apache.spark.sql.SparkSession

    val warehouseLocation = "hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7"

    // Create a SparkSession with the desired warehouse location
    val spark = SparkSession
      .builder()
      .appName("Spark Hive Example")
      .config("spark.sql.warehouse.dir", warehouseLocation)
      .enableHiveSupport()
      .getOrCreate()

    // Import the necessary Spark functions and implicits
    import spark.implicits._
    import spark.sql

    sql("""
      SELECT
        website,
        avg(age) avg_age,
        max(id) max_id
      FROM
        people a
      JOIN
        projects b
        ON a.name = b.manager
      WHERE
        a.age > 11
      GROUP BY
        b.website
      """)
      .write
      .mode("overwrite")
      .saveAsTable("JoinedPeople")

    // Retrieve the location of the "JoinedPeople" table from the Hive metastore
    val tableLocation = spark.sql("DESCRIBE EXTENDED JoinedPeople")
      .filter($"col_name" === "Location")
      .select("data_type")
      .collect()(0)(0)
    println(s"Table location: $tableLocation")
    
    
