Concatenate all partitions of a dynamically partitioned Hive table

My Hive table is partitioned by date and holds 2 years of data. Each daily partition contains 200 files of about 2 MB each.

I can concatenate a single partition by running: ALTER TABLE table_name PARTITION (partition_column_name='2017-12-31') CONCATENATE;

Running this query by hand for every partition takes too long. Is there a simple way to do it for all partitions?

Option 1: Select from and overwrite the same Hive table:

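Hive supports INSERT OVERWRITE into the same table it selects from. Use this option only if you are sure the table's data was written with INSERT statements only (not loaded directly through HDFS).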

hive> SET hive.exec.dynamic.partition = true;
hive> SET hive.exec.dynamic.partition.mode = nonstrict;
hive> Insert overwrite table <partition_table_name> partition(<partition_col>) 
      select * from <db>.<partition_table_name>;

You can also use SORT BY, DISTRIBUTE BY, and additional Hive settings to control the number of files created in the table.
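A minimal sketch of Option 1 with DISTRIBUTE BY, assuming a single partition column. Distributing rows by the partition column sends all rows of a partition to one reducer, so each partition should end up with roughly one file. The table name default.partitn1, the column dt, and the function build_overwrite_query are hypothetical placeholders; the script only prints the HiveQL and does not run it.

```shell
#!/bin/bash
# Hypothetical table and partition column; replace with your own.
table="default.partitn1"
part_col="dt"

# Hypothetical helper: print a dynamic-partition INSERT OVERWRITE of the table
# from itself. DISTRIBUTE BY the partition column routes every row of a given
# partition to the same reducer, so each partition is written as ~one file.
build_overwrite_query() {
  local tbl="$1" col="$2"
  cat <<EOF
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE ${tbl} PARTITION (${col})
SELECT * FROM ${tbl}
DISTRIBUTE BY ${col};
EOF
}

build_overwrite_query "$table" "$part_col"
```

Saved as overwrite.sh (a hypothetical name), it could be run with: bash overwrite.sh > overwrite.hql && hive -f overwrite.hql. With one reducer per partition, very large partitions may take a while to rewrite.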

Option 2: Using a shell script:

bash$ cat cnct.hql
alter table default.partitn1 partition(${hiveconf:var1} = '${hiveconf:var2}') concatenate

Trigger the .hql script above from a shell script with a for loop:

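bash$ cat trigg.sh
#!/bin/bash
# list all partitions (e.g. dt=2017-12-31); -S keeps Hive's log lines out of the output
# note: assumes a single partition column, as in cnct.hql
id=$(hive -S -e "show partitions default.partitn1")
echo "partitions: " $id
for f in $id; do
  echo "concatenating partition: " $f
  # split the partition spec on = and assign the two halves to var1 and var2
  IFS="=" read -r var1 var2 <<< "$f"
  # pass the variables and execute the cnct.hql script
  hive --hiveconf var1="$var1" --hiveconf var2="$var2" -f cnct.hql
done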
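The loop above starts a new Hive CLI session for each of the roughly 730 daily partitions, and every session pays the CLI startup cost. A sketch of an alternative that is not part of the original answer: generate all CONCATENATE statements into one .hql file and run it in a single session. The function gen_concat_stmts and the file name concat_all.hql are hypothetical. It also handles multi-level specs such as year=2017/month=12. It assumes partition values need no unescaping; SHOW PARTITIONS URL-escapes special characters, which plain dates like 2017-12-31 do not contain.

```shell
#!/bin/bash
# Hypothetical helper: read partition specs from stdin (one per line, as printed
# by SHOW PARTITIONS, e.g. "dt=2017-12-31" or "year=2017/month=12") and print
# one ALTER TABLE ... CONCATENATE statement per partition.
gen_concat_stmts() {
  local table="$1" spec kv col val parts
  local -a kvs
  while IFS= read -r spec; do
    [ -z "$spec" ] && continue   # skip blank lines
    parts=""
    # Multi-level specs are separated by "/"; split them into col=val pairs.
    IFS='/' read -r -a kvs <<< "$spec"
    for kv in "${kvs[@]}"; do
      col="${kv%%=*}"            # text before the first "="
      val="${kv#*=}"             # text after the first "="
      parts+="${parts:+, }${col}='${val}'"
    done
    echo "ALTER TABLE ${table} PARTITION (${parts}) CONCATENATE;"
  done
}

# Usage (requires the Hive CLI; table name as in cnct.hql):
#   hive -S -e "show partitions default.partitn1" \
#     | gen_concat_stmts default.partitn1 > concat_all.hql
#   hive -f concat_all.hql
```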
