我正试图通过设置属性'serialization.null.format' = ''
将源文件中的空白值转换为配置单元表中的NULL。我在蜂箱中写的查询是:
create table test(a int, b string) stored as parquet TBLPROPERTIES('serialization.null.format'='');
然后通过黑斑羚将值插入其中,类似这样:
insert overwrite table test values (1, ''), (2, 'b');
结果显示如下:
| a | b |
| 1 | |
| 2 | b |
有人能帮我解释一下为什么空白没有转换成NULL吗?
问题出在Parquet SerDe。请参阅https://issues.apache.org/jira/browse/HIVE-12362.
描述如下:
create table src (a string);
insert into table src values (NULL), (''), ('');
0: jdbc:hive2://localhost:10000/default> select * from src;
+-----------+--+
| src.a |
+-----------+--+
| NULL |
| |
| |
+-----------+--+
create table dest (a string) row format serde 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' stored as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
alter table dest set SERDEPROPERTIES ('serialization.null.format' = '');
alter table dest set TBLPROPERTIES ('serialization.null.format' = '');
insert overwrite table dest select * from src;
0: jdbc:hive2://localhost:10000/default> select * from test11;
+-----------+--+
| test11.a |
+-----------+--+
| NULL |
| |
| |
+-----------+--+
您可以尝试使用以下语句插入到表中:
CASE
when TRIM(a) = ''
THEN NULL
ELSE a
END,
这样就可以了: nullif(trim(b),'')当为空时,将给出b或NULL值。因此,在选择语句时,您可以进行
从测试中选择a,nullif(trim(b),'');
FYR: nullif(值1,值2)如果值1=值2,则返回NULL;否则返回值1(从配置单元2.3.0开始)。简写为:CASE WHEN值1=值2,然后NULL else值1
https://www.docs4dev.com/docs/en/apache-hive/3.1.1/reference/LanguageManual_UDF.html
干杯!!