Select the next non-empty field in Hive



I have a table with 6 fields, as shown below:

Field1 Field2 Field3 Field4 Field5 Field6
ABC    45     XYZ           JKL    BNM
       65            QWE    JKL    
WER           YUI    IOP    GHJ

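I want to extract the data from the table above into a new table with 5 fields, skipping the empty values so that the remaining values shift left. My final table should look like this: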

Result1 Result2 Result3 Result4 Result5
ABC     45      XYZ     JKL     BNM
65      QWE     JKL
WER     YUI     IOP     GHJ

I started writing a huge conditional query with CASE WHEN, but it got out of hand and is error-prone. Is it possible to get this table with a regexp_extract query in Hive?

Assuming the "empty" values are NULLs

select  fields[0]   as Field1
       ,fields[1]   as Field2
       ,fields[2]   as Field3
       ,fields[3]   as Field4
       ,fields[4]   as Field5
from   (select  split(concat_ws(string(unhex(1)),*),'\x01') as fields
        from    mytable
        ) t

+--------+--------+--------+--------+--------+
| field1 | field2 | field3 | field4 | field5 |
+--------+--------+--------+--------+--------+
| ABC    | 45     | XYZ    | JKL    | BNM    |
| 65     | QWE    | JKL    | (null) | (null) |
| WER    | YUI    | IOP    | GHJ    | (null) |
+--------+--------+--------+--------+--------+

A simplified version, assuming a comma (,) never appears in your fields:

select  ...
from   (select  split(concat_ws(',',*),',') as fields
        from    mytable
        ) t
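
The queries above rely on concat_ws skipping NULL arguments, so joining the columns and splitting them again shifts the remaining values to the left. As a rough illustration only, here is a minimal Python sketch that emulates that behavior outside Hive. The function name `compact_nulls`, the `width` parameter, and the sample rows are hypothetical and not part of any Hive API.

```python
SEP = "\x01"  # the control character the query splits on


def compact_nulls(row, width=5):
    """Emulate split(concat_ws(SEP, *), SEP) followed by fields[0]..fields[width-1]."""
    # concat_ws ignores NULL arguments; None stands in for NULL here.
    joined = SEP.join(v for v in row if v is not None)
    fields = joined.split(SEP) if joined else []
    # Hive returns NULL for an out-of-range array index; pad with None to mimic that.
    return (fields + [None] * width)[:width]


rows = [
    ["ABC", "45", "XYZ", None, "JKL", "BNM"],
    [None, "65", None, "QWE", "JKL", None],
    ["WER", None, "YUI", "IOP", "GHJ", None],
]

for row in rows:
    print(compact_nulls(row))
```

The sketch treats every value as a string. If any source column in Hive is not a string, it may need an explicit cast before being passed to concat_ws.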

Assuming the "empty" values are empty strings

select  fields[0]   as Field1
       ,fields[1]   as Field2
       ,fields[2]   as Field3
       ,fields[3]   as Field4
       ,fields[4]   as Field5
from   (select  split(regexp_replace(concat_ws(string(unhex(1)),*),'^\x01+|\x01+$|(\x01)+','$1'),'\x01') as fields
        from    mytable
        ) t

+--------+--------+--------+--------+--------+
| field1 | field2 | field3 | field4 | field5 |
+--------+--------+--------+--------+--------+
| ABC    | 45     | XYZ    | JKL    | BNM    |
| 65     | QWE    | JKL    | (null) | (null) |
| WER    | YUI    | IOP    | GHJ    | (null) |
+--------+--------+--------+--------+--------+

A simplified version, assuming a comma (,) never appears in your fields:

select  ...
from   (select  split(regexp_replace(concat_ws(',',*),'^,+|,+$|(,)+','$1'),',') as fields
        from    mytable
        ) t
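
With empty strings, concat_ws keeps every position, so the regular expression has to clean up the joined string: leading and trailing runs of the delimiter are removed, and each inner run is collapsed to a single delimiter. A minimal Python sketch of that step follows, as an emulation rather than Hive code. The function name `compact_empty_strings`, the `width` parameter, and the sample rows are hypothetical.

```python
import re

SEP = "\x01"  # the control character the query splits on

# Same three alternatives as the Hive pattern: leading run, trailing run, inner run.
PATTERN = re.compile(r"^\x01+|\x01+$|(\x01)+")


def compact_empty_strings(row, width=5):
    """Emulate split(regexp_replace(concat_ws(SEP, *), pattern, '$1'), SEP)."""
    joined = SEP.join(row)  # empty strings are kept, producing adjacent delimiters
    # Group 1 is set only for inner runs; leading/trailing runs are replaced with "".
    collapsed = PATTERN.sub(lambda m: m.group(1) or "", joined)
    fields = collapsed.split(SEP) if collapsed else []
    # Pad with None to mimic Hive returning NULL for an out-of-range array index.
    return (fields + [None] * width)[:width]


rows = [
    ["ABC", "45", "XYZ", "", "JKL", "BNM"],
    ["", "65", "", "QWE", "JKL", ""],
    ["WER", "", "YUI", "IOP", "GHJ", ""],
]

for row in rows:
    print(compact_empty_strings(row))
```

The order of the alternatives matters: the anchored branches are tried first, so a run of delimiters at either end is dropped entirely instead of being collapsed to one.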
