hive是否有任何数组比较函数/udf来查看array=array
?
例如:
Select a.xyz, b.abc from a left join b on a.C=b.D
其中C
和D
为数组
数组是有序结构,
concat_ws( <separator> , <array> )
这个函数将使用分隔符将所有数组元素连接成一个字符串。
create table arrayDemo( id bigint, list array<String> );
create table dummy( a int );
insert into table dummy values ( 1 ) ;
insert into arraydemo select 1, array("Paperino", "Topolino") from dummy;
insert into arraydemo select 2, array("Pippo", "Pluto") from dummy;
insert into arraydemo select 1, array("Pippo", "Pluto") from dummy;
select * from arraydemo;
+---------------+--------------------------+--+
| arraydemo.id | arraydemo.list |
+---------------+--------------------------+--+
| 1 | ["Paperino","Topolino"] |
| 2 | ["Pippo","Pluto"] |
| 1 | ["Pippo","Pluto"] |
+---------------+--------------------------+--+
select *
from arraydemo as a1
inner join arraydemo as a2
on concat_ws( "|", a1.list ) = concat_ws("|", a2.list );
+--------+--------------------------+--------+--------------------------+--+
| a1.id | a1.list | a2.id | a2.list |
+--------+--------------------------+--------+--------------------------+--+
| 1 | ["Paperino","Topolino"] | 1 | ["Paperino","Topolino"] |
| 2 | ["Pippo","Pluto"] | 2 | ["Pippo","Pluto"] |
| 1 | ["Pippo","Pluto"] | 2 | ["Pippo","Pluto"] |
| 2 | ["Pippo","Pluto"] | 1 | ["Pippo","Pluto"] |
| 1 | ["Pippo","Pluto"] | 1 | ["Pippo","Pluto"] |
+--------+--------------------------+--------+--------------------------+--+
我在这个例子中以"toString()"的方式使用它。有时最好的方法是只比较数组的重要部分
select *
from arraydemo as a1
inner join arraydemo as a2
on a1.list[0] = a2.list[0];
可以通过
比较两个数组的值 array_diff($array1,$array2);
它返回一个数组,其中包含array1中不存在于array2或array3等中的项
您可以使用hive-third-functions。它提供了一些有用的json、数组和映射函数。对于这个问题,您可以这样使用:
Select a.xyz, b.abc from a left join b on array_equals(a.C,b.D)