我有一个表a作为
ID1 ID2 ID3 VALUE
1 A xx 100
1 A yy 200
1 A yy 100
1 A 100
1 B zz 300
2 B yy 300
2 C yy 400
表B有如下操作或规则,它可以写在3个以上的ID字段上
ID1 ID2 ID3 ACTION
A xx GetAxx
A yy GetAyy
B GetB
我想加入表格并产生如下的结果
ID1 ID2 ID3 TOTAL
1 A xx 100
1 A yy 200
1 A yy 100
1 B zz 300
2 B yy 300
我尝试了简单的OR条件和条件联接,但是得到了相同的错误失败:出于安全原因,SemanticException笛卡尔产品被禁用。如果您知道自己在做什么,请将hive.strict.checks.cartsian.product设置为false,并且hive.mapred.mode未设置为"strict"以继续。
SELECT * FROM TABLE_A A
JOIN TABLE_B B
ON
(B.ID1 IS NULL OR A.ID1=B.ID1) AND
(B.ID2 IS NULL OR A.ID2=B.ID2) AND
(B.ID3 IS NULL OR A.ID3=B.ID3)
SELECT * FROM TABLE_A A
JOIN TABLE_B B
ON
(COALESCE(A.ID1,"NA")=COALESCE(B.ID1, A.ID1, "NA") and
(COALESCE(A.ID2,"NA")=COALESCE(B.ID2, A.ID2, "NA") and
(COALESCE(A.ID3,"NA")=COALESCE(B.ID3, A.ID3, "NA")
在忽略联接条件中的null比较的情况下,对多个字段进行有条件联接的正确方法是什么或当表B中存在null时,有没有办法强制条件等于true或有没有一种方法可以强制将表B中的null与表a中的任何值进行匹配,以返回真正的
非常感谢您的帮助!
您可以通过在列ID2、ID3上直接应用内部联接来获得结果。
SELECT A.ID1,A.ID2,A.ID3,A.VALUE as total FROM TABLE_A A
JOIN TABLE_B B
ON
A.ID2=B.ID2 and
A.ID3=B.ID3
您可以使用exists
:
select a.*
from a
where exists (select 1
from b
where (b.id1 = a.id1 or b.id1 is null) and
(b.id2 = a.id2 or b.id2 is null) and
(b.id3 = a.id3 or b.id3 is null)
);
实际上,您也可以使用JOIN
来执行此操作,但在某些情况下可能会导致重复的行:
select a.*
from a join
b
on (b.id1 = a.id1 or b.id1 is null) and
(b.id2 = a.id2 or b.id2 is null) and
(b.id3 = a.id3 or b.id3 is null);