Multiple inner joins in BigQuery cause duplicate rows



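I'm trying to join 6 tables in BigQuery, named T0, T1, T2, T3, T4 and T5. The tables whose results I'm interested in are T0 and T1. Querying just those two tables gives me 43 matching rows: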

SELECT  
T1.F1, 
T0.F2, 
T0.F3, 
T0.F4, 
T1.F5,
T1.F6,
T1.F7,
T1.F8,
T0.F9
FROM `TABLE0` T0
INNER JOIN `TABLE1` T1 on T1.F1= T0.F1
WHERE T0.F1 = "010001476713" 
AND T0.F2 = T1.F2
ORDER BY T0.F4

But when I run it with multiple INNER JOINs, I get 800 results instead of 43, and the rows are duplicated:

SELECT
T2.F11,
T3.F15,
T2.F12,
T3.F16,
T3.F17,
T1.F1, 
T2.F13,
T3.F17,
T5.F18,
T5.F19,
T5.F20,
T2.F14,
T0.F9,
T1.F10,
T4.F3,
T4.F21,
T4.F22,
T0.F2, 
T3.F23,
T0.F3, 
T0.F4, 
T1.F5,
T1.F6,
T1.F7,
T1.F8
FROM `TABLE0` T0
INNER JOIN `TABLE1` T1 ON T1.F1= T0.F1
INNER JOIN `TABLE3`  T3 ON T3.F1=T1.F1
INNER JOIN `TABLE2` T2 ON T2.F24 = T3.F24
INNER JOIN `TABLE4` T4 ON T4.F3 = T0.F3
INNER JOIN `TABLE5` as T5 ON T5.F1=T0.F1
WHERE T0.F1 = "010001476713" 
AND T0.F2 = T1.F2
ORDER BY T0.F4
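The jump from 43 to 800 rows is what an INNER JOIN does when a join key is not unique on one side: each left-hand row is repeated once per matching right-hand row, and those factors multiply across successive joins. Below is a minimal sketch on hypothetical toy data, using Python's built-in `sqlite3` as a stand-in for BigQuery. The lowercase table and column names only mirror the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t0 (f1 TEXT, f3 TEXT);
    CREATE TABLE t4 (f3 TEXT, f21 TEXT);
    CREATE TABLE t5 (f1 TEXT, f18 TEXT);
    INSERT INTO t0 VALUES ('010001476713', 'x');
    -- f3 is not unique in t4: three rows match the single t0 row
    INSERT INTO t4 VALUES ('x', 'a'), ('x', 'a'), ('x', 'b');
    -- f1 is not unique in t5: two rows match
    INSERT INTO t5 VALUES ('010001476713', 'p'), ('010001476713', 'q');
""")

def row_count(sql):
    """Run a COUNT(*) query and return the single integer it produces."""
    return conn.execute(sql).fetchone()[0]

n_t0 = row_count("SELECT COUNT(*) FROM t0")
n_t0_t4 = row_count("SELECT COUNT(*) FROM t0 JOIN t4 ON t4.f3 = t0.f3")
n_all = row_count(
    "SELECT COUNT(*) FROM t0"
    " JOIN t4 ON t4.f3 = t0.f3"
    " JOIN t5 ON t5.f1 = t0.f1"
)
# One t0 row becomes 3 after joining t4, then 3 * 2 = 6 after joining t5.
print(n_t0, n_t0_t4, n_all)  # 1 3 6
```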

When I get duplicate rows, I approach it like this:

Start with just tables T0 & T1, which give you the 43 rows. So far so good.

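Now comment out everything related to tables T2, T4, & T5 (I put the commas at the start of the lines to make commenting out easier), like this: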

SELECT
--T2.F11,
T3.F15
--,T2.F12
,T3.F16
,T3.F17
,T1.F1 
--,T2.F13
,T3.F17
--,T5.F18
--,T5.F19
--,T5.F20
--,T2.F14
,T0.F9
,T1.F10
--,T4.F3
--,T4.F21
--,T4.F22
,T0.F2 
,T3.F23
,T0.F3 
,T0.F4 
,T1.F5
,T1.F6
,T1.F7
,T1.F8
FROM `TABLE0` T0
INNER JOIN `TABLE1` T1 ON T1.F1= T0.F1 and T0.F2 = T1.F2
INNER JOIN `TABLE3`  T3 ON T3.F1=T1.F1
--INNER JOIN `TABLE2` T2 ON T2.F24 = T3.F24
--INNER JOIN `TABLE4` T4 ON T4.F3 = T0.F3
--INNER JOIN `TABLE5` as T5 ON T5.F1=T0.F1
WHERE T0.F1 = "010001476713" 
ORDER BY T0.F4

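I've moved "and T0.F2 = T1.F2" from the WHERE clause into the ON clause of the inner join. When you run this query, do you still get 43 rows, or more? If more, you need to figure out what is double-matching and add it to your ON clause so that it really is a 1-1 relationship, or, if you don't want the multiple matches, perhaps group the results. You may need to comment out your select list and select everything to really figure it out, like: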

SELECT *
/*
--T2.F11,
T3.F15
--,T2.F12
,T3.F16
,T3.F17
,T1.F1 
--,T2.F13
,T3.F17
--,T5.F18
--,T5.F19
--,T5.F20
--,T2.F14
,T0.F9
,T1.F10
--,T4.F3
--,T4.F21
--,T4.F22
,T0.F2 
,T3.F23
,T0.F3 
,T0.F4 
,T1.F5
,T1.F6
,T1.F7
,T1.F8
*/
FROM `TABLE0` T0
INNER JOIN `TABLE1` T1 ON T1.F1= T0.F1 and T0.F2 = T1.F2
INNER JOIN `TABLE3`  T3 ON T3.F1=T1.F1
--INNER JOIN `TABLE2` T2 ON T2.F24 = T3.F24
--INNER JOIN `TABLE4` T4 ON T4.F3 = T0.F3
--INNER JOIN `TABLE5` as T5 ON T5.F1=T0.F1
WHERE T0.F1 = "010001476713" 
ORDER BY T0.F4
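To find which key double-matches without reading through the SELECT * output, you can also group the joined table by the column used in the ON clause and keep only the keys that occur more than once. Here is a sketch on hypothetical toy data with Python's `sqlite3`. The GROUP BY ... HAVING COUNT(*) > 1 query is plain SQL, so it should work the same way in BigQuery with the real table names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t3 (f1 TEXT, f24 TEXT, f15 TEXT);
    INSERT INTO t3 VALUES
        ('010001476713', 'k1', 'u'),
        ('010001476713', 'k2', 'v'),
        ('999', 'k3', 'w');
""")

# Any key that appears more than once on the right-hand side of a join
# will multiply the left-hand rows; list those keys with their counts.
dupes = conn.execute("""
    SELECT f1, COUNT(*) AS n
    FROM t3
    GROUP BY f1
    HAVING COUNT(*) > 1
""").fetchall()
print(dupes)  # [('010001476713', 2)]
```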

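Once you've figured out which rows are causing the duplication, either group the results or add an "and" condition to the ON clause to make it 1-1, then move on. Next, uncomment the T2 parts of the query and do the same, then T4 and T5. If you send me the results of the query above, I can help you work out what your ON clauses need to be to stop the duplication.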
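The uncomment-one-join-at-a-time procedure can also be scripted. Add the joins in order and record the row count after each one. The first join after which the count grows is the one that fans out. Below is a minimal sketch with Python's `sqlite3` and hypothetical toy data in which `t4` repeats a join key, as TABLE4 turned out to do in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t0 (f1 TEXT, f2 TEXT, f3 TEXT);
    CREATE TABLE t1 (f1 TEXT, f2 TEXT);
    CREATE TABLE t4 (f3 TEXT, f21 TEXT);
    CREATE TABLE t5 (f1 TEXT, f18 TEXT);
    INSERT INTO t0 VALUES ('010001476713', 'a', 'x'), ('010001476713', 'b', 'y');
    INSERT INTO t1 VALUES ('010001476713', 'a'), ('010001476713', 'b');
    INSERT INTO t4 VALUES ('x', 'r'), ('x', 's'), ('y', 't');  -- 'x' appears twice
    INSERT INTO t5 VALUES ('010001476713', 'p');
""")

# Known-good starting point: T0 joined to T1 on both columns.
base = "SELECT COUNT(*) FROM t0 JOIN t1 ON t1.f1 = t0.f1 AND t1.f2 = t0.f2"
joins = [
    ("t4", "JOIN t4 ON t4.f3 = t0.f3"),
    ("t5", "JOIN t5 ON t5.f1 = t0.f1"),
]

counts = {"t0+t1": conn.execute(base).fetchone()[0]}
sql = base
for name, clause in joins:
    sql += " " + clause  # "uncomment" the next join
    counts[name] = conn.execute(sql).fetchone()[0]
print(counts)  # {'t0+t1': 2, 't4': 3, 't5': 3}

# The first join after which the row count grows is the one that fans out.
names = list(counts)
culprit = next(
    cur for prev, cur in zip(names, names[1:]) if counts[cur] > counts[prev]
)
print(culprit)  # t4
```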

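Thanks @jenstretman. I found that TABLE4 (T4) was duplicating matches because it was joined on a foreign key rather than a primary key, which created the duplicates. The solution was to use DISTINCT so that only the specific matching rows are selected: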

SELECT DISTINCT
T2.F11,
T3.F15,
T2.F12,
T3.F16,
T3.F17,
T1.F1, 
T2.F13,
T3.F17,
T5.F18,
T5.F19,
T5.F20,
T2.F14,
T0.F9,
T1.F10,
T4.F3,
T4.F21,
T4.F22,
T0.F2, 
T3.F23,
T0.F3, 
T0.F4, 
T1.F5,
T1.F6,
T1.F7,
T1.F8
FROM `TABLE0` T0
INNER JOIN `TABLE1` T1 ON T1.F1= T0.F1
INNER JOIN `TABLE3`  T3 ON T3.F1=T1.F1
INNER JOIN `TABLE2` T2 ON T2.F24 = T3.F24
INNER JOIN (SELECT DISTINCT F3, F21, F22 FROM `TABLE4`) T4 ON T4.F3 = T0.F3
INNER JOIN `TABLE5` as T5 ON T5.F1=T0.F1
WHERE T0.F1 = "010001476713" 
AND T0.F2 = T1.F2
ORDER BY T0.F4
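Deduplicating TABLE4 in a subquery before the join restores the original row count, but only when the repeated rows are identical in the selected columns (F3, F21, F22). If they differ in any of those columns, each distinct combination still produces its own output row. Here is a sketch on hypothetical toy data with Python's `sqlite3`, where the duplicates differ only in a column that is not selected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t0 (f1 TEXT, f3 TEXT);
    CREATE TABLE t4 (f3 TEXT, f21 TEXT, f22 TEXT, other TEXT);
    INSERT INTO t0 VALUES ('010001476713', 'x');
    -- Same (f3, f21, f22) twice, differing only in a column we don't select
    INSERT INTO t4 VALUES ('x', 'a', 'b', '1'), ('x', 'a', 'b', '2');
""")

# Joining the raw table: the t0 row matches both t4 rows.
plain = conn.execute(
    "SELECT COUNT(*) FROM t0 JOIN t4 ON t4.f3 = t0.f3"
).fetchone()[0]

# Joining a DISTINCT projection of t4: the two rows collapse into one first.
deduped = conn.execute("""
    SELECT COUNT(*)
    FROM t0
    JOIN (SELECT DISTINCT f3, f21, f22 FROM t4) AS t4d
      ON t4d.f3 = t0.f3
""").fetchone()[0]

print(plain, deduped)  # 2 1
```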
