我需要执行以下查询:
SELECT
a.uuid,
b.uuid
FROM
(
SELECT DISTINCT
l.uuid AS east,
r.uuid AS west
FROM
tabl1 AS l
INNER JOIN tabl1 AS r
USING (attr_group)
WHERE l.uuid < r.uuid
) AS ids
INNER JOIN tabl2 AS a
ON (ids.east = a.uuid)
INNER JOIN tabl2 AS b
ON (ids.west = b.uuid)
问题是,当我有两个a.uuid和b.uuid时,代码会在uuuid上给出ambigus错误。但是,当我只尝试使用a.uuid或b.uuid时,查询会起作用。
另一个观察结果是,查询使用hive.execute(query(运行,但不使用hive.executeQuery(query(。我使用的是Zeppelin中的pyspark解释器。
HIVE版本为:3.1.0。
最终结果集中的两列具有相同的名称。只需将它们别名以消除歧义:
SELECT
a.uuid AS uuid_a, --> here
b.uuid AS uuid_b --> and here
FROM (
SELECT DISTINCT
l.uuid AS east,
r.uuid AS west
FROM tabl1 AS l
INNER JOIN tabl1 AS r USING (attr_group)
WHERE l.uuid < r.uuid
) AS ids
INNER JOIN tabl2 AS a ON ids.east = a.uuid
INNER JOIN tabl2 AS b ON ids.west = b.uuid
请注意,联接条件周围的括号是不必要的——我删除了它们。