SQL: how to prevent duplicates in STRING_AGG when joining multiple tables



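Here is the relational model of the database I'm asking about: a show can have multiple actors and multiple genres. (I wanted to include a diagram of the model, but I don't have enough reputation to post images.)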

What I want to get is every detail of a specific show, plus its genres, plus the actors who appear in the show and their IDs.

What I tried:

SELECT shows.*,
STRING_AGG(g.name, ', ')                AS genres,
STRING_AGG(CAST(a.id AS VARCHAR), ', ') AS actor_ids,
STRING_AGG(a.name, ', ')                AS actors
FROM shows
LEFT JOIN show_genres sg ON shows.id = sg.show_id
LEFT JOIN genres g ON g.id = sg.genre_id
LEFT JOIN show_characters sc ON shows.id = sc.show_id
LEFT JOIN actors a ON sc.actor_id = a.id
WHERE shows.id = 1390
GROUP BY shows.id

Result:

genres:    Action, Action, Action, Adventure, Adventure, Adventure
actor_ids: 436512, 412933, 9739, 414839, 436539, 436541, 11439, 419217, 415442, 413734, 432873, 420752, 421890, 436592, 448107, 458178, 150561, 436590, 412041, 436511, 436512, 412933, 9739, 414839, ... (the same 20 IDs keep repeating)
actors:    Iain Glen, Alfie Allen, Liam Cunningham, John Bradley, Conleth Hill, Aidan Gillen, Gwendoline Christie, Isaac Hempstead Wright, Kristofer Hivju, Nathalie Emmanuel, Jacob Anderson, Jerome Flynn, Rory McCann, Emilia Clarke, Kit Harington, Peter Dinklage, Lena Headey, Nikolaj Coster Waldau, Sophie Turner, Maisie Williams, Iain Glen, Alfie Allen, Liam Cunningham, ... (the same 20 names keep repeating)

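In cases like this, one approach is to run STRING_AGG on each component table (or set of tables) first, and then LEFT JOIN those derived tables onto the main table. Each LEFT JOIN to a one-to-many table multiplies the rows: a show with G genres and A actors produces G × A rows before grouping, so every genre is aggregated A times and every actor G times. Aggregating each side separately, so that each derived table has exactly one row per show, avoids that multiplication.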
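To see the multiplication concretely, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table and column names mirror the queries in this thread, but the schema is an assumption (only the columns those queries use), and the data is made up: one show with 2 genres and 3 actors. Running the question's join chain without GROUP BY shows the 2 × 3 = 6 rows that STRING_AGG would later concatenate.

```python
import sqlite3

# Hypothetical minimal schema: only the columns the queries in this thread use.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE genres (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE actors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE show_genres (show_id INTEGER, genre_id INTEGER);
    CREATE TABLE show_characters (show_id INTEGER, actor_id INTEGER);

    -- made-up data: one show with 2 genres and 3 actors
    INSERT INTO shows VALUES (1, 'Demo show');
    INSERT INTO genres VALUES (10, 'Action'), (11, 'Adventure');
    INSERT INTO actors VALUES (100, 'Actor A'), (101, 'Actor B'), (102, 'Actor C');
    INSERT INTO show_genres VALUES (1, 10), (1, 11);
    INSERT INTO show_characters VALUES (1, 100), (1, 101), (1, 102);
    """
)

# Same join chain as the question, without GROUP BY:
# one row per (genre, actor) combination.
joined_rows = conn.execute(
    """
    SELECT g.name, a.name
    FROM shows
    LEFT JOIN show_genres sg ON shows.id = sg.show_id
    LEFT JOIN genres g ON g.id = sg.genre_id
    LEFT JOIN show_characters sc ON shows.id = sc.show_id
    LEFT JOIN actors a ON sc.actor_id = a.id
    WHERE shows.id = 1
    """
).fetchall()

print(len(joined_rows))  # 6 (2 genres x 3 actors)

genre_column = [genre for genre, _ in joined_rows]
print(genre_column.count("Action"))  # 3 ('Action' appears once per actor)
```

Grouping these rows and concatenating the genre column is exactly what produces the repeated "Action, Action, Action, ..." in the question's result.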

In your case, try something like this:

SELECT
    shows.*,
    show_genre_names.genre_names,
    show_actors.actor_ids,
    show_actors.actor_names
FROM
    shows
    LEFT JOIN
    ( -- one row per show_id
        SELECT
            sg.show_id,
            STRING_AGG(g.name, ', ') AS genre_names
        FROM
            show_genres sg
            JOIN genres g ON g.id = sg.genre_id
        GROUP BY
            sg.show_id
    ) show_genre_names
        ON shows.id = show_genre_names.show_id
    LEFT JOIN
    ( -- one row per show_id
        SELECT
            sc.show_id,
            -- STRING_AGG takes text input, so cast the integer id first
            STRING_AGG(CAST(a.id AS VARCHAR), ', ') AS actor_ids,
            STRING_AGG(a.name, ', ') AS actor_names
        FROM
            show_characters sc
            JOIN actors a ON a.id = sc.actor_id
        GROUP BY
            sc.show_id
    ) show_actors
        ON shows.id = show_actors.show_id
WHERE
    shows.id = 1390
;

You can solve this in other ways too, but knowing this technique will serve you well on your SQL journey.
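If you want to try the aggregate-first query without a PostgreSQL server, here is a sketch of the same shape in Python's sqlite3, with a hypothetical minimal schema and made-up data. One swap to note: older SQLite versions have no STRING_AGG, so this uses SQLite's GROUP_CONCAT(x, ', '), which concatenates the same way. SQLite also accepts an integer id directly, whereas PostgreSQL's STRING_AGG needs the id cast to text.

```python
import sqlite3

# Hypothetical minimal schema and made-up data (2 genres, 3 actors for show 1).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE genres (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE actors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE show_genres (show_id INTEGER, genre_id INTEGER);
    CREATE TABLE show_characters (show_id INTEGER, actor_id INTEGER);

    INSERT INTO shows VALUES (1, 'Demo show');
    INSERT INTO genres VALUES (10, 'Action'), (11, 'Adventure');
    INSERT INTO actors VALUES (100, 'Actor A'), (101, 'Actor B'), (102, 'Actor C');
    INSERT INTO show_genres VALUES (1, 10), (1, 11);
    INSERT INTO show_characters VALUES (1, 100), (1, 101), (1, 102);
    """
)

# Aggregate each one-to-many side in its own subquery (one row per show_id),
# then LEFT JOIN those single-row results onto shows, so nothing is multiplied.
row = conn.execute(
    """
    SELECT
        shows.id,
        shows.title,
        show_genre_names.genre_names,
        show_actors.actor_ids,
        show_actors.actor_names
    FROM shows
    LEFT JOIN (
        SELECT sg.show_id, GROUP_CONCAT(g.name, ', ') AS genre_names
        FROM show_genres sg
        JOIN genres g ON g.id = sg.genre_id
        GROUP BY sg.show_id
    ) show_genre_names ON shows.id = show_genre_names.show_id
    LEFT JOIN (
        SELECT sc.show_id,
               GROUP_CONCAT(a.id, ', ') AS actor_ids,
               GROUP_CONCAT(a.name, ', ') AS actor_names
        FROM show_characters sc
        JOIN actors a ON a.id = sc.actor_id
        GROUP BY sc.show_id
    ) show_actors ON shows.id = show_actors.show_id
    WHERE shows.id = 1
    """
).fetchone()

show_id, title, genre_names, actor_ids, actor_names = row

# GROUP_CONCAT does not guarantee an order, so compare as sorted lists.
print(sorted(genre_names.split(", ")))  # ['Action', 'Adventure']
print(sorted(actor_names.split(", ")))  # ['Actor A', 'Actor B', 'Actor C']
```

Each genre and each actor now appears once, because each subquery is grouped before it meets the other one-to-many table.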

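Meanwhile, I came up with a solution of my own. To make the returned data easier to work with later, I convert the returned array into a list of dictionaries.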

from typing import Optional

from psycopg2.extras import RealDictCursor, RealDictRow

from connection import connection_handler  # the project's own helper module (see docstring)


@connection_handler
def get_show_by_id(cursor: RealDictCursor, id: int) -> Optional[RealDictRow]:
    """
    Args:
        cursor: a cursor which returns dictionaries (use the @connection.connection_handler decorator)
        id: id of the show to fetch

    Returns:
        All show details in a RealDictRow, plus genres (as a concatenated string) and
        actors as a list of dictionaries, each holding an actor's id and name.
        Returns None if there is no show with the given id.
    """

    # ONE QUERY SOLUTION:
    # aggregate each actor's id together with their name, so that two different
    # actors who share a name and worked on the same show are not merged by DISTINCT
    query = """
        SELECT
            shows.*,
            STRING_AGG(DISTINCT g.name, ',')                 AS genres,
            ARRAY_AGG(DISTINCT ARRAY[a.id::VARCHAR, a.name]) AS actors_ids_with_names
        FROM shows
        LEFT JOIN show_genres sg ON shows.id = sg.show_id
        LEFT JOIN genres g ON g.id = sg.genre_id
        LEFT JOIN show_characters sc ON shows.id = sc.show_id
        LEFT JOIN actors a ON sc.actor_id = a.id
        WHERE shows.id = %s
        GROUP BY shows.id
    """
    cursor.execute(query, (id,))
    show = cursor.fetchone()
    if show is None:
        return None  # no show with this id

    # turn [[id, name], ...] into [{"id": ..., "name": ...}, ...];
    # skip the [NULL, NULL] pair the LEFT JOIN yields when a show has no actors
    show["actors"] = [
        {"id": actor_id, "name": actor_name}
        for actor_id, actor_name in show.pop("actors_ids_with_names")
        if actor_id is not None
    ]
    return show
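The comment in the function explains why ids are aggregated together with names: DISTINCT on the name alone would merge two different actors who share a name. Here is a minimal sketch of that effect with sqlite3 and made-up data. SQLite has no ARRAY_AGG, so this swaps in GROUP_CONCAT(DISTINCT a.id || ':' || a.name) as a stand-in for ARRAY_AGG(DISTINCT ARRAY[a.id::VARCHAR, a.name]), then splits the string back into a list of dictionaries like the function does. The ':' and ',' delimiters are an assumption of this sketch and would break on names containing them; the PostgreSQL array approach does not have that problem.

```python
import sqlite3

# Hypothetical minimal schema and made-up data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE genres (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE actors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE show_genres (show_id INTEGER, genre_id INTEGER);
    CREATE TABLE show_characters (show_id INTEGER, actor_id INTEGER);

    INSERT INTO shows VALUES (1, 'Demo show');
    INSERT INTO genres VALUES (10, 'Action'), (11, 'Adventure');
    -- two different actors who happen to share a name
    INSERT INTO actors VALUES (100, 'John Smith'), (101, 'John Smith'), (102, 'Actor C');
    INSERT INTO show_genres VALUES (1, 10), (1, 11);
    INSERT INTO show_characters VALUES (1, 100), (1, 101), (1, 102);
    """
)

genres, names_only, ids_with_names = conn.execute(
    """
    SELECT
        GROUP_CONCAT(DISTINCT g.name),                -- genres, deduplicated
        GROUP_CONCAT(DISTINCT a.name),                -- actors deduplicated by name only
        GROUP_CONCAT(DISTINCT a.id || ':' || a.name)  -- actors deduplicated by (id, name)
    FROM shows
    LEFT JOIN show_genres sg ON shows.id = sg.show_id
    LEFT JOIN genres g ON g.id = sg.genre_id
    LEFT JOIN show_characters sc ON shows.id = sc.show_id
    LEFT JOIN actors a ON sc.actor_id = a.id
    WHERE shows.id = 1
    GROUP BY shows.id
    """
).fetchone()

# Turn "100:John Smith,101:John Smith,..." into [{"id": ..., "name": ...}, ...].
actors = [
    {"id": int(actor_id), "name": name}
    for actor_id, name in (pair.split(":", 1) for pair in ids_with_names.split(","))
]

print(len(names_only.split(",")))  # 2: the two John Smiths were merged
print(len(actors))                 # 3: both John Smiths survive
```

On the design choice: DISTINCT removes duplicates after the fan-out has already happened, so the database still builds all G × A joined rows before collapsing them. The aggregate-first approach from the answer never builds those rows, which matters more as shows gain genres and cast members.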
