Pivot SQL table data into as many columns as the maximum number of observations per ID



I have an SQL table:

+----+--------------+-----------+
| ID |    Genre     | Frequency |
+----+--------------+-----------+
|  1 | Rock         | 0.11      |
|  1 | Punk         | 0.22      |
|  1 | Glam         | 0.33      |
|  2 | Metal        | 0.44      |
|  2 | Heavy Metal  | 0.55      |
|  2 | Thrash Metal | 0.66      |
+----+--------------+-----------+

I know how to pivot it into this:

+----+------+------+------+-------+-------------+--------------+
| ID | Rock | Punk | Glam | Metal | Heavy Metal | Thrash Metal |
+----+------+------+------+-------+-------------+--------------+
|  1 | 0.11 | 0.22 | 0.33 | NULL  | NULL        | NULL         |
|  2 | NULL | NULL | NULL | 0.44  | 0.55        | 0.66         |
+----+------+------+------+-------+-------------+--------------+

But I want the table in this format:

+----+---------+-------------+-------------+-------------+--------------+-------------+
| ID | genre 1 | frequency 1 |   genre 2   | frequency 2 |   genre 3    | frequency 3 |
+----+---------+-------------+-------------+-------------+--------------+-------------+
|  1 | Rock    | 0.11        | Punk        | 0.22        | Glam         | 0.33        |
|  2 | Metal   | 0.44        | Heavy Metal | 0.55        | Thrash Metal | 0.66        |
+----+---------+-------------+-------------+-------------+--------------+-------------+

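This may look strange, because in most data science cases we want each column to have a specific "meaning", and here "genre 1" has no meaning of its own. The reason I want this format is that I have many artists with many different genres. If there are 1000 different genres and I pivot the usual way, I end up with 1000 columns, one per genre, and most of their values will be NULL.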

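With my way of pivoting, however, even with 1000 different genres the number of columns is set by the number of genres of the artist who has the most genres. So in my example, if I add another artist with 4 completely different genres, I only need to add one more genre/frequency column pair (instead of 4 new columns).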
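The column count described above can be computed with a single aggregate: count the rows per ID and take the maximum. The sketch below is a minimal illustration using Python's built-in `sqlite3` module as a stand-in for the real database; the column types (`ID INTEGER, Genre TEXT, Frequency REAL`), the helper `genre_slots`, and the extra artist 3 with four genres are all hypothetical.

```python
import sqlite3

# Hypothetical in-memory copy of the ArtistGenres table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ArtistGenres (ID INTEGER, Genre TEXT, Frequency REAL)")
conn.executemany(
    "INSERT INTO ArtistGenres VALUES (?, ?, ?)",
    [
        (1, "Rock", 0.11), (1, "Punk", 0.22), (1, "Glam", 0.33),
        (2, "Metal", 0.44), (2, "Heavy Metal", 0.55), (2, "Thrash Metal", 0.66),
    ],
)

# Largest number of genres any single artist has.
MAX_GENRES_SQL = """
SELECT MAX(cnt)
FROM (SELECT COUNT(*) AS cnt FROM ArtistGenres GROUP BY ID)
"""


def genre_slots(connection):
    """Hypothetical helper: number of genre/frequency column pairs the pivot needs."""
    (n,) = connection.execute(MAX_GENRES_SQL).fetchone()
    return n or 0  # MAX over an empty table is NULL -> 0 slots


before = genre_slots(conn)  # both artists have 3 genres

# A hypothetical artist 3 with 4 new genres raises the slot count by only one.
conn.executemany(
    "INSERT INTO ArtistGenres VALUES (?, ?, ?)",
    [(3, "Jazz", 0.1), (3, "Blues", 0.2), (3, "Soul", 0.3), (3, "Funk", 0.4)],
)
after = genre_slots(conn)

# Total columns = ID + one genre/frequency pair per slot.
print(before, after, 1 + 2 * after)  # prints: 3 4 9
```

A conventional one-column-per-genre pivot would instead grow by four columns (Jazz, Blues, Soul, Funk) for the same insert.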

Moreover, if I could sort each artist's genres by frequency, then "genre 1" would actually have a meaning: "the most frequent genre".
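Giving "genre 1" that meaning only requires numbering each artist's genres in descending frequency order. A minimal sketch, again assuming an in-memory SQLite copy of the sample table (the schema types are an assumption) and a SQLite version of 3.25 or newer, which is needed for `ROW_NUMBER()`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ArtistGenres (ID INTEGER, Genre TEXT, Frequency REAL)")
conn.executemany(
    "INSERT INTO ArtistGenres VALUES (?, ?, ?)",
    [
        (1, "Rock", 0.11), (1, "Punk", 0.22), (1, "Glam", 0.33),
        (2, "Metal", 0.44), (2, "Heavy Metal", 0.55), (2, "Thrash Metal", 0.66),
    ],
)

# DESC makes rn = 1 the most frequent genre of each artist.
RANKED_SQL = """
SELECT ID, Genre, Frequency,
       ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Frequency DESC) AS rn
FROM ArtistGenres
ORDER BY ID, rn
"""

rows = conn.execute(RANKED_SQL).fetchall()

# Keep only the top-ranked genre per artist.
top = {artist_id: genre for artist_id, genre, _, rn in rows if rn == 1}
print(top)  # {1: 'Glam', 2: 'Thrash Metal'}
```

Note that the desired output table in the question lists genres in ascending frequency order (Rock 0.11 first), so that table corresponds to `ORDER BY Frequency` without `DESC`.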

Note that I am not allowed to create any new columns in the main database.

It can be done with PIVOT, but it isn't pretty.

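WITH CTE AS
(
    -- Number each artist's genres by frequency (rn = 1, 2, 3, ...).
    -- Use ORDER BY Frequency DESC to make "genre 1" the most frequent genre.
    -- Frequency is cast to VARCHAR so that it and Genre fit in the single Val column below.
    SELECT ID, Genre, CAST(Frequency AS VARCHAR(30)) AS Frequency,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Frequency) AS rn
    FROM ArtistGenres
)
SELECT *
FROM
(
    -- Unpivot each row into ('genre n', Genre) and ('frequency n', Frequency).
    SELECT ID, CONCAT(ca.Name, ' ', rn) AS Col, ca.Val
    FROM CTE
    CROSS APPLY (VALUES ('genre', Genre), ('frequency', Frequency)) ca(Name, Val)
) src
PIVOT
(
    MAX(Val)
    -- This column list is hard-coded for at most 3 genres per artist.
    FOR Col IN ([genre 1], [frequency 1], [genre 2], [frequency 2], [genre 3], [frequency 3])
) pvt;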
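The `IN (...)` list in the PIVOT query above is hard-coded for three genres. When the maximum number of genres per artist is not known in advance, the column list has to be generated at run time; on SQL Server that is usually done with dynamic SQL executed via `sp_executesql`. The sketch below only illustrates the generation step: it is a hypothetical Python helper (`build_pivot_sql`) that emits the conditional-aggregation form for `n_slots` pairs and runs it against an in-memory SQLite copy of the sample data (SQLite 3.25+ assumed for `ROW_NUMBER()`; schema types are assumed).

```python
import sqlite3


def build_pivot_sql(n_slots, descending=False):
    """Hypothetical helper: conditional-aggregation SQL for n_slots genre/frequency pairs."""
    order = "DESC" if descending else "ASC"
    cols = []
    for i in range(1, n_slots + 1):
        cols.append(f'MAX(CASE WHEN rn = {i} THEN Genre END) AS "genre {i}"')
        cols.append(f'MAX(CASE WHEN rn = {i} THEN Frequency END) AS "frequency {i}"')
    select_list = ",\n       ".join(["ID"] + cols)
    return f"""
SELECT {select_list}
FROM (
    SELECT ID, Genre, Frequency,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Frequency {order}) AS rn
    FROM ArtistGenres
) q
GROUP BY ID
ORDER BY ID
"""


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ArtistGenres (ID INTEGER, Genre TEXT, Frequency REAL)")
conn.executemany(
    "INSERT INTO ArtistGenres VALUES (?, ?, ?)",
    [
        (1, "Rock", 0.11), (1, "Punk", 0.22), (1, "Glam", 0.33),
        (2, "Metal", 0.44), (2, "Heavy Metal", 0.55), (2, "Thrash Metal", 0.66),
    ],
)

# Number of slots = largest number of genres for any artist.
(n,) = conn.execute(
    "SELECT MAX(cnt) FROM (SELECT COUNT(*) AS cnt FROM ArtistGenres GROUP BY ID)"
).fetchone()

cur = conn.execute(build_pivot_sql(n))
header = [d[0] for d in cur.description]
rows = cur.fetchall()
print(header)
for row in rows:
    print(row)
```

Generating the column names from a count (rather than from genre values) keeps the identifiers fixed and predictable, which also limits the injection surface of the dynamic query.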

Or with conditional aggregation:

-- MAX(CASE ...) picks the single value for each rn; artists with fewer genres get NULL.
SELECT ID,
       MAX(CASE WHEN rn = 1 THEN Genre END)     AS [genre 1],
       MAX(CASE WHEN rn = 1 THEN Frequency END) AS [frequency 1],
       MAX(CASE WHEN rn = 2 THEN Genre END)     AS [genre 2],
       MAX(CASE WHEN rn = 2 THEN Frequency END) AS [frequency 2],
       MAX(CASE WHEN rn = 3 THEN Genre END)     AS [genre 3],
       MAX(CASE WHEN rn = 3 THEN Frequency END) AS [frequency 3]
FROM
(
    -- Use ORDER BY Frequency DESC to make "genre 1" the most frequent genre.
    SELECT ID, Genre, Frequency,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Frequency) AS rn
    FROM ArtistGenres
) q
GROUP BY ID
ORDER BY ID;
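To check the conditional-aggregation query against the desired output, and to see how an artist with fewer genres is padded with NULLs, here is a minimal sketch that ports it to SQLite through Python's `sqlite3` module. The schema types, the SQLite 3.25+ requirement for `ROW_NUMBER()`, and the extra single-genre artist 3 ("Folk") are assumptions for illustration.

```python
import sqlite3

# The conditional-aggregation query, ported to SQLite (double-quoted identifiers).
QUERY = """
SELECT ID,
       MAX(CASE WHEN rn = 1 THEN Genre END)     AS "genre 1",
       MAX(CASE WHEN rn = 1 THEN Frequency END) AS "frequency 1",
       MAX(CASE WHEN rn = 2 THEN Genre END)     AS "genre 2",
       MAX(CASE WHEN rn = 2 THEN Frequency END) AS "frequency 2",
       MAX(CASE WHEN rn = 3 THEN Genre END)     AS "genre 3",
       MAX(CASE WHEN rn = 3 THEN Frequency END) AS "frequency 3"
FROM (
    SELECT ID, Genre, Frequency,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Frequency) AS rn
    FROM ArtistGenres
) q
GROUP BY ID
ORDER BY ID
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ArtistGenres (ID INTEGER, Genre TEXT, Frequency REAL)")
conn.executemany(
    "INSERT INTO ArtistGenres VALUES (?, ?, ?)",
    [
        (1, "Rock", 0.11), (1, "Punk", 0.22), (1, "Glam", 0.33),
        (2, "Metal", 0.44), (2, "Heavy Metal", 0.55), (2, "Thrash Metal", 0.66),
        (3, "Folk", 0.77),  # hypothetical artist with a single genre
    ],
)

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Artist 3 has no rows with `rn = 2` or `rn = 3`, so every `CASE` for those slots yields NULL and `MAX` over only NULLs is NULL, which Python returns as `None`.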

db<>fiddle here
