I have a SQL table:
+----+--------------+-----------+
| ID | Genre | Frequency |
+----+--------------+-----------+
| 1 | Rock | 0.11 |
| 1 | Punk | 0.22 |
| 1 | Glam | 0.33 |
| 2 | Metal | 0.44 |
| 2 | Heavy Metal | 0.55 |
| 2 | Thrash Metal | 0.66 |
+----+--------------+-----------+
I know how to pivot it into:
+----+------+------+------+-------+-------------+--------------+
| ID | Rock | Punk | Glam | Metal | Heavy Metal | Thrash Metal |
+----+------+------+------+-------+-------------+--------------+
| 1 | 0.11 | 0.22 | 0.33 | NULL | NULL | NULL |
| 2 | NULL | NULL | NULL | 0.44 | 0.55 | 0.66 |
+----+------+------+------+-------+-------------+--------------+
But I want a table in this format:
+----+---------+-------------+-------------+-------------+--------------+-------------+
| ID | genre 1 | frequency 1 | genre 2 | frequency 2 | genre 3 | frequency 3 |
+----+---------+-------------+-------------+-------------+--------------+-------------+
| 1 | Rock | 0.11 | Punk | 0.22 | Glam | 0.33 |
| 2 | Metal | 0.44 | Heavy Metal | 0.55 | Thrash Metal | 0.66 |
+----+---------+-------------+-------------+-------------+--------------+-------------+
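This may look odd, because in most data science cases we want each column to carry a specific "meaning", and here genre 1 has none. The reason I want this format is that I have many artists with many different genres. If there are 1000 distinct genres and I pivot in the usual way, I end up with 1000 columns, one per genre, and most of them will be NULL.
With my layout, even with 1000 distinct genres, the number of genre columns is only "the genre count of the artist with the most genres". So, in my example, if I add another artist with 4 completely different genres, I only need to add one more genre/frequency pair (rather than 4 new columns).
Also, if I can sort by frequency within each artist, the variable genre 1 actually gains a meaning: it would be "the most frequent genre".
Note that I am not allowed to create any new columns in the main database.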
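To make the sample reproducible, here is a minimal setup sketch. The table name `ArtistGenres` is the one the queries further down use; the column types are assumptions, since the question only shows names and values. The last query computes how many genre/frequency pairs the wide layout needs, which is the "genre count of the artist with the most genres" described above.

```sql
-- Assumed schema: only the column names come from the question, the types are guesses.
CREATE TABLE ArtistGenres
(
    ID        INT           NOT NULL,
    Genre     VARCHAR(30)   NOT NULL,
    Frequency DECIMAL(4, 2) NOT NULL
);

-- Sample rows copied from the table in the question.
INSERT INTO ArtistGenres (ID, Genre, Frequency)
VALUES (1, 'Rock',         0.11),
       (1, 'Punk',         0.22),
       (1, 'Glam',         0.33),
       (2, 'Metal',        0.44),
       (2, 'Heavy Metal',  0.55),
       (2, 'Thrash Metal', 0.66);

-- Number of genre/frequency column pairs the wide table needs:
-- the largest number of genres attached to any single artist.
SELECT MAX(GenreCount) AS MaxGenresPerArtist
FROM
(
    SELECT ID, COUNT(*) AS GenreCount
    FROM ArtistGenres
    GROUP BY ID
) c;
```

For the sample data this returns 3, so three genre/frequency pairs are enough.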
It can be done with PIVOT, but it isn't pretty.
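    WITH CTE AS
    (
        -- Frequency is cast to VARCHAR so that genre and frequency values
        -- can share the single Val column produced by CROSS APPLY below.
        SELECT ID, Genre, CAST(Frequency AS VARCHAR(30)) AS Frequency,
               -- Ascending order reproduces the sample output; use
               -- ORDER BY Frequency DESC if genre 1 should be the most frequent.
               ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Frequency) AS rn
        FROM ArtistGenres
    )
    SELECT *
    FROM
    (
        -- Unpivot each row into two (Col, Val) rows: 'genre N' and 'frequency N'.
        SELECT ID, CONCAT(ca.Name, ' ', rn) AS Col, ca.Val
        FROM CTE
        CROSS APPLY (VALUES ('genre', Genre), ('frequency', Frequency)) ca(Name, Val)
    ) src
    PIVOT
    (
        MAX(Val)
        FOR Col IN ([genre 1], [frequency 1], [genre 2], [frequency 2], [genre 3], [frequency 3])
    ) pvt;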
Or with conditional aggregation:
    SELECT ID,
           MAX(CASE WHEN rn = 1 THEN Genre END)     AS [genre 1],
           MAX(CASE WHEN rn = 1 THEN Frequency END) AS [frequency 1],
           MAX(CASE WHEN rn = 2 THEN Genre END)     AS [genre 2],
           MAX(CASE WHEN rn = 2 THEN Frequency END) AS [frequency 2],
           MAX(CASE WHEN rn = 3 THEN Genre END)     AS [genre 3],
           MAX(CASE WHEN rn = 3 THEN Frequency END) AS [frequency 3]
    FROM
    (
        SELECT ID, Genre, Frequency,
               -- Ascending order reproduces the sample output; use
               -- ORDER BY Frequency DESC if genre 1 should be the most frequent.
               ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Frequency) AS rn
        FROM ArtistGenres
    ) q
    GROUP BY ID
    ORDER BY ID;
db<>fiddle here
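Both queries above hard-code three genre/frequency pairs. If the maximum number of genres per artist is not known in advance, the conditional aggregation can be generated with dynamic SQL. The following is only a sketch under assumptions: it needs SQL Server 2017 or later for `STRING_AGG`, and the variable names `@maxGenres`, `@cols`, and `@sql` are made up for illustration.

```sql
DECLARE @maxGenres INT, @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);

-- Largest number of genres attached to any single artist.
SELECT @maxGenres = MAX(GenreCount)
FROM (SELECT COUNT(*) AS GenreCount FROM ArtistGenres GROUP BY ID) c;

-- Generate the numbers 1..@maxGenres and build one pair of
-- MAX(CASE ...) expressions per number.
WITH n AS
(
    SELECT 1 AS rn
    UNION ALL
    SELECT rn + 1 FROM n WHERE rn < @maxGenres
)
SELECT @cols = STRING_AGG(
           CAST(CONCAT(
               'MAX(CASE WHEN rn = ', rn, ' THEN Genre END) AS ',
               QUOTENAME(CONCAT('genre ', rn)), ', ',
               'MAX(CASE WHEN rn = ', rn, ' THEN Frequency END) AS ',
               QUOTENAME(CONCAT('frequency ', rn))
           ) AS NVARCHAR(MAX)), ', ') WITHIN GROUP (ORDER BY rn)
FROM n
OPTION (MAXRECURSION 0);  -- lift the default limit of 100 recursion levels

-- Same query shape as the conditional aggregation above,
-- with the generated column list spliced in.
SET @sql = N'SELECT ID, ' + @cols + N'
FROM
(
    SELECT ID, Genre, Frequency,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Frequency) AS rn
    FROM ArtistGenres
) q
GROUP BY ID
ORDER BY ID;';

EXEC sys.sp_executesql @sql;
```

Only the generated column list is dynamic here. No user input is concatenated into the statement, and the aliases go through `QUOTENAME`, so the sketch does not open an injection path by itself.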