我有一个具有以下模式的表:
CREATE TABLE IF NOT EXISTS itemreviews(
itemid INT NOT NULL,
reviewid INT NOT NULL,
reviewrating INT NOT NULL,
reviewsummary VARCHAR(256) NOT NULL,
PRIMARY KEY(itemid, reviewid)
);
以下是该表中一些数据的示例:
+--------+----------+--------------+----------------------+
| itemId | reviewId | reviewRating | reviewSummary |
+--------+----------+--------------+----------------------+
| 01 | VM2LK | 5 | Nice item! Loved it. |
| 01 | P4KF2 | 3 | Meh, it's okay. |
| 02 | 3IK42 | 1 | Horrible, hated it. |
| 02 | KDXX2 | 3 | It's fine. |
| 03 | 3KXXZ | 4 | I liked it! |
| 03 | KQ23S | 3 | It's okay. |
+--------+----------+--------------+----------------------+
我的任务是编写一个查询,根据评分的平均值对项目进行排序。
因此,使用上面相同的数据集,该查询的结果将简单地为:
+--------+---------+
| itemid | average |
+--------+---------+
| 01 | 4 | // item 01 had ratings 5 and 3 so (5 + 3) / number of ratings = 4
| 03 | 3.5 | // item 03 had ratings 4 and 3 so (4 + 3) / number of ratings = 3.5
| 02 | 2 | // item 02 had ratings 1 and 3 so (1 + 3) / number of ratings = 2
+--------+---------+
我目前正在使用这个查询来实现这个结果:
SELECT itemid, AVG(reviewRating) as average FROM itemreviews
GROUP BY itemid
ORDER BY average DESC
然而,问题是我有另一个要求:
在结果中,我们应该只包括具有至少n
数量的评论的项目。
在上面的例子中,所有项目都有两个相同的评审。
如果,假设项目03只有一个评论,而我只想要至少有两个评论的项目,那么项目03应该被排除在外。只有项目01和02将被排序:
+--------+---------+
| itemid | average |
+--------+---------+
| 01 | 4 | // item 01 had ratings 5 and 3 so (5 + 3) / number of ratings = 4
| 02 | 2 | // item 02 had ratings 1 and 3 so (1 + 3) / number of ratings = 2
// item 03 is no more, since it only had one review
+--------+---------+
我不知道如何应用这个要求。
我猜我需要按itemid
分组,然后以某种方式使用COUNT
来查找一个项目的评论数量,然后过滤那些没有n
评论的项目,然后对其进行排序。
但我已经在计算平均值了,我想,这已经是";计数";内部。我也担心表现。
如果有人能给我什么建议,我将非常感谢。
由于您已经分组,您可以使用HAVING COUNT(*)
来获取的行数
SELECT itemid, AVG(reviewRating) as average FROM itemreviews
GROUP BY itemid
HAVING COUNT(*) > 1
ORDER BY average DESC