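I'm not sure the question is phrased clearly (it's the best I could come up with), but here is an example to clarify things. I have a view, Chats, that should summarize the conversation history between two people. The view consists of the following columns: Sender, Recipient, Timestamp, LatestMessage, and UnreadMessageCount.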
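The columns of the Chats view are all derived from a single table, Direct_Messages, which stores the details of the individual chat messages exchanged between users of the system. Its columns are: ID, Sender, Recipient, Body, Timestamp, and TimeRead (null if the recipient has not read the message). The view's Timestamp and LatestMessage columns hold the values of the latest direct message between the two participants (latest as determined by Timestamp, FWIW).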
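The problem really stems from the fact that the Sender, Recipient pair that should appear in the Chats view is that of the most recent exchange between the two participants. For example, if Gary sends Barry a "Hi" message and Barry then replies "Hello", the only Chats entry for these two people should have Sender as "Barry", Recipient as "Gary", Timestamp as the timestamp of Barry's reply, LatestMessage as "Hello", and UnreadMessageCount as the number of messages the Recipient has not read.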
I tried using GROUP BY "Sender", "Recipient" OR "Recipient", "Sender", but it returns two rows instead of one: one grouped by Barry, Gary and the other grouped by Gary, Barry.
Here is my code:
SELECT Sender AS Sender,
Recipient AS Recipient,
Timestamp AS Timestamp,
Body AS LatestMessage,
(SUM(CASE WHEN TimeRead IS NULL THEN 1 ELSE 0 END) ) AS UnreadMessageCount
FROM Direct_Messages
GROUP BY Sender, Recipient OR Recipient, Sender
ORDER BY Timestamp DESC
Edit: here is sample data from the Direct_Messages table and the corresponding output in the Chats view.
From Direct_Messages:
ID          Sender  Recipient  Body   Timestamp                TimeRead
148567984   Gary    Barry      Hi     2018-12-12 23:53:39.487  2018-12-12 23:55:45
1668701120  Barry   Gary       Hello  2018-12-12 23:54:49.326  NULL
Resulting Chats:
Sender  Recipient  Timestamp                LatestMessage  UnreadMessageCount
Gary    Barry      2018-12-12 23:53:39.487  Hi             0
Barry   Gary       2018-12-12 23:54:49.326  Hello          1
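For context on why the attempt above returns one row per direction, here is a minimal sketch. It assumes Python's built-in sqlite3 module and an in-memory copy of the two sample rows (the column types are my assumption). SQLite reads the GROUP BY list as three separate terms: Sender, (Recipient OR Recipient), and Sender. The middle term is a boolean expression, and a non-numeric string such as 'Gary' counts as 0 there, so the rows end up grouped by Sender alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Direct_Messages (ID INTEGER, Sender TEXT, Recipient TEXT,"
    " Body TEXT, Timestamp TEXT, TimeRead TEXT)"
)
conn.executemany(
    "INSERT INTO Direct_Messages VALUES (?, ?, ?, ?, ?, ?)",
    [
        (148567984, "Gary", "Barry", "Hi",
         "2018-12-12 23:53:39.487", "2018-12-12 23:55:45"),
        (1668701120, "Barry", "Gary", "Hello",
         "2018-12-12 23:54:49.326", None),
    ],
)

# A non-numeric string is treated as 0 (false) in a boolean context.
or_value = conn.execute("SELECT 'Gary' OR 'Gary'").fetchone()[0]

# The GROUP BY list is parsed as: Sender, (Recipient OR Recipient), Sender.
groups = conn.execute(
    """
    SELECT Sender, COUNT(*)
    FROM Direct_Messages
    GROUP BY Sender, Recipient OR Recipient, Sender
    ORDER BY Sender
    """
).fetchall()

print(or_value)  # 0
print(groups)    # [('Barry', 1), ('Gary', 1)]
```

Each sender forms its own group, which is why the two directions of the same conversation never collapse into one row.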
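You can get most of what you want by using MIN() and MAX() with multiple arguments. With more than one argument, these are scalar functions that behave like LEAST() and GREATEST() in other databases: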
SELECT MIN(Sender, Recipient) AS u1,   -- alphabetically first participant
       MAX(Sender, Recipient) AS u2,   -- alphabetically last participant
       MAX(Timestamp) AS Timestamp,
       -- Body AS LatestMessage,
       (COUNT(*) - COUNT(TimeRead)) AS UnreadMessageCount   -- COUNT(col) skips NULLs
FROM Direct_Messages
GROUP BY u1, u2
ORDER BY MAX(Timestamp) DESC
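To see the two-argument forms in action, here is a small sketch, assuming Python's sqlite3 module and the question's two sample rows loaded into an in-memory table. It shows that both directions of the conversation map to the same (u1, u2) pair, so grouping on that pair yields a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Direct_Messages (ID INTEGER, Sender TEXT, Recipient TEXT,"
    " Body TEXT, Timestamp TEXT, TimeRead TEXT)"
)
conn.executemany(
    "INSERT INTO Direct_Messages VALUES (?, ?, ?, ?, ?, ?)",
    [
        (148567984, "Gary", "Barry", "Hi",
         "2018-12-12 23:53:39.487", "2018-12-12 23:55:45"),
        (1668701120, "Barry", "Gary", "Hello",
         "2018-12-12 23:54:49.326", None),
    ],
)

# Per-row view: the scalar MIN/MAX put each pair into a fixed order.
pairs = conn.execute(
    """
    SELECT Sender, Recipient,
           MIN(Sender, Recipient) AS u1,
           MAX(Sender, Recipient) AS u2
    FROM Direct_Messages
    ORDER BY Timestamp
    """
).fetchall()

# Grouped view: one row per unordered pair of participants.
chats = conn.execute(
    """
    SELECT MIN(Sender, Recipient) AS u1,
           MAX(Sender, Recipient) AS u2,
           MAX(Timestamp) AS Timestamp,
           COUNT(*) - COUNT(TimeRead) AS UnreadMessageCount
    FROM Direct_Messages
    GROUP BY u1, u2
    ORDER BY MAX(Timestamp) DESC
    """
).fetchall()

print(pairs)  # [('Gary', 'Barry', 'Barry', 'Gary'), ('Barry', 'Gary', 'Barry', 'Gary')]
print(chats)  # [('Barry', 'Gary', '2018-12-12 23:54:49.326', 1)]
```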
The challenge is getting the latest message. You can get it with conditional aggregation and an additional JOIN:
SELECT MIN(dmc.Sender, dmc.Recipient) AS u1,
       MAX(dmc.Sender, dmc.Recipient) AS u2,
       MAX(dmc.Timestamp) AS Timestamp,
       -- keep only the body of the row whose timestamp is the pair's latest
       MAX(CASE WHEN dmc.Timestamp = dmc2.Timestamp THEN dmc.Body END) AS LatestMessage,
       (COUNT(*) - COUNT(dmc.TimeRead)) AS UnreadMessageCount
FROM Direct_Messages dmc JOIN
     -- latest timestamp per unordered pair of participants
     (SELECT MIN(Sender, Recipient) AS u1,
             MAX(Sender, Recipient) AS u2,
             MAX(Timestamp) AS Timestamp
      FROM Direct_Messages
      GROUP BY u1, u2
     ) dmc2
     ON dmc2.u1 = MIN(dmc.Sender, dmc.Recipient) AND
        dmc2.u2 = MAX(dmc.Sender, dmc.Recipient)
GROUP BY u1, u2
ORDER BY dmc2.Timestamp DESC
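As a quick check of the join-based query, the sketch below runs it against the two sample rows from the question. It assumes Python's sqlite3 module, an in-memory database, and the table name Direct_Messages from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Direct_Messages (ID INTEGER, Sender TEXT, Recipient TEXT,"
    " Body TEXT, Timestamp TEXT, TimeRead TEXT)"
)
conn.executemany(
    "INSERT INTO Direct_Messages VALUES (?, ?, ?, ?, ?, ?)",
    [
        (148567984, "Gary", "Barry", "Hi",
         "2018-12-12 23:53:39.487", "2018-12-12 23:55:45"),
        (1668701120, "Barry", "Gary", "Hello",
         "2018-12-12 23:54:49.326", None),
    ],
)

rows = conn.execute(
    """
    SELECT MIN(dmc.Sender, dmc.Recipient) AS u1,
           MAX(dmc.Sender, dmc.Recipient) AS u2,
           MAX(dmc.Timestamp) AS Timestamp,
           MAX(CASE WHEN dmc.Timestamp = dmc2.Timestamp
                    THEN dmc.Body END) AS LatestMessage,
           COUNT(*) - COUNT(dmc.TimeRead) AS UnreadMessageCount
    FROM Direct_Messages dmc
    JOIN (SELECT MIN(Sender, Recipient) AS u1,
                 MAX(Sender, Recipient) AS u2,
                 MAX(Timestamp) AS Timestamp
          FROM Direct_Messages
          GROUP BY u1, u2) dmc2
      ON dmc2.u1 = MIN(dmc.Sender, dmc.Recipient)
     AND dmc2.u2 = MAX(dmc.Sender, dmc.Recipient)
    GROUP BY u1, u2
    ORDER BY dmc2.Timestamp DESC
    """
).fetchall()

print(rows)  # [('Barry', 'Gary', '2018-12-12 23:54:49.326', 'Hello', 1)]
```

Note that u1 and u2 are the participants in alphabetical order, not the sender and recipient of the latest message.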
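You can "pre-cook" the data so that, for each pair of users, messages always point in the same direction.
For example, if your data is:
Sender  Recipient
A  --->  B
B  --->  A
you can change it to:
U1  U2
B  --->  A  (changed)
B  --->  A
Like this: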
SELECT (CASE WHEN Sender > Recipient THEN Sender ELSE Recipient END) AS u1,
       (CASE WHEN Sender > Recipient THEN Recipient ELSE Sender END) AS u2,
       -- with a single MAX(), SQLite takes the bare column Body from the same row
       MAX(Timestamp) AS Timestamp,
       Body AS LatestMessage,
       (SUM(CASE WHEN TimeRead IS NULL THEN 1 ELSE 0 END)) AS UnreadMessageCount
FROM Direct_Messages
GROUP BY
       (CASE WHEN Sender > Recipient THEN Sender ELSE Recipient END),
       (CASE WHEN Sender > Recipient THEN Recipient ELSE Sender END)
ORDER BY Timestamp DESC
Note: be careful with performance (I guess it doesn't matter much, since your question is tagged sqlite).
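To make the pre-cooking step concrete, the sketch below applies the same CASE expressions row by row, without any grouping. It assumes Python's sqlite3 module and the question's two sample rows in an in-memory table; after normalization both rows carry the same (u1, u2) pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Direct_Messages (ID INTEGER, Sender TEXT, Recipient TEXT,"
    " Body TEXT, Timestamp TEXT, TimeRead TEXT)"
)
conn.executemany(
    "INSERT INTO Direct_Messages VALUES (?, ?, ?, ?, ?, ?)",
    [
        (148567984, "Gary", "Barry", "Hi",
         "2018-12-12 23:53:39.487", "2018-12-12 23:55:45"),
        (1668701120, "Barry", "Gary", "Hello",
         "2018-12-12 23:54:49.326", None),
    ],
)

# u1 is always the larger name and u2 the smaller one,
# regardless of who actually sent the message.
cooked = conn.execute(
    """
    SELECT Sender, Recipient,
           CASE WHEN Sender > Recipient THEN Sender ELSE Recipient END AS u1,
           CASE WHEN Sender > Recipient THEN Recipient ELSE Sender END AS u2
    FROM Direct_Messages
    ORDER BY Timestamp
    """
).fetchall()

distinct_pairs = {(u1, u2) for _, _, u1, u2 in cooked}

print(cooked)          # [('Gary', 'Barry', 'Gary', 'Barry'), ('Barry', 'Gary', 'Gary', 'Barry')]
print(distinct_pairs)  # {('Gary', 'Barry')}
```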
You can use a CTE to pre-cook the data and get a more readable query:
WITH Direct_Messages_cooked AS
(
    SELECT
        (CASE WHEN Sender > Recipient THEN Sender ELSE Recipient END) AS U1,
        (CASE WHEN Sender > Recipient THEN Recipient ELSE Sender END) AS U2,
        *
    FROM Direct_Messages
)
SELECT U1 AS U1,
       U2 AS U2,
       -- with a single MAX(), SQLite takes the bare column Body from the same row
       MAX(Timestamp) AS Timestamp,
       Body AS LatestMessage,
       (SUM(CASE WHEN TimeRead IS NULL THEN 1 ELSE 0 END)) AS UnreadMessageCount
FROM Direct_Messages_cooked
GROUP BY U1, U2
ORDER BY Timestamp DESC
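Here is a hedged sketch of the CTE approach run end to end, assuming Python's sqlite3 module and the two sample rows from the question. It uses MAX(Timestamp) rather than a bare Timestamp so that the group's Body is taken from the latest row; that choice is mine and relies on SQLite's documented handling of bare columns next to a single MAX() aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Direct_Messages (ID INTEGER, Sender TEXT, Recipient TEXT,"
    " Body TEXT, Timestamp TEXT, TimeRead TEXT)"
)
conn.executemany(
    "INSERT INTO Direct_Messages VALUES (?, ?, ?, ?, ?, ?)",
    [
        (148567984, "Gary", "Barry", "Hi",
         "2018-12-12 23:53:39.487", "2018-12-12 23:55:45"),
        (1668701120, "Barry", "Gary", "Hello",
         "2018-12-12 23:54:49.326", None),
    ],
)

rows = conn.execute(
    """
    WITH Direct_Messages_cooked AS (
        SELECT CASE WHEN Sender > Recipient THEN Sender ELSE Recipient END AS U1,
               CASE WHEN Sender > Recipient THEN Recipient ELSE Sender END AS U2,
               *
        FROM Direct_Messages
    )
    SELECT U1, U2,
           MAX(Timestamp) AS Timestamp,   -- latest message time per pair
           Body AS LatestMessage,         -- taken from the MAX(Timestamp) row
           SUM(CASE WHEN TimeRead IS NULL THEN 1 ELSE 0 END) AS UnreadMessageCount
    FROM Direct_Messages_cooked
    GROUP BY U1, U2
    ORDER BY Timestamp DESC
    """
).fetchall()

print(rows)  # [('Gary', 'Barry', '2018-12-12 23:54:49.326', 'Hello', 1)]
```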
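Building on the insightful answers from @Gordon Linoff and @dani Herrera, I managed to adapt them into a concise solution for my specific problem, although, from what I can tell, @Gordon's answer addresses the broader context of my original question more comprehensively. Here is what I came up with: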
SELECT Sender AS Sender,
       Recipient AS Recipient,
       -- with a single MAX(), SQLite takes the bare columns from the latest row
       MAX(Timestamp) AS Timestamp,
       Body AS LatestMessage,
       (COUNT(*) - COUNT(TimeRead)) AS UnreadMessageCount
FROM Direct_Messages
GROUP BY (SELECT MAX(Sender, Recipient)),
         (SELECT MIN(Sender, Recipient))
ORDER BY Timestamp DESC
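For completeness, here is a sketch of how this could be wrapped into the Chats view from the question. It assumes Python's sqlite3 module and an in-memory database. The third message (Gary to Alice, with a made-up ID of 3) is hypothetical data added only to show a second conversation. The view uses MAX(Timestamp) and groups directly on the scalar MIN()/MAX() pair, which are my simplifications: in SQLite, when a query contains a single MAX() aggregate, the bare columns Sender, Recipient, and Body are taken from the row holding that maximum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Direct_Messages (ID INTEGER, Sender TEXT, Recipient TEXT,"
    " Body TEXT, Timestamp TEXT, TimeRead TEXT)"
)
conn.executemany(
    "INSERT INTO Direct_Messages VALUES (?, ?, ?, ?, ?, ?)",
    [
        (148567984, "Gary", "Barry", "Hi",
         "2018-12-12 23:53:39.487", "2018-12-12 23:55:45"),
        (1668701120, "Barry", "Gary", "Hello",
         "2018-12-12 23:54:49.326", None),
        # Hypothetical extra row, not part of the question's sample data.
        (3, "Gary", "Alice", "Lunch?", "2018-12-13 09:00:00.000", None),
    ],
)

conn.execute(
    """
    CREATE VIEW Chats AS
    SELECT Sender,
           Recipient,
           MAX(Timestamp) AS Timestamp,
           Body AS LatestMessage,
           COUNT(*) - COUNT(TimeRead) AS UnreadMessageCount
    FROM Direct_Messages
    GROUP BY MIN(Sender, Recipient), MAX(Sender, Recipient)
    """
)

chats = conn.execute("SELECT * FROM Chats ORDER BY Timestamp DESC").fetchall()
for row in chats:
    print(row)
# ('Gary', 'Alice', '2018-12-13 09:00:00.000', 'Lunch?', 1)
# ('Barry', 'Gary', '2018-12-12 23:54:49.326', 'Hello', 1)
```

The Barry/Gary row matches the entry the question asks for: the sender and recipient of the latest message, its timestamp and body, and one unread message.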