How to group rows by either permutation of a two-column composite key

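I'm not sure the question is worded clearly (it's the best I could come up with), but here is an example to clarify things. I have a view, Chats, that should summarize the conversation history between two people. The view consists of the following columns: Sender, Recipient, Timestamp, LatestMessage, UnreadMessageCount.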

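All columns of the Chats view are derived from a single table, Direct_Messages, which stores the details of individual chat messages exchanged between users of the system. Its columns are: ID, Sender, Recipient, Body, Timestamp, TimeRead (null if the recipient has not read the message). The view's Timestamp and LatestMessage columns hold the values of the latest direct message between the two participants (latest as determined by Timestamp, FWIW).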

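The problem really stems from the fact that the Sender/Recipient column pair in the Chats view should reflect the latest exchange between the two participants. For example, if Gary sends Barry a "Hi" message and Barry then replies "Hello", the single Chats entry for these two people should have Sender as "Barry", Recipient as "Gary", Timestamp as the timestamp of Barry's reply, LatestMessage as "Hello", and UnreadMessageCount as the number of messages the Recipient has not read.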
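To make the intended result concrete, here is a minimal pure-Python sketch (an illustration only, not SQL and not part of the original post). It keys each conversation on the unordered pair of participants, keeps the newest message, and counts the unread messages addressed to that message's Recipient. The tuple layout and the name `summarize_chats` are hypothetical.

```python
from typing import Optional

# Hypothetical row layout mirroring Direct_Messages:
# (ID, Sender, Recipient, Body, Timestamp, TimeRead); TimeRead is None while unread.
Message = tuple[int, str, str, str, str, Optional[str]]
Chat = tuple[str, str, str, str, int]


def summarize_chats(messages: list[Message]) -> list[Chat]:
    """One (Sender, Recipient, Timestamp, LatestMessage, UnreadMessageCount)
    row per pair of participants, most recent conversation first."""
    conversations: dict[frozenset[str], list[Message]] = {}
    for msg in messages:
        # A frozenset ignores order, so Gary->Barry and Barry->Gary share one key.
        conversations.setdefault(frozenset((msg[1], msg[2])), []).append(msg)

    chats: list[Chat] = []
    for msgs in conversations.values():
        # Fixed-width 'YYYY-MM-DD HH:MM:SS.fff' strings sort chronologically.
        latest = max(msgs, key=lambda m: m[4])
        # Unread messages addressed to the latest message's Recipient.
        unread = sum(1 for m in msgs if m[2] == latest[2] and m[5] is None)
        chats.append((latest[1], latest[2], latest[4], latest[3], unread))
    chats.sort(key=lambda c: c[2], reverse=True)
    return chats


messages: list[Message] = [
    (148567984, "Gary", "Barry", "Hi", "2018-12-12 23:53:39.487", "2018-12-12 23:55:45"),
    (1668701120, "Barry", "Gary", "Hello", "2018-12-12 23:54:49.326", None),
]
print(summarize_chats(messages))
# [('Barry', 'Gary', '2018-12-12 23:54:49.326', 'Hello', 1)]
```

The key idea for the SQL versions is the same: the grouping key must not depend on which participant is the sender.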

I tried using GROUP BY "Sender", "Recipient" OR "Recipient", "Sender", but it just returns two rows: one grouped by Barry, Gary and the other by Gary, Barry.

Here is my code:

SELECT Sender AS Sender,
Recipient AS Recipient,
Timestamp AS Timestamp,
Body AS LatestMessage,
(SUM(CASE WHEN TimeRead IS NULL THEN 1 ELSE 0 END) ) AS UnreadMessageCount
FROM Direct_Messages
GROUP BY Sender, Recipient OR Recipient, Sender
ORDER BY Timestamp DESC

Edit: here is sample data from the Direct_Messages table and the corresponding output of the Chats view:

Direct_Messages

ID          Sender  Recipient   Body    Timestamp                   TimeRead
148567984   Gary    Barry       Hi      2018-12-12 23:53:39.487     2018-12-12 23:55:45
1668701120  Barry   Gary        Hello   2018-12-12 23:54:49.326     NULL

Resulting Chats

Sender  Recipient   Timestamp                 LatestMessage UnreadMessageCount
Gary    Barry       2018-12-12 23:53:39.487   Hi            0
Barry   Gary        2018-12-12 23:54:49.326   Hello         1
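The output above can be reproduced with a minimal sketch that uses Python's built-in sqlite3 module and an in-memory copy of the two sample rows (the column types are assumptions, since no DDL was posted). The comment in the query points at the cause: the commas split the GROUP BY list into three separate expressions, so Sender alone already keeps the two directions apart.

```python
import sqlite3

# The question's two sample rows; column types are assumptions.
SAMPLE_ROWS = [
    (148567984, "Gary", "Barry", "Hi", "2018-12-12 23:53:39.487", "2018-12-12 23:55:45"),
    (1668701120, "Barry", "Gary", "Hello", "2018-12-12 23:54:49.326", None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Direct_Messages (ID INTEGER PRIMARY KEY, Sender TEXT, "
    "Recipient TEXT, Body TEXT, Timestamp TEXT, TimeRead TEXT)"
)
conn.executemany("INSERT INTO Direct_Messages VALUES (?, ?, ?, ?, ?, ?)", SAMPLE_ROWS)

# The attempted query. The GROUP BY list is parsed as three terms:
# Sender, (Recipient OR Recipient), Sender. Rows with different senders
# can therefore never land in the same group.
ATTEMPT = """
SELECT Sender, Recipient, Timestamp, Body AS LatestMessage,
       SUM(CASE WHEN TimeRead IS NULL THEN 1 ELSE 0 END) AS UnreadMessageCount
FROM Direct_Messages
GROUP BY Sender, Recipient OR Recipient, Sender
ORDER BY Timestamp DESC
"""

rows = conn.execute(ATTEMPT).fetchall()
for row in rows:
    print(row)
# ('Barry', 'Gary', '2018-12-12 23:54:49.326', 'Hello', 1)
# ('Gary', 'Barry', '2018-12-12 23:53:39.487', 'Hi', 0)
```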

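You can get most of what you want by using MIN() and MAX() with multiple arguments. With more than one argument, these are scalar functions that behave like LEAST() and GREATEST() in other databases.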
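The difference between the scalar and aggregate forms can be checked quickly with Python's sqlite3 module; the table `t` and its rows in this sketch are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two or more arguments: scalar MIN()/MAX() compare their arguments within
# one row, so the result is the same whichever order the names are given in.
pair = conn.execute("SELECT MIN('Gary', 'Barry'), MAX('Gary', 'Barry')").fetchone()
swapped = conn.execute("SELECT MIN('Barry', 'Gary'), MAX('Barry', 'Gary')").fetchone()
print(pair)  # ('Barry', 'Gary')

# One argument: the usual aggregates over all rows.
conn.execute("CREATE TABLE t (name TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("Gary",), ("Barry",), ("Zed",)])
agg = conn.execute("SELECT MIN(name), MAX(name) FROM t").fetchone()
print(agg)  # ('Barry', 'Zed')
```

That order-independence is what makes (MIN(Sender, Recipient), MAX(Sender, Recipient)) usable as a conversation key.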

SELECT MIN(Sender, Recipient) AS u1,
MAX(Sender, Recipient) AS u2,
MAX(Timestamp) AS Timestamp,
-- Body AS LatestMessage,
(COUNT(*) - COUNT(TimeRead)) as UnreadMessageCount
FROM Direct_Messages
GROUP BY u1, u2
ORDER BY MAX(Timestamp) DESC
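As a rough check, the first query can be run against the sample rows. This sketch assumes Python's sqlite3 module and reads from Direct_Messages, the table named in the question; both directions collapse into one row keyed by the alphabetically ordered pair.

```python
import sqlite3

# The question's sample rows; column types are assumptions.
SAMPLE_ROWS = [
    (148567984, "Gary", "Barry", "Hi", "2018-12-12 23:53:39.487", "2018-12-12 23:55:45"),
    (1668701120, "Barry", "Gary", "Hello", "2018-12-12 23:54:49.326", None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Direct_Messages (ID INTEGER PRIMARY KEY, Sender TEXT, "
    "Recipient TEXT, Body TEXT, Timestamp TEXT, TimeRead TEXT)"
)
conn.executemany("INSERT INTO Direct_Messages VALUES (?, ?, ?, ?, ?, ?)", SAMPLE_ROWS)

PAIRS = """
SELECT MIN(Sender, Recipient) AS u1,
       MAX(Sender, Recipient) AS u2,
       MAX(Timestamp) AS Timestamp,
       -- COUNT(TimeRead) skips NULLs, so the difference is the unread count
       COUNT(*) - COUNT(TimeRead) AS UnreadMessageCount
FROM Direct_Messages
GROUP BY u1, u2
ORDER BY MAX(Timestamp) DESC
"""

rows = conn.execute(PAIRS).fetchall()
print(rows)  # [('Barry', 'Gary', '2018-12-12 23:54:49.326', 1)]
```

u1 and u2 come out in alphabetical order, so on their own they do not say who sent the latest message.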

The challenge is getting the latest message. You can get it with conditional aggregation and an additional JOIN:

SELECT MIN(dmc.Sender, dmc.Recipient) AS u1,
MAX(dmc.Sender, dmc.Recipient) AS u2,
MAX(dmc.Timestamp) AS Timestamp,
MAX(CASE WHEN dmc.Timestamp = dmc2.Timestamp THEN dmc.Body END) AS LatestMessage,
(COUNT(*) - COUNT(dmc.TimeRead)) AS UnreadMessageCount
FROM Direct_Messages dmc JOIN
(SELECT MIN(Sender, Recipient) AS u1,
MAX(Sender, Recipient) AS u2,
MAX(Timestamp) AS Timestamp
FROM Direct_Messages
GROUP BY u1, u2
) dmc2
ON dmc2.u1 = MIN(dmc.Sender, dmc.Recipient) AND
dmc2.u2 = MAX(dmc.Sender, dmc.Recipient)
GROUP BY u1, u2
ORDER BY dmc2.Timestamp DESC
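A sketch of the join-based query against the same sample rows, again assuming Python's sqlite3 module and the question's Direct_Messages table. The one deliberate difference is that the outer GROUP BY names dmc2.u1 and dmc2.u2 explicitly; because of the join condition this groups exactly the same rows.

```python
import sqlite3

# The question's sample rows; column types are assumptions.
SAMPLE_ROWS = [
    (148567984, "Gary", "Barry", "Hi", "2018-12-12 23:53:39.487", "2018-12-12 23:55:45"),
    (1668701120, "Barry", "Gary", "Hello", "2018-12-12 23:54:49.326", None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Direct_Messages (ID INTEGER PRIMARY KEY, Sender TEXT, "
    "Recipient TEXT, Body TEXT, Timestamp TEXT, TimeRead TEXT)"
)
conn.executemany("INSERT INTO Direct_Messages VALUES (?, ?, ?, ?, ?, ?)", SAMPLE_ROWS)

LATEST = """
SELECT MIN(dmc.Sender, dmc.Recipient) AS u1,
       MAX(dmc.Sender, dmc.Recipient) AS u2,
       MAX(dmc.Timestamp) AS Timestamp,
       -- only the row whose Timestamp equals the pair's newest one yields a Body
       MAX(CASE WHEN dmc.Timestamp = dmc2.Timestamp THEN dmc.Body END) AS LatestMessage,
       COUNT(*) - COUNT(dmc.TimeRead) AS UnreadMessageCount
FROM Direct_Messages AS dmc
JOIN (SELECT MIN(Sender, Recipient) AS u1,
             MAX(Sender, Recipient) AS u2,
             MAX(Timestamp) AS Timestamp
      FROM Direct_Messages
      GROUP BY u1, u2) AS dmc2
  ON dmc2.u1 = MIN(dmc.Sender, dmc.Recipient)
 AND dmc2.u2 = MAX(dmc.Sender, dmc.Recipient)
GROUP BY dmc2.u1, dmc2.u2
ORDER BY dmc2.Timestamp DESC
"""

rows = conn.execute(LATEST).fetchall()
print(rows)  # [('Barry', 'Gary', '2018-12-12 23:54:49.326', 'Hello', 1)]
```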

You can "pre-cook" the data so that the messages between each pair of users always point in the same direction.

For example, if your data is:

Sender Recipient
A ---> B
B ---> A

You can change it to:

U1     U2
B ---> A (changed)
B ---> A

Like this:

SELECT (case when Sender > Recipient then Sender else Recipient end) AS u1,
(case when Sender > Recipient then Recipient else Sender end) AS u2,
MAX(Timestamp) AS Timestamp,
-- with a MAX() aggregate present, SQLite takes the bare Body from the newest row of each group
Body AS LatestMessage,
(SUM(CASE WHEN TimeRead IS NULL THEN 1 ELSE 0 END) ) AS UnreadMessageCount
FROM Direct_Messages
GROUP BY 
(case when Sender > Recipient then Sender else Recipient end), 
(case when Sender > Recipient then Recipient else Sender end) 
ORDER BY Timestamp DESC
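A self-contained run of this CASE-based normalization on the sample rows, assuming Python's sqlite3 module. It reads straight from Direct_Messages and uses MAX(Timestamp), relying on SQLite's documented behaviour that, when an aggregate query uses min() or max(), bare columns such as Body are taken from the row holding that minimum or maximum.

```python
import sqlite3

# The question's sample rows; column types are assumptions.
SAMPLE_ROWS = [
    (148567984, "Gary", "Barry", "Hi", "2018-12-12 23:53:39.487", "2018-12-12 23:55:45"),
    (1668701120, "Barry", "Gary", "Hello", "2018-12-12 23:54:49.326", None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Direct_Messages (ID INTEGER PRIMARY KEY, Sender TEXT, "
    "Recipient TEXT, Body TEXT, Timestamp TEXT, TimeRead TEXT)"
)
conn.executemany("INSERT INTO Direct_Messages VALUES (?, ?, ?, ?, ?, ?)", SAMPLE_ROWS)

COOKED = """
SELECT CASE WHEN Sender > Recipient THEN Sender ELSE Recipient END AS u1,
       CASE WHEN Sender > Recipient THEN Recipient ELSE Sender END AS u2,
       MAX(Timestamp) AS Timestamp,
       Body AS LatestMessage,  -- bare column, taken from the MAX(Timestamp) row
       SUM(CASE WHEN TimeRead IS NULL THEN 1 ELSE 0 END) AS UnreadMessageCount
FROM Direct_Messages
GROUP BY CASE WHEN Sender > Recipient THEN Sender ELSE Recipient END,
         CASE WHEN Sender > Recipient THEN Recipient ELSE Sender END
ORDER BY Timestamp DESC
"""

rows = conn.execute(COOKED).fetchall()
print(rows)  # [('Gary', 'Barry', '2018-12-12 23:54:49.326', 'Hello', 1)]
```

Here the larger name comes first, which is the mirror image of the MIN()/MAX() version; either convention works as long as it is applied consistently.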

Note: watch out for performance (though I guess it doesn't matter much, since you tagged the question sqlite).

You can use a CTE to pre-cook the data and get a more readable query:

with Direct_Messages_cooked as
(
select
(case when Sender > Recipient then Sender else Recipient end) AS U1,
(case when Sender > Recipient then Recipient else Sender end) AS U2,
*
from Direct_Messages
)
SELECT U1 AS U1,
U2 AS U2,
MAX(Timestamp) AS Timestamp,
-- the MAX() aggregate makes SQLite take Body from the newest row of each group
Body AS LatestMessage,
(SUM(CASE WHEN TimeRead IS NULL THEN 1 ELSE 0 END) ) AS UnreadMessageCount
FROM Direct_Messages_cooked
GROUP BY U1, U2
ORDER BY Timestamp DESC
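To see what the CTE produces before any aggregation, this sketch (Python's sqlite3 module, the sample rows, column types assumed) selects the cooked rows directly. Both messages end up under the same (U1, U2) key even though they travel in opposite directions.

```python
import sqlite3

# The question's sample rows; column types are assumptions.
SAMPLE_ROWS = [
    (148567984, "Gary", "Barry", "Hi", "2018-12-12 23:53:39.487", "2018-12-12 23:55:45"),
    (1668701120, "Barry", "Gary", "Hello", "2018-12-12 23:54:49.326", None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Direct_Messages (ID INTEGER PRIMARY KEY, Sender TEXT, "
    "Recipient TEXT, Body TEXT, Timestamp TEXT, TimeRead TEXT)"
)
conn.executemany("INSERT INTO Direct_Messages VALUES (?, ?, ?, ?, ?, ?)", SAMPLE_ROWS)

CTE_ROWS = """
WITH Direct_Messages_cooked AS (
    SELECT CASE WHEN Sender > Recipient THEN Sender ELSE Recipient END AS U1,
           CASE WHEN Sender > Recipient THEN Recipient ELSE Sender END AS U2,
           *
    FROM Direct_Messages
)
SELECT U1, U2, Sender, Recipient, Body
FROM Direct_Messages_cooked
ORDER BY ID
"""

rows = conn.execute(CTE_ROWS).fetchall()
for row in rows:
    print(row)
# ('Gary', 'Barry', 'Gary', 'Barry', 'Hi')
# ('Gary', 'Barry', 'Barry', 'Gary', 'Hello')
```

Because `*` carries the original Sender and Recipient through the CTE, the outer query can still report who actually sent each message.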

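Building on the insightful answers from @Gordon Linoff and @dani Herrera, I managed to adapt them into a concise solution to my specific problem, although from what I have observed, @Gordon's answer seems to address the problem more thoroughly in the broader context of my original question. Here is what I came up with: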

SELECT Sender AS Sender,
Recipient AS Recipient,
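-- a MAX() aggregate makes SQLite take the bare Sender, Recipient and Body from the newest row of each group
MAX(Timestamp) AS Timestamp,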
Body AS LatestMessage,
(COUNT( * ) - COUNT(TimeRead) ) AS UnreadMessageCount
FROM Direct_Messages
GROUP BY MAX(Sender, Recipient),
MIN(Sender, Recipient)
ORDER BY Timestamp DESC
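A sketch of this approach on the sample rows, assuming Python's sqlite3 module. It uses MAX(Timestamp) rather than a bare Timestamp: with a max() aggregate present, SQLite takes the bare Sender, Recipient and Body from the newest row of each group, whereas with no min() or max() at all the row they come from is arbitrary. It also writes the grouping key as plain MAX(Sender, Recipient), MIN(Sender, Recipient), which yields the same values as wrapping each call in (SELECT ...).

```python
import sqlite3

# The question's sample rows; column types are assumptions.
SAMPLE_ROWS = [
    (148567984, "Gary", "Barry", "Hi", "2018-12-12 23:53:39.487", "2018-12-12 23:55:45"),
    (1668701120, "Barry", "Gary", "Hello", "2018-12-12 23:54:49.326", None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Direct_Messages (ID INTEGER PRIMARY KEY, Sender TEXT, "
    "Recipient TEXT, Body TEXT, Timestamp TEXT, TimeRead TEXT)"
)
conn.executemany("INSERT INTO Direct_Messages VALUES (?, ?, ?, ?, ?, ?)", SAMPLE_ROWS)

CHATS = """
SELECT Sender,
       Recipient,
       MAX(Timestamp) AS Timestamp,  -- picks the row the bare columns come from
       Body AS LatestMessage,
       COUNT(*) - COUNT(TimeRead) AS UnreadMessageCount
FROM Direct_Messages
GROUP BY MAX(Sender, Recipient), MIN(Sender, Recipient)
ORDER BY Timestamp DESC
"""

rows = conn.execute(CHATS).fetchall()
print(rows)  # [('Barry', 'Gary', '2018-12-12 23:54:49.326', 'Hello', 1)]
```

This matches the Chats entry described in the question: Sender "Barry", Recipient "Gary", Barry's timestamp, "Hello", and one unread message.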
