使用group by函数根据现有列获取二进制列的SQL查询



我试图使用MySQL组基于两列,选择最近的日期,并创建两个二进制列记录状态。下面是一个示例表

_______________________________________________________________________
Letters | Numbers|         dates        |   score    |  random_status |
_______________________________________________________________________
A       |   2    | 2021-09-29 0:00:00   |    0.3     |   Sent         |
A       |   2    | 2021-10-01 0:00:00   |    1.4     |   Received     |
A       |   5    | 2021-10-04 0:00:00   |    0.8     |   Sent         |
A       |   7    | 2021-10-20 0:00:00   |    0.9     |   Sent         |
A       |   7    | 2021-10-20 0:20:00   |    0.5     |   Sent         |
R       |   7    | 2021-09-09 0:20:54   |    0.2     |   Sent         |
R       |   7    | 2021-10-14 0:00:00   |    2.5     |   Received     |
R       |   2    | 2021-10-07 0:00:00   |    0.7     |   Received     |
R       |   2    | 2021-09-14 0:00:00   |    1.7     |   Sent         |
C       |   5    | 2021-10-07 0:00:00   |    2.1     |   Sent         |
C       |   5    | 2021-10-25 0:00:00   |    3.5     |   Sent         |
C       |   7    | 2021-08-18 0:00:00   |    1.9     |   Sent         |
C       |   7    | 2021-08-29 0:00:00   |    0.6     |   Received     | 
C       |   2    | 2021-02-01 0:00:00   |    1.8     |   Sent         |

我想分组基于字母和数字列,我想要最新的日期,与最新的分数。我想基于状态列创建两个新列,该列表示字母和数字组合是否处于发送或接收状态。

看起来像这样:

Letters|    Numbers |   latest_date     |latest_score|  Has_sent| Has_received|
A      |       2    |2021-10-01 0:00:00 |     1.4    |     1    |      1      |
A      |       5    |2021-10-04 0:00:00 |     0.8    |     1    |      0      |
A      |       7    |2021-10-20 0:20:00 |     0.5    |     1    |      0      |
C      |       2    |2021-02-01 0:00:00 |     1.8    |     1    |      0      |
C      |       5    |2021-10-25 0:00:00 |     3.5    |     1    |      0      |
C      |       7    |2021-08-29 0:00:00 |     0.6    |     1    |      1      |
R      |       2    |2021-10-07 0:00:00 |     0.7    |     1    |      1      |
R      |       7    |2021-10-14 0:00:00 |     2.5    |     1    |      1      |

我使用了以下查询

SELECT t1.Letters, t1.Numbers, MAX(t1.dates) as latest_date, t1.score as latest_score,
case when status = "Sent" then 1 else 0 end AS Has_sent,
case when status = "Received" then 1 else 0 end AS Has_received
FROM dummy_data t1
WHERE
t1.dates IN (SELECT MAX(t2.dates) FROM dummy_data t2
WHERE t1.Letters = t2.Letters AND t1.Numbers = t2.Numbers)
GROUP BY t1.Letters, t1.Numbers;

最后两列has_sent和has_reporting没有按预期显示。相反,我得到它基于最大日期。根据每个字母和数字组合存在的状态是否可以将其变为二进制?

尝试:

select tbl1.Letters,
tbl1.Numbers,
tbl1.latest_date,
tbl1.latest_score score, 
tbl2.Has_sent,
tbl2.Has_received
from    (
select Letters, 
Numbers, 
max(dates) as `latest_date`, 
score as `latest_score`
from dummy_data
where dates in ( select max(dates) 
from dummy_data 
group by Letters, Numbers )
group by Letters, Numbers
) as tbl1 
inner join 

(
select Letters, 
Numbers ,
max(case when random_status = "Sent" then 1 else 0 end) AS Has_sent,
max(case when random_status = "Received" then 1 else 0 end) AS Has_received
from dummy_data
group by Letters, Numbers

) as tbl2 on tbl1.Letters=tbl2.Letters and tbl1.Numbers=tbl2.Numbers;

演示:https://www.db-fiddle.com/f/usu3XK7Gn8gGqQnusmCiLk/4

我从来没有使用过case,但是从我读到的它似乎是按行排列的?在这种情况下,它将不能同时作为聚合组。

SELECT 
t1.Letters, 
t1.Numbers, 
MAX(t1.dates)        AS latest_date, 
LAST_VALUE(t1.score) AS latest_score, /* Since your rows appear to be chronological I assume this works */
SELECT EXISTS(
SELECT 1 
FROM dummy_data 
WHERE 
Letters       = t1.Letters AND 
random_status = "Sent"
) AS has_sent,
SELECT EXISTS(
SELECT 1 
FROM dummy_data 
WHERE 
Letters       = t1.Letters AND 
random_status = "Received"
) AS has_received
FROM dummy_data AS t1
GROUP BY t1.Letters, t1.Numbers
;

如果您的表和许多时间序列一样大,那么这些子查询将不可持续。从您的示例talbes中看到的业务逻辑:

  1. 字母/数字对必须有发送状态
  2. 字母/数字对只有具有历史发送状态才能具有接收状态。
  3. 一个字母/数字对只能有一个发送行和一个接收行。(对字母、数字、随机状态的唯一约束)

这里有一个假设这些点的替代方案。


SELECT 
d_sent.Letters AS Letters,
d_sent.Numbers AS Numbers,
MAX(d_sent.dates, d_rcvd.dates) AS latest_date,
(
CASE
WHEN isnull(d_rcvd.score) THEN d_sent.score
ELSE d_rcvd.score
END
) AS latest_score,
1 AS is_sent,
cast(isnull(d_rcvd.score) AS SIGNED INTEGER) AS is_received
FROM (
SELECT * FROM dummy_data WHERE random_status="Sent"
) AS d_sent
LEFT JOIN (
SELECT * FROM dummy_data WHERE random_status="Received"
) AS d_rcvd
ON
d_sent.Letters = d_rcvd.Letters AND
d_sent.Numbers = d_rcvd.Numbers
;

我肯定有一些语法上的调整必须做出。让我知道进展如何。

检查左连接是否为空

MySQL case function

编辑:看起来大小写确实是按行排列的。

最新更新