My database table is named transactions and looks like this (the last row gives the column types):
Name | Date (DateTime) | Type | Stock | Volume | Price | Total
Tom | 2014-05-24 12:00:00 | Sell | Barclays | 100 | 2.2 | 220.0
Bob | 2014-04-13 15:00:00 | Buy | Coca-Cola | 10 | 12.0 | 120.0
varchar | DateTime | varchar | varchar | int | float | float
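My initial problem was to delete from the table all transactions belonging to users whose first transaction is later than a certain threshold. My query was: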
DELETE FROM transactions WHERE name NOT IN (SELECT name FROM transactions2 WHERE date < CAST('2014-01-01 12:00:00.000' as DateTime));
Query OK, 35850 rows affected (3 hours 5 min 28.88 sec)
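I think this is a poor solution: I had to copy the table to avoid deleting from the same table I was reading from, and execution took a very long time (3 hours for a table with ~170k rows).
Now I'm trying to delete all transactions belonging to any user whose latest transaction occurred before a certain threshold date.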
DELETE FROM transactions WHERE name IN (SELECT name FROM transactions HAVING max(date) < CAST('2015-01-01 12:00:00.000' as DateTime) );
Unfortunately, the subquery finds only one result:
SELECT name FROM transactions HAVING max(date) < CAST('2015-01-01 12:00:00.000' as DateTime);
+------------+
| name |
+------------+
| david |
+------------+
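I think I only get one result because of the max() function. I'm no SQL expert, but I'm clear about what I need in terms of sets and logic. I'd be very grateful if you could show me how to rewrite my query.
Edit: here is a sqlfiddle with the schema and some data: http://sqlfiddle.com/#!2/389ede/2
I need to delete all of alex's entries, because his last transaction happened before a certain threshold (say January 1, 2013). tom's transactions should not be deleted, because his latest one is later than January 1, 2013.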
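For what it's worth, a HAVING clause without GROUP BY treats the whole table as a single group, which is why the subquery can return at most one row. Below is a minimal sketch of the grouped version. It runs against SQLite through Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, made only so the example is self-contained), the sample rows for alex and tom are invented to match the description above, and the date column is stored as ISO-8601 text.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; rows are invented samples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (name TEXT NOT NULL, date TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO transactions (name, date) VALUES (?, ?)",
    [
        ("alex", "2011-01-01 12:00:00"),
        ("alex", "2012-06-01 12:00:00"),
        ("tom", "2012-03-01 12:00:00"),
        ("tom", "2014-05-24 12:00:00"),
    ],
)

threshold = "2013-01-01 00:00:00"

# GROUP BY name makes MAX(date) a per-user value, so HAVING filters users
# one by one instead of testing a single maximum for the whole table.
stale_users = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM transactions "
        "GROUP BY name HAVING MAX(date) < ? ORDER BY name",
        (threshold,),
    )
]
print(stale_users)
```

With these sample rows only alex is selected, since tom has a transaction after the threshold.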
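Your first query can be phrased as: 'delete from transactions the users for whom no transaction exists before the given date'. This translates easily into SQL: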
delete from transactions t1
where not exists (
select 1 from transactions t2
where t1.name = t2.name
and t2.date < ?
)
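MySQL still doesn't support (AFAIK) deleting from a table that is also referenced in a subquery of the same statement, so we need to rewrite it as: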
delete t1.*
from transactions t1
left join transactions t2
on t1.name = t2.name
and t2.date < ?
where t2.name is null
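date is a keyword in MySQL (the unquoted name does work in the queries above, but quoting it as `date` is the safer habit).
Your second query can be solved the same way: delete the transactions for which no transaction exists after a given date. I'll leave that as an exercise.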
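One possible take on that exercise, as a sketch rather than a definitive answer: delete every row whose user has no transaction at or after the threshold. It runs on SQLite through Python's built-in sqlite3 module purely so that it is self-contained (an assumption; SQLite accepts a correlated subquery on the table being deleted from, whereas on MySQL the LEFT JOIN form may be needed). The sample rows are invented.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; rows are invented samples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (name TEXT NOT NULL, date TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO transactions (name, date) VALUES (?, ?)",
    [
        ("alex", "2011-01-01 12:00:00"),
        ("alex", "2012-06-01 12:00:00"),
        ("tom", "2012-03-01 12:00:00"),
        ("tom", "2014-05-24 12:00:00"),
    ],
)

threshold = "2013-01-01 00:00:00"

# A row is removed when its user has no transaction at or after the
# threshold, i.e. when the user's latest transaction is older than it.
deleted = conn.execute(
    """
    DELETE FROM transactions
    WHERE NOT EXISTS (
        SELECT 1 FROM transactions t2
        WHERE t2.name = transactions.name
          AND t2.date >= ?
    )
    """,
    (threshold,),
).rowcount

remaining = conn.execute(
    "SELECT name, date FROM transactions ORDER BY name, date"
).fetchall()
print(deleted, remaining)
```

Both of alex's rows go, while tom keeps his older row too, because the test is per user and not per row.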
Alvin, here is a simplified scenario based on your fiddle:
CREATE TABLE transactions
( id int(11) NOT NULL AUTO_INCREMENT
, name varchar(30) NOT NULL
, value datetime NOT NULL
, PRIMARY KEY (id) ) ENGINE=InnoDB;
INSERT INTO transactions (name, value) VALUES ('alex', '2011-01-01 12:00:00')
, ('alex', '2012-06-01 12:00:00');
Let's examine it:
SELECT t1.name as t1_name, t1.value as t1_value
, t2.name as t2_name, t2.value as t2_value
FROM transactions t1
LEFT JOIN transactions t2
ON t1.name = t2.name
T1_NAME | T1_VALUE | T2_NAME | T2_VALUE
alex | January, 01 2011 12:00:00+0000 | alex | January, 01 2011 12:00:00+0000
alex | January, 01 2011 12:00:00+0000 | alex | June, 01 2012 12:00:00+0000
alex | June, 01 2012 12:00:00+0000 | alex | January, 01 2011 12:00:00+0000
alex | June, 01 2012 12:00:00+0000 | alex | June, 01 2012 12:00:00+0000
That gives 4 rows. If we now add the join predicate:
SELECT t1.name as t1_name, t1.value as t1_value
, t2.name as t2_name, t2.value as t2_value
FROM transactions t1
LEFT JOIN transactions t2
ON t1.name = t2.name
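AND t2.value > CAST('2011-06-01 12:00:00.000' as DateTime);
That leaves us with two rows. If we change the time to '2012-06-01 12:00:00.000', we still have two rows because of the left join, but the t2 columns will be NULL.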
If we now add the WHERE clause:
SELECT t1.name as t1_name, t1.value as t1_value
, t2.name as t2_name, t2.value as t2_value
FROM transactions t1
LEFT JOIN transactions t2
ON t1.name = t2.name
AND t2.value > CAST('2012-06-01 12:00:00.000' as DateTime)
WHERE t2.name is null;
Still two rows. With CAST('2011-06-01 12:00:00.000' as DateTime), no rows.
Keep in mind that this construct is equivalent to:
SELECT t1.name as t1_name, t1.value as t1_value
FROM transactions t1
WHERE NOT EXISTS (
SELECT 1 FROM transactions t2
WHERE t1.name = t2.name
AND t2.value > CAST('2012-06-01 12:00:00.000' as DateTime)
);
So we get a match when no row exists for that name with value > '2012-06-01 12:00:00.000'. Does that make sense?
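To check the claimed equivalence, here is a small sketch that runs both forms against the two alex rows and compares the results for the two thresholds discussed above. It uses SQLite through Python's built-in sqlite3 module as a stand-in for MySQL (an assumption), with value stored as ISO-8601 text; the helper run and the constants ANTI_JOIN and NOT_EXISTS are names made up for this illustration.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "name TEXT NOT NULL, value TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO transactions (name, value) VALUES (?, ?)",
    [("alex", "2011-01-01 12:00:00"), ("alex", "2012-06-01 12:00:00")],
)

# Anti-join: keep t1 rows for which the LEFT JOIN found no later t2 row.
ANTI_JOIN = """
    SELECT t1.name, t1.value
    FROM transactions t1
    LEFT JOIN transactions t2
      ON t1.name = t2.name AND t2.value > ?
    WHERE t2.name IS NULL
    ORDER BY t1.id
"""

# Same condition written as a correlated NOT EXISTS.
NOT_EXISTS = """
    SELECT t1.name, t1.value
    FROM transactions t1
    WHERE NOT EXISTS (
        SELECT 1 FROM transactions t2
        WHERE t1.name = t2.name AND t2.value > ?
    )
    ORDER BY t1.id
"""


def run(sql, threshold):
    """Run one of the two queries with the given threshold."""
    return conn.execute(sql, (threshold,)).fetchall()


late = "2012-06-01 12:00:00"
early = "2011-06-01 12:00:00"
for threshold in (late, early):
    print(threshold, run(ANTI_JOIN, threshold), run(NOT_EXISTS, threshold))
```

With the later threshold both queries return the two alex rows, and with the earlier one both return nothing, matching the walkthrough.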
@Lennart, Alvin, please consider the following:
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,val INT NOT NULL);
INSERT INTO my_table (val) VALUES (1),(1),(2),(1),(3),(2),(3),(1),(4);
SELECT * FROM my_table;
+----+-----+
| id | val |
+----+-----+
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 1 |
| 5 | 3 |
| 6 | 2 |
| 7 | 3 |
| 8 | 1 |
| 9 | 4 |
+----+-----+
Let's delete the latest row for each val, i.e. the rows returned by:
SELECT x.*
FROM my_table x
JOIN
( SELECT val, max(id) max_id FROM my_table GROUP BY val ) y
ON y.val = x.val
AND y.max_id = x.id;
+----+-----+
| id | val |
+----+-----+
| 8 | 1 |
| 6 | 2 |
| 7 | 3 |
| 9 | 4 |
+----+-----+
So...
DELETE x
FROM my_table x
JOIN ( SELECT val, max(id) max_id FROM my_table GROUP BY val ) y
ON y.val = x.val
AND y.max_id = x.id;
SELECT * FROM my_table;
+----+-----+
| id | val |
+----+-----+
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 1 |
| 5 | 3 |
+----+-----+
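The same deletion can be reproduced outside MySQL. The sketch below uses SQLite through Python's built-in sqlite3 module (an assumption, chosen so the example is self-contained). SQLite does not support the multi-table DELETE x FROM ... JOIN syntax, so a plain IN subquery over the per-val maximum ids takes its place here.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, val INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO my_table (val) VALUES (?)",
    [(v,) for v in (1, 1, 2, 1, 3, 2, 3, 1, 4)],
)

# Ids of the newest row per val: the rows the DELETE is meant to remove.
latest_ids = [
    row[0]
    for row in conn.execute(
        "SELECT MAX(id) FROM my_table GROUP BY val ORDER BY MAX(id)"
    )
]

# Delete exactly those rows; the subquery is uncorrelated.
conn.execute(
    "DELETE FROM my_table "
    "WHERE id IN (SELECT MAX(id) FROM my_table GROUP BY val)"
)

remaining = conn.execute("SELECT id, val FROM my_table ORDER BY id").fetchall()
print(latest_ids, remaining)
```

The rows left over are ids 1 to 5, the same as in the MySQL output above.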