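I'm performing a join between two tables, and I need a value from one of them to act as the LIMIT for a subquery in the join. Suppose I have the following [extremely simplified] tables: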
data:
experiment_id | value
--------------|--------
1 | 2.5
1 | 2.6
1 | 4.5
1 | 2.3
1 | 3.5
1 | 2.8
2 | 2.3
2 | 1.2
2 | 1.1
2 | 3.6
2 | 3.8
2 | 4.1
2 | 7.9
2 | 4.2
2 | 1.0
data_clip:
experiment_id | clip_index
--------------|------------
1 | 3
2 | 5
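For each experiment, I need to sum its sorted values up to a given clip_index, which differs from experiment to experiment. So my result table would ideally look like this: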
results:
experiment_id | sum
--------------|-------
1 | 7.4 # => 2.3 + 2.5 + 2.6
2 | 9.2 # => 1.0 + 1.1 + 1.2 + 2.3 + 3.6
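Normally I would do this calculation with some client-side scripting (ruby, python, etc.), but I'd like to try doing it at the database level. Some imaginary SQL might look like this (I know this query has all sorts of things wrong with it, but hopefully you get the idea):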
SELECT
T0.experiment_id as `id`,
(SELECT SUM(x.value) from
(SELECT value
FROM data
WHERE experiment_id = t0.experiment_id
ORDER BY value
LIMIT t0.clip_index ) as x) AS `sum`
FROM data_clip AS t0
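A few problems:
- LIMIT must be defined with a constant (1000, 10, etc.), not with a column
- The WHERE condition in the subquery fails because the t0 table, which lives outside the subquery, is not recognized there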
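My question is basically how to get this variable limit and sum across the two tables using mostly SQL. I thought about using group_concat and substring_index to isolate the values up to clip_index for each row, but then I would have to sum a string of numbers ("1.2,2.3,3.2"), and I would run into the server's limit on the group_concat buffer size (configurable, but there can be around ~100k values per experiment). Any ideas? Thanks.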
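I guess you just need to include a row number with each value you select and limit the results by the number of rows. Something like this (not tested):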
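SELECT t0.experiment_id AS `id`,
       (SELECT SUM(x.value)
        FROM (SELECT value, @rownum := @rownum + 1 AS rownum
              FROM data
              JOIN (SELECT @rownum := 0) r  -- reset the counter
              -- the outer reference to t0 inside this derived table
              -- needs MySQL 8.0.14+; older versions reject it
              WHERE experiment_id = t0.experiment_id
              ORDER BY value
             ) AS x
        -- rownum starts at 1, so <= keeps exactly clip_index rows
        WHERE x.rownum <= t0.clip_index
       ) AS `sum`
FROM data_clip AS t0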
See: MySQL - Get row number on select
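For what it's worth, on an engine with window functions (MySQL 8.0+, or SQLite 3.25+ as assumed here) the same row-number idea can be sketched without user variables. The snippet below is only an illustration, run through Python's sqlite3 on the question's sample data; the aliases c, r and total are my own, and it keeps the clip_index smallest values of each experiment.

```python
import sqlite3

# In-memory database holding the sample tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data (experiment_id INTEGER, value REAL);
    CREATE TABLE data_clip (experiment_id INTEGER, clip_index INTEGER);
    INSERT INTO data VALUES
        (1, 2.5), (1, 2.6), (1, 4.5), (1, 2.3), (1, 3.5), (1, 2.8),
        (2, 2.3), (2, 1.2), (2, 1.1), (2, 3.6), (2, 3.8), (2, 4.1),
        (2, 7.9), (2, 4.2), (2, 1.0);
    INSERT INTO data_clip VALUES (1, 3), (2, 5);
""")

# Number each experiment's values in ascending order, then keep only the
# rows whose number does not exceed that experiment's clip_index.
QUERY = """
    SELECT c.experiment_id AS id, SUM(r.value) AS total
    FROM data_clip AS c
    JOIN (
        SELECT experiment_id,
               value,
               ROW_NUMBER() OVER (PARTITION BY experiment_id
                                  ORDER BY value) AS rownum
        FROM data
    ) AS r
      ON r.experiment_id = c.experiment_id
     AND r.rownum <= c.clip_index
    GROUP BY c.experiment_id
    ORDER BY c.experiment_id
"""

results = conn.execute(QUERY).fetchall()
for experiment_id, total in results:
    print(experiment_id, round(total, 2))
```

Because the numbering is done per partition, nothing has to be reset between experiments, and the per-experiment limit becomes an ordinary join condition instead of a LIMIT clause.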
I think this will work when all values are positive. If there are negative values, one more level is needed.
SELECT experiment_id
     , MIN(sumValue) - (MIN(cnt) - MIN(clip_index)) * MIN(maxValue)
         AS sumValue
FROM
  ( SELECT e.experiment_id
         , e.clip_index
         , COUNT(*) AS cnt
         , SUM(d2.value) AS sumValue
         , d.value AS maxValue
    FROM data_clip AS e
      JOIN data AS d
        ON d.experiment_id = e.experiment_id
      JOIN data AS d2
        ON d2.experiment_id = e.experiment_id
        AND d2.value <= d.value
    GROUP BY e.experiment_id
           , d.id            -- primary key of table `data`
    HAVING COUNT(*) >= e.clip_index
  ) AS grp
GROUP BY experiment_id
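A quick way to try this self-join approach is to run it on the question's sample data. The following is a sketch using Python's sqlite3 rather than MySQL. The id primary key on data is an assumption (the question's table does not show one), the query uses the question's names data_clip and clip_index, and experiment 3 is a made-up case with tied values, added only to exercise the correction term.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- id is an assumed surrogate key; the question's table has none.
    CREATE TABLE data (
        id INTEGER PRIMARY KEY,
        experiment_id INTEGER,
        value REAL
    );
    CREATE TABLE data_clip (
        experiment_id INTEGER PRIMARY KEY,
        clip_index INTEGER
    );
    INSERT INTO data (experiment_id, value) VALUES
        (1, 2.5), (1, 2.6), (1, 4.5), (1, 2.3), (1, 3.5), (1, 2.8),
        (2, 2.3), (2, 1.2), (2, 1.1), (2, 3.6), (2, 3.8), (2, 4.1),
        (2, 7.9), (2, 4.2), (2, 1.0),
        -- hypothetical experiment with ties, not from the question
        (3, 1.0), (3, 2.0), (3, 2.0), (3, 2.0);
    INSERT INTO data_clip VALUES (1, 3), (2, 5), (3, 2);
""")

# For every row d, the inner query counts and sums all values <= d.value
# in the same experiment. The outer query picks the smallest such group
# that reaches clip_index rows and subtracts the surplus caused by ties.
QUERY = """
    SELECT experiment_id,
           MIN(sumValue) - (MIN(cnt) - MIN(clip_index)) * MIN(maxValue)
               AS sumValue
    FROM (
        SELECT e.experiment_id,
               e.clip_index,
               COUNT(*)      AS cnt,
               SUM(d2.value) AS sumValue,
               d.value       AS maxValue
        FROM data_clip AS e
        JOIN data AS d
          ON d.experiment_id = e.experiment_id
        JOIN data AS d2
          ON d2.experiment_id = e.experiment_id
         AND d2.value <= d.value
        GROUP BY e.experiment_id, d.id
        HAVING COUNT(*) >= e.clip_index
    ) AS grp
    GROUP BY experiment_id
    ORDER BY experiment_id
"""

sums = dict(conn.execute(QUERY).fetchall())
for experiment_id, total in sums.items():
    print(experiment_id, round(total, 2))
```

Note that the self-join compares every row of an experiment with every other row of it, so the work grows quadratically with the number of values per experiment. With around 100k values per experiment, as mentioned in the question, that may be too slow in practice.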