THE DATA
那么,假设我有一个应用程序,我要测试汽车的速度,性能,安全性等。我有一组格式如下的数据:
CAR TABLE
ID CAR_NAME
1 Ford Focus 2006
2 Toyota Corolla 2014
5 Tesla Model S 2016
TEST TABLE
ID TEST_NAME
1 acceleration
2 topSpeed
3 zeroToSixty
4 smoothRide
5 seating
6 ergonomics
7 handling
8 breaking
9 audio
TEST_CATEGORY TABLE
ID CATEGORY_NAME
1 Speed
2 Performance
3 Comfort
4 Safety
TEST_PLAN TABLE
ID CAR_ID CATEGORY_ID TEST_ID
1 1 1 1
2 1 1 2
3 1 1 3
4 1 2 1
5 1 2 3
6 1 2 8
7 1 2 7
8 1 4 8
9 1 4 5
11 5 1 1
12 5 1 2
13 5 1 3
14 5 1 7
15 5 2 1
16 5 2 7
17 5 2 4
18 5 3 4
19 5 3 5
TEST_INSTANCE TABLE
ID CAR_ID TEST_ID CATEGORY_ID GRADE TEST_DATE
1 1 1 1 C 2015-03-14
2 1 1 1 C 2015-03-17
3 1 1 2 D 2015-03-15
4 1 1 2 C 2015-03-16
5 1 1 2 B 2015-03-17
6 1 3 1 B 2015-03-14
7 1 3 1 C 2015-03-16
8 1 8 2 C 2015-03-13
9 5 1 1 B 2015-03-15
10 5 1 1 B 2015-03-17
11 5 1 2 A 2015-03-19
12 5 1 2 A 2015-03-12
13 5 7 2 B 2015-03-14
14 5 7 2 B 2015-03-16
15 5 7 2 A 2015-03-18
16 5 7 2 B 2015-03-15
17 5 9 5 A 2015-03-14 --note that this test is not part of the test plan
所以有5个表,但只有最后两个(TEST_PLAN, TEST_INSTANCE)是重要的。其余的只是JOIN依赖项。
<<p> TEST_PLAN表/em>TEST_PLAN表定义了需要对每辆车执行的一系列测试。每个测试可以在多个类别中。TEST_PLAN表显示需要执行哪些测试以及在哪些类别中进行测试。速度类别下的加速测试,不计入性能类别下的加速测试。
The test plans look a bit like this:
Ford Focus 2006
Speed
acceleration
topSpeed
zeroToSixty
Performance
acceleration
zeroToSixty
breaking
handling
Safety
breaking
seating
Tesla Model S 2016
Speed
acceleration
topSpeed
zeroToSixty
handling
Performance
acceleration
handling
smoothRide
Comfort
smoothRide
seating
<<p> TEST_INSTANCE表/em>TEST_INSTANCE表跟踪执行了哪些测试(在哪个类别下)以及何时执行。给定的测试类别可能已经针对给定的汽车进行了不止一次。未在测试计划下的测试也可能已执行。
The TEST_INSTANCE table (all JOINed up) looks like this:
ID CAR TEST_NAME TEST_CATEGORY GRADE TEST_DATE
1 Ford Focus 2006 acceleration Speed C 2015-03-14
2 Ford Focus 2006 acceleration Speed C 2015-03-17
3 Ford Focus 2006 acceleration Performance D 2015-03-15
4 Ford Focus 2006 acceleration Performance C 2015-03-16
5 Ford Focus 2006 acceleration Performance B 2015-03-17
6 Ford Focus 2006 zeroToSixty Speed B 2015-03-14
7 Ford Focus 2006 zeroToSixty Speed C 2015-03-16
8 Ford Focus 2006 breaking Performance C 2015-03-13
9 Tesla Model S 2016 acceleration Speed B 2015-03-17
10 Tesla Model S 2016 acceleration Speed B 2015-03-15
11 Tesla Model S 2016 acceleration Performance A 2015-03-16
12 Tesla Model S 2016 acceleration Performance A 2015-03-14
13 Tesla Model S 2016 handling Performance B 2015-03-19
14 Tesla Model S 2016 handling Performance B 2015-03-12
15 Tesla Model S 2016 handling Performance A 2015-03-13
16 Tesla Model S 2016 handling Performance B 2015-03-18
17 Tesla Model S 2016 zeroToSisty Speed A 2015-03-14 --note that this test is not part of the test plan
What I'm Trying to do
我需要做一些涉及聚合的事情。我试图在给定的测试计划中获得所有测试的列表(对于单个汽车),并从测试实例中加入以下内容:
给定测试在给定类别中首次运行的日期。
给定测试在类别中运行的最近日期。
一个测试在给定类别中运行的总次数。
给定测试类别的最差结果
给定测试类别的最新结果。
我正在使用以下查询,我已经知道如何获得1-4。5:
SELECT
tp.id,
c.car_name
tc.category_name,
t.test_name,
MIN(ti.test_date) as firstRun, -- this grabs the date of the first time the test-category ran
MAX(ti.test_date) as latest, -- this grabs the latest date the test-category ran
MIN(ti.grade) as worst, -- this grabs the worst result fron the test-category
COUNT(ti.test_date) as testRuns, -- this grabs the total number of tests run from the category.
ti.grade as latestGrade -- this seems to give me the most recent grade from a test-category, but I can't be sure.
FROM
testPlan tp
LEFT JOIN car c ON c.id = tp.car_id
LEFT JOIN test_category tc ON tc.id = tp.category_id
LEFT JOIN test t ON t.id = tp.test_id
LEFT JOIN test_instance ti ON (
ti.car_id = tp.car_id
AND ti.category_id = tp.category_id
AND ti.test_id = tp.test_id
)
WHERE
tp.car_id = 5
GROUP BY
tp.id;
对于项目5,ti.grade as latestGrade
似乎给了我给定测试类别的最新分数,但我无法确定这一点。我想我需要一些ORDER BY
来确定。
预期结果
The final query should return something like this:
ID CAR TEST_CATEGORY TEST_NAME FIRST_RUN LATEST TESTS_RUN LATEST_GRADE WORST_GRADE
11 Tesla Model S Speed acceleration 2015-03-15 2015-03-17 2 B B
12 Tesla Model S Speed topSpeed NULL NULL 0 NULL NULL
13 Tesla Model S Speed zeroToSixty NULL NULL 0 NULL NULL
14 Tesla Model S Speed handling NULL NULL 0 NULL NULL
15 Tesla Model S Performance acceleration 2015-03-14 2015-03-16 2 A A
16 Tesla Model S Performance handling 2015-03-12 2015-03-19 4 A B
17 Tesla Model S Performance smoothRide NULL NULL 0 NULL NULL
18 Tesla Model S Comfort smoothRide NULL NULL 0 NULL NULL
19 Tesla Model S Comfort seating NULL NULL 0 NULL NULL
Resultset Note:
- there should be only one row per each test-category in the test plan (per car). (9 items in Tesla S test plan).
- 我需要在查询中添加什么以确保latestGrade始终是最近的?
- 当加入非计划结果时,
UNION ALL
会导致性能下降吗?(我的真实数据集是10k+记录返回)。是否有一种方法来重构我的查询,使它不需要UNION ALL
?
使用这个"简单"的查询获得1-4和最新等级的最高ID:
SELECT ti.car_id, ti.category_id, ti.test_id,
MIN(ti.test_date) as firstRun,
MAX(ti.test_date) as latest,
MIN(ti.grade) as worst,
COUNT(ti.test_date) as testRuns,
MAX(ti.ID) as max_id -- use later to get latest grade
FROM test_instance ti
GROUP BY ti.car_id, ti.category_id, ti.test_id
您的测试数据结果:
| CAR_ID | CATEGORY_ID | TEST_ID | firstRun | latest | worst | testRuns | max_id |
|--------|-------------|---------|------------|------------|-------|----------|--------|
| 1 | 1 | 1 | 2015-03-14 | 2015-03-17 | C | 2 | 2 |
| 1 | 1 | 3 | 2015-03-14 | 2015-03-16 | B | 2 | 7 |
| 1 | 2 | 1 | 2015-03-15 | 2015-03-17 | B | 3 | 5 |
| 1 | 2 | 8 | 2015-03-13 | 2015-03-13 | C | 1 | 8 |
| 5 | 1 | 1 | 2015-03-15 | 2015-03-17 | B | 2 | 10 |
| 5 | 2 | 1 | 2015-03-12 | 2015-03-19 | A | 2 | 12 |
| 5 | 2 | 7 | 2015-03-14 | 2015-03-18 | A | 4 | 16 |
| 5 | 5 | 9 | 2015-03-14 | 2015-03-14 | A | 1 | 17 |
http://sqlfiddle.com/!9/6df83/4
并将结果与其他表连接:
SELECT sub.firstRun, sub.latest, sub.worst, sub.testRuns,
ti.grade as latestGrade,
c.car_name,
tc.category_name,
t.test_name,
tp.id as test_plan_id
FROM (
SELECT ti.car_id, ti.category_id, ti.test_id,
MIN(ti.test_date) as firstRun,
MAX(ti.test_date) as latest,
MIN(ti.grade) as worst,
COUNT(ti.test_date) as testRuns,
MAX(ti.ID) as max_id -- use later to get latest grade
FROM test_instance ti
GROUP BY ti.car_id, ti.category_id, ti.test_id
) sub
JOIN test_instance ti ON ti.id = sub.max_id
JOIN car c ON c.id = sub.car_id
JOIN test_category tc ON tc.id = sub.category_id
JOIN test t ON t.id = sub.test_id
LEFT JOIN test_plan tp
ON tp.car_id = sub.car_id
AND tp.category_id = sub.category_id
AND tp.test_id = sub.test_id
结果:| firstRun | latest | worst | testRuns | latestGrade | CAR_NAME | CATEGORY_NAME | TEST_NAME | test_plan_id |
|------------|------------|-------|----------|-------------|--------------------|---------------|--------------|--------------|
| 2015-03-14 | 2015-03-17 | C | 2 | C | Ford Focus 2006 | Speed | acceleration | 1 |
| 2015-03-14 | 2015-03-16 | B | 2 | C | Ford Focus 2006 | Speed | zeroToSixty | 3 |
| 2015-03-15 | 2015-03-17 | B | 3 | B | Ford Focus 2006 | Performance | acceleration | 4 |
| 2015-03-13 | 2015-03-13 | C | 1 | C | Ford Focus 2006 | Performance | breaking | 6 |
| 2015-03-15 | 2015-03-17 | B | 2 | B | Tesla Model S 2016 | Speed | acceleration | 11 |
| 2015-03-12 | 2015-03-19 | A | 2 | A | Tesla Model S 2016 | Performance | acceleration | 15 |
| 2015-03-14 | 2015-03-18 | A | 4 | B | Tesla Model S 2016 | Performance | handling | 16 |
http://sqlfiddle.com/!9/6df83/5
如果测试不是测试计划的"一部分",test_plan_id
将为NULL。
您甚至可能不需要test_plan_id
,因此您可以删除该列和最后一个JOIN。