Data
ROW YEAR PROD KEY DATE
1 2011 APPLE TIME 2011-11-18 00:00:00.000
2 2011 APPLE TIME 2011-11-19 00:00:00.000
3 2013 APPLE NULL 2011-11-18 00:00:00.000
4 2013 APPLE NULL 2011-11-19 00:00:00.000
5 2013 APPLE TIME 2014-04-08 00:00:00.000
6 2013 APPLE DIM 2014-04-09 00:00:00.000
7 2013 APPLE TIME 2014-11-10 10:50:14.113
8 2013 APPLE TIME 2014-11-12 10:46:04.947
9 2013 MELON JAK 2011-10-17 11:01:19.657
10 2013 MELON TIME 2014-11-18 11:19:35.547
11 2013 MELON NULL 2014-11-19 11:19:35.547
12 2013 MELON TIME 2014-11-21 10:32:36.017
13 2014 APPLE JAK 2003-04-10 00:00:00.000
14 2014 APPLE DIM 2003-04-11 00:00:00.000
15 2015 APPLE TIME 2002-09-27 00:00:00.000
16 2015 APPLE NULL 2004-09-28 00:00:00.000
ROW is not a column in the table; it is only there to show which records I want.
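Question
The data above is partitioned by YEAR and PROD and ordered by DATE.
I need to keep all rows except rows 3 and 4, based on the following logic:
- If the first row(s) of a group (here, a group is (YEAR, PROD)) have a NULL KEY, discard them
- Rows 11 and 16 also have a NULL KEY, but we keep them because they are not the first rows of their groups
Every group must start with a record whose KEY is not NULL
===> otherwise, discard it
In other words, I can have: not null, null, not null, null
but I cannot have: null, not null, null, not null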
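The rule above amounts to: within each (YEAR, PROD) group ordered by DATE, skip rows while KEY is NULL, then keep everything from the first non-NULL KEY onward. A minimal plain-Python sketch of that rule on the sample data, assuming NULL is represented as None and using a hypothetical helper name, drop_leading_null_keys:

```python
from itertools import dropwhile, groupby

# Sample data from the question as (row, year, prod, key, date) tuples.
# ROW is not a real column; it is carried along only to identify records.
# None stands for a NULL KEY; these ISO-style date strings sort chronologically.
ROWS = [
    (1, 2011, "APPLE", "TIME", "2011-11-18 00:00:00.000"),
    (2, 2011, "APPLE", "TIME", "2011-11-19 00:00:00.000"),
    (3, 2013, "APPLE", None, "2011-11-18 00:00:00.000"),
    (4, 2013, "APPLE", None, "2011-11-19 00:00:00.000"),
    (5, 2013, "APPLE", "TIME", "2014-04-08 00:00:00.000"),
    (6, 2013, "APPLE", "DIM", "2014-04-09 00:00:00.000"),
    (7, 2013, "APPLE", "TIME", "2014-11-10 10:50:14.113"),
    (8, 2013, "APPLE", "TIME", "2014-11-12 10:46:04.947"),
    (9, 2013, "MELON", "JAK", "2011-10-17 11:01:19.657"),
    (10, 2013, "MELON", "TIME", "2014-11-18 11:19:35.547"),
    (11, 2013, "MELON", None, "2014-11-19 11:19:35.547"),
    (12, 2013, "MELON", "TIME", "2014-11-21 10:32:36.017"),
    (13, 2014, "APPLE", "JAK", "2003-04-10 00:00:00.000"),
    (14, 2014, "APPLE", "DIM", "2003-04-11 00:00:00.000"),
    (15, 2015, "APPLE", "TIME", "2002-09-27 00:00:00.000"),
    (16, 2015, "APPLE", None, "2004-09-28 00:00:00.000"),
]


def drop_leading_null_keys(rows):
    """Within each (year, prod) group ordered by date, skip rows until the
    first non-None key, then keep that row and every row after it."""
    ordered = sorted(rows, key=lambda r: (r[1], r[2], r[4]))
    kept = []
    for _, group in groupby(ordered, key=lambda r: (r[1], r[2])):
        # dropwhile stops skipping at the first non-None key, so later
        # None keys (rows 11 and 16) are kept.
        kept.extend(dropwhile(lambda r: r[3] is None, group))
    return kept


kept_rows = drop_leading_null_keys(ROWS)
print([r[0] for r in kept_rows])
# → [1, 2, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
```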
Expected result
ROW YEAR PROD KEY DATE
1 2011 APPLE TIME 2011-11-18 00:00:00.000
2 2011 APPLE TIME 2011-11-19 00:00:00.000
5 2013 APPLE TIME 2014-04-08 00:00:00.000
6 2013 APPLE DIM 2014-04-09 00:00:00.000
7 2013 APPLE TIME 2014-11-10 10:50:14.113
8 2013 APPLE TIME 2014-11-12 10:46:04.947
9 2013 MELON JAK 2011-10-17 11:01:19.657
10 2013 MELON TIME 2014-11-18 11:19:35.547
11 2013 MELON TIME 2014-11-19 11:19:35.547
12 2013 MELON TIME 2014-11-21 10:32:36.017
13 2014 APPLE JAK 2003-04-10 00:00:00.000
14 2014 APPLE DIM 2003-04-11 00:00:00.000
15 2015 APPLE TIME 2002-09-27 00:00:00.000
16 2015 APPLE TIME 2004-09-28 00:00:00.000
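I want this so that every group always starts with a non-NULL KEY. That way, I can later always use the previous row to fill in subsequent records that have NULL values (in this example, rows 11 and 16).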
Any comments or suggestions would be greatly appreciated!
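A minimal sketch of that later fill step, written in plain Python rather than SQL, assuming the leading NULL rows have already been removed; forward_fill_keys is a hypothetical helper name. It also shows why each group must start with a KEY: for a group whose first row is NULL there is no earlier key to copy, and the lookup fails.

```python
# Rows left after removing leading NULL keys: (row, year, prod, key, date).
# None stands for a NULL KEY; rows 11 and 16 still need filling.
KEPT = [
    (1, 2011, "APPLE", "TIME", "2011-11-18 00:00:00.000"),
    (2, 2011, "APPLE", "TIME", "2011-11-19 00:00:00.000"),
    (5, 2013, "APPLE", "TIME", "2014-04-08 00:00:00.000"),
    (6, 2013, "APPLE", "DIM", "2014-04-09 00:00:00.000"),
    (7, 2013, "APPLE", "TIME", "2014-11-10 10:50:14.113"),
    (8, 2013, "APPLE", "TIME", "2014-11-12 10:46:04.947"),
    (9, 2013, "MELON", "JAK", "2011-10-17 11:01:19.657"),
    (10, 2013, "MELON", "TIME", "2014-11-18 11:19:35.547"),
    (11, 2013, "MELON", None, "2014-11-19 11:19:35.547"),
    (12, 2013, "MELON", "TIME", "2014-11-21 10:32:36.017"),
    (13, 2014, "APPLE", "JAK", "2003-04-10 00:00:00.000"),
    (14, 2014, "APPLE", "DIM", "2003-04-11 00:00:00.000"),
    (15, 2015, "APPLE", "TIME", "2002-09-27 00:00:00.000"),
    (16, 2015, "APPLE", None, "2004-09-28 00:00:00.000"),
]


def forward_fill_keys(rows):
    """Replace each None key with the last non-None key seen earlier in the
    same (year, prod) group, walking each group in date order.
    Assumes every group starts with a non-None key; otherwise the
    dictionary lookup below raises KeyError."""
    last_key = {}
    filled = []
    for row_id, year, prod, key, date in sorted(rows, key=lambda r: (r[1], r[2], r[4])):
        if key is None:
            key = last_key[(year, prod)]
        last_key[(year, prod)] = key
        filled.append((row_id, year, prod, key, date))
    return filled


filled_rows = forward_fill_keys(KEPT)
print({r[0]: r[3] for r in filled_rows if r[0] in (11, 16)})
# → {11: 'TIME', 16: 'TIME'}
```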
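The following gets your desired output. For each row I take MIN(key1) over the current row and all earlier rows of the same (year, prod) group, ordered by date1. MIN() ignores NULLs, so min_val stays NULL only while every row seen so far in the group has a NULL key; once a non-NULL key appears, min_val is non-NULL for that row and every later one. Filtering on min_val IS NOT NULL therefore drops exactly the leading NULL rows.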
select * from (
select year,prod,key1,date1
,min(key1) over(partition by year,prod order by date1 asc) as min_val
from t
)x
where x.min_val is not null
+------+-------+------+-------------------------+---------+
| year | prod | key1 | date1 | min_val |
+------+-------+------+-------------------------+---------+
| 2011 | APPLE | TIME | 2011-11-18 00:00:00.000 | TIME |
| 2011 | APPLE | TIME | 2011-11-19 00:00:00.000 | TIME |
| 2013 | APPLE | TIME | 2014-04-08 00:00:00.000 | TIME |
| 2013 | APPLE | DIM | 2014-04-09 00:00:00.000 | DIM |
| 2013 | APPLE | TIME | 2014-11-10 10:50:14.113 | DIM |
| 2013 | APPLE | TIME | 2014-11-12 10:46:04.947 | DIM |
| 2013 | MELON | JAK | 2011-10-17 11:01:19.657 | JAK |
| 2013 | MELON | TIME | 2014-11-18 11:19:35.547 | JAK |
| 2013 | MELON | | 2014-11-19 11:19:35.547 | JAK |
| 2013 | MELON | TIME | 2014-11-21 10:32:36.017 | JAK |
| 2014 | APPLE | JAK | 2003-04-10 00:00:00.000 | JAK |
| 2014 | APPLE | DIM | 2003-04-11 00:00:00.000 | DIM |
| 2015 | APPLE | TIME | 2002-09-27 00:00:00.000 | TIME |
| 2015 | APPLE | | 2004-09-28 00:00:00.000 | TIME |
+------+-------+------+-------------------------+---------+
Link: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=ae82f64802674aa60005b8e9f534a150
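To check the running-MIN approach outside SQL Server, here is a sketch that runs the same query against an in-memory SQLite database from Python. Assumptions: the table t with columns key1 and date1 as in the answer, a hypothetical row_id column used only to identify rows, and an SQLite build of 3.25 or later (needed for window functions).

```python
import sqlite3

# (row_id, year, prod, key1, date1); None becomes SQL NULL.
ROWS = [
    (1, 2011, "APPLE", "TIME", "2011-11-18 00:00:00.000"),
    (2, 2011, "APPLE", "TIME", "2011-11-19 00:00:00.000"),
    (3, 2013, "APPLE", None, "2011-11-18 00:00:00.000"),
    (4, 2013, "APPLE", None, "2011-11-19 00:00:00.000"),
    (5, 2013, "APPLE", "TIME", "2014-04-08 00:00:00.000"),
    (6, 2013, "APPLE", "DIM", "2014-04-09 00:00:00.000"),
    (7, 2013, "APPLE", "TIME", "2014-11-10 10:50:14.113"),
    (8, 2013, "APPLE", "TIME", "2014-11-12 10:46:04.947"),
    (9, 2013, "MELON", "JAK", "2011-10-17 11:01:19.657"),
    (10, 2013, "MELON", "TIME", "2014-11-18 11:19:35.547"),
    (11, 2013, "MELON", None, "2014-11-19 11:19:35.547"),
    (12, 2013, "MELON", "TIME", "2014-11-21 10:32:36.017"),
    (13, 2014, "APPLE", "JAK", "2003-04-10 00:00:00.000"),
    (14, 2014, "APPLE", "DIM", "2003-04-11 00:00:00.000"),
    (15, 2015, "APPLE", "TIME", "2002-09-27 00:00:00.000"),
    (16, 2015, "APPLE", None, "2004-09-28 00:00:00.000"),
]

conn = sqlite3.connect(":memory:")
# row_id is a hypothetical helper column, not part of the real table.
conn.execute("CREATE TABLE t (row_id INTEGER, year INTEGER, prod TEXT, key1 TEXT, date1 TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", ROWS)

# With ORDER BY and no explicit frame, the window runs from the start of the
# group up to the current row (plus any rows sharing its date1).
QUERY = """
SELECT row_id, year, prod, key1, date1, min_val
FROM (
    SELECT row_id, year, prod, key1, date1,
           MIN(key1) OVER (PARTITION BY year, prod ORDER BY date1) AS min_val
    FROM t
) AS x
WHERE x.min_val IS NOT NULL
ORDER BY year, prod, date1
"""

result = conn.execute(QUERY).fetchall()
print([r[0] for r in result])
# → [1, 2, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
```

The surviving rows match the expected result (rows 3 and 4 are gone), and rows 11 and 16 still have a NULL key1 while their min_val is non-NULL.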
There may be fancier solutions, but essentially (I used T-SQL; you can drop the square brackets if KEY, DATE, etc. are not reserved words in your product):
select *
from Tbl T1
where
    /* Do not include if... */
    NOT (
        T1.[KEY] is null
        /* This is part of the first KEY=NULL rows for this group
           (no preceding record with KEY<>NULL) */
        and not exists
            (select 1
             from Tbl T3
             where T3.[YEAR] = T1.[YEAR]
               and T3.PROD = T1.PROD
               and T3.[DATE] < T1.[DATE]
               and T3.[KEY] is not null
            )
        /* There are KEY<>NULL values further down */
        and exists
            (select 1
             from Tbl T2
             where T2.[YEAR] = T1.[YEAR]
               and T2.PROD = T1.PROD
               and T2.[DATE] > T1.[DATE]
               and T2.[KEY] is not null
            )
    )
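The same kind of check for the NOT EXISTS / EXISTS approach, as a sketch in Python with SQLite. SQLite accepts SQL Server style [square brackets] around identifiers, so the query is nearly unchanged; Tbl is the answer's table name and row_id is a hypothetical helper column used only to identify rows.

```python
import sqlite3

# (row_id, YEAR, PROD, KEY, DATE); None becomes SQL NULL.
ROWS = [
    (1, 2011, "APPLE", "TIME", "2011-11-18 00:00:00.000"),
    (2, 2011, "APPLE", "TIME", "2011-11-19 00:00:00.000"),
    (3, 2013, "APPLE", None, "2011-11-18 00:00:00.000"),
    (4, 2013, "APPLE", None, "2011-11-19 00:00:00.000"),
    (5, 2013, "APPLE", "TIME", "2014-04-08 00:00:00.000"),
    (6, 2013, "APPLE", "DIM", "2014-04-09 00:00:00.000"),
    (7, 2013, "APPLE", "TIME", "2014-11-10 10:50:14.113"),
    (8, 2013, "APPLE", "TIME", "2014-11-12 10:46:04.947"),
    (9, 2013, "MELON", "JAK", "2011-10-17 11:01:19.657"),
    (10, 2013, "MELON", "TIME", "2014-11-18 11:19:35.547"),
    (11, 2013, "MELON", None, "2014-11-19 11:19:35.547"),
    (12, 2013, "MELON", "TIME", "2014-11-21 10:32:36.017"),
    (13, 2014, "APPLE", "JAK", "2003-04-10 00:00:00.000"),
    (14, 2014, "APPLE", "DIM", "2003-04-11 00:00:00.000"),
    (15, 2015, "APPLE", "TIME", "2002-09-27 00:00:00.000"),
    (16, 2015, "APPLE", None, "2004-09-28 00:00:00.000"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tbl (row_id INTEGER, [YEAR] INTEGER, PROD TEXT, [KEY] TEXT, [DATE] TEXT)")
conn.executemany("INSERT INTO Tbl VALUES (?, ?, ?, ?, ?)", ROWS)

QUERY = """
SELECT T1.row_id
FROM Tbl AS T1
WHERE NOT (
    T1.[KEY] IS NULL
    -- no earlier row in the group has a non-NULL key
    AND NOT EXISTS (SELECT 1 FROM Tbl AS T3
                    WHERE T3.[YEAR] = T1.[YEAR] AND T3.PROD = T1.PROD
                      AND T3.[DATE] < T1.[DATE] AND T3.[KEY] IS NOT NULL)
    -- some later row in the group has a non-NULL key
    AND EXISTS (SELECT 1 FROM Tbl AS T2
                WHERE T2.[YEAR] = T1.[YEAR] AND T2.PROD = T1.PROD
                  AND T2.[DATE] > T1.[DATE] AND T2.[KEY] IS NOT NULL)
)
ORDER BY T1.[YEAR], T1.PROD, T1.[DATE]
"""

kept_ids = [r[0] for r in conn.execute(QUERY)]
print(kept_ids)
# → [1, 2, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
```

Because of the final EXISTS test, a group whose keys are all NULL is kept in full by this query, whereas the running-MIN query above (filtering on min_val IS NOT NULL) drops such a group.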
A query like this might help:
select [YEAR], PROD, [KEY], [DATE]
from (
    select [YEAR], PROD, [KEY], [DATE],
           MIN(CASE WHEN [KEY] IS NULL THEN [DATE] ELSE NULL END)
               OVER (PARTITION BY [YEAR], PROD) AS MIN_NULL_KEY_DATE,
           ROW_NUMBER() OVER (PARTITION BY [YEAR], PROD ORDER BY [DATE] ASC) AS RN
    from your_table yt
) rpr
where 1 = 1
  and CASE WHEN RN = 1 AND [DATE] = MIN_NULL_KEY_DATE THEN 0 ELSE 1 END = 1
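What I'm doing here: for each (YEAR, PROD) group, I find the minimum DATE among the rows whose KEY is NULL (MIN_NULL_KEY_DATE), and I number the rows of the group by DATE (RN). If a row has RN = 1 and its DATE equals that minimum NULL-key date, it is the group's first row and has a NULL KEY, so it is filtered out.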
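As written, the query above filters out only the row with RN = 1, so on the sample data it removes row 3 but keeps row 4, the second leading NULL row of the 2013 APPLE group. The sketch below runs that query in SQLite from Python next to a variant of my own (not part of the answer) that keeps a row only when its DATE is not earlier than the first DATE with a non-NULL KEY in its group. Assumptions: SQLite 3.25 or later, your_table as the table name (the answer's placeholder), and a hypothetical row_id column for identifying rows.

```python
import sqlite3

# (row_id, YEAR, PROD, KEY, DATE); None becomes SQL NULL.
ROWS = [
    (1, 2011, "APPLE", "TIME", "2011-11-18 00:00:00.000"),
    (2, 2011, "APPLE", "TIME", "2011-11-19 00:00:00.000"),
    (3, 2013, "APPLE", None, "2011-11-18 00:00:00.000"),
    (4, 2013, "APPLE", None, "2011-11-19 00:00:00.000"),
    (5, 2013, "APPLE", "TIME", "2014-04-08 00:00:00.000"),
    (6, 2013, "APPLE", "DIM", "2014-04-09 00:00:00.000"),
    (7, 2013, "APPLE", "TIME", "2014-11-10 10:50:14.113"),
    (8, 2013, "APPLE", "TIME", "2014-11-12 10:46:04.947"),
    (9, 2013, "MELON", "JAK", "2011-10-17 11:01:19.657"),
    (10, 2013, "MELON", "TIME", "2014-11-18 11:19:35.547"),
    (11, 2013, "MELON", None, "2014-11-19 11:19:35.547"),
    (12, 2013, "MELON", "TIME", "2014-11-21 10:32:36.017"),
    (13, 2014, "APPLE", "JAK", "2003-04-10 00:00:00.000"),
    (14, 2014, "APPLE", "DIM", "2003-04-11 00:00:00.000"),
    (15, 2015, "APPLE", "TIME", "2002-09-27 00:00:00.000"),
    (16, 2015, "APPLE", None, "2004-09-28 00:00:00.000"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (row_id INTEGER, [YEAR] INTEGER, PROD TEXT, [KEY] TEXT, [DATE] TEXT)")
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?, ?)", ROWS)

# The answer's logic: drop a row only if it is RN = 1 and its DATE is the
# earliest NULL-key DATE of the group.
ANSWER_QUERY = """
SELECT row_id FROM (
    SELECT row_id, [YEAR], PROD, [KEY], [DATE],
           MIN(CASE WHEN [KEY] IS NULL THEN [DATE] END)
               OVER (PARTITION BY [YEAR], PROD) AS MIN_NULL_KEY_DATE,
           ROW_NUMBER() OVER (PARTITION BY [YEAR], PROD ORDER BY [DATE]) AS RN
    FROM your_table
) AS rpr
WHERE CASE WHEN RN = 1 AND [DATE] = MIN_NULL_KEY_DATE THEN 0 ELSE 1 END = 1
ORDER BY [YEAR], PROD, [DATE]
"""

# Variant: keep rows dated on or after the group's first non-NULL KEY, which
# drops every leading NULL row, not just the RN = 1 row.
VARIANT_QUERY = """
SELECT row_id FROM (
    SELECT row_id, [YEAR], PROD, [KEY], [DATE],
           MIN(CASE WHEN [KEY] IS NOT NULL THEN [DATE] END)
               OVER (PARTITION BY [YEAR], PROD) AS FIRST_KEY_DATE
    FROM your_table
) AS rpr
WHERE [DATE] >= FIRST_KEY_DATE
ORDER BY [YEAR], PROD, [DATE]
"""

answer_ids = [r[0] for r in conn.execute(ANSWER_QUERY)]
variant_ids = [r[0] for r in conn.execute(VARIANT_QUERY)]
print(answer_ids)   # row 4 is still present
print(variant_ids)  # rows 3 and 4 are both gone
```

In the variant, a group with no non-NULL KEY at all gets FIRST_KEY_DATE = NULL, so the comparison is unknown and the whole group is dropped.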