我想从我的数据集中填充缺失的空值。我有这样的数据集
+---------------------+------+-------------+
| ORDER_DATE | SHOP | SALESPERSON |
+---------------------+------+-------------+
| 14/04/2017 04:44:27 | A | MIKE |
+---------------------+------+-------------+
| 14/04/2017 04:44:55 | A | |
+---------------------+------+-------------+
| 14/04/2017 04:45:07 | A | TIM |
+---------------------+------+-------------+
| 14/04/2017 04:45:30 | A | |
+---------------------+------+-------------+
| 14/04/2017 04:45:43 | B | |
+---------------------+------+-------------+
| 14/04/2017 04:46:13 | B | JOHN |
+---------------------+------+-------------+
| 14/04/2017 04:46:28 | B | |
+---------------------+------+-------------+
| 14/04/2017 04:58:32 | C | |
+---------------------+------+-------------+
| 14/04/2017 04:58:41 | C | MELINDA |
+---------------------+------+-------------+
我喜欢使用商店内空值之前的第一个找到的值来填充按商店划分的销售人员信息。我试过这个,但这没有产生正确的结果(如下(。如何解决这个问题?
CREATE TABLE SALES (
ORDER_DATE DATE,
SHOP VARCHAR2(30 CHAR),
SALESPERSON VARCHAR2(30 CHAR)
)
;
REM INSERTING INTO SALES
SET DEFINE OFF;
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:44:27','DD/MM/YYYY HH24:MI:SS'),'A','MIKE');
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:44:55','DD/MM/YYYY HH24:MI:SS'),'A',NULL);
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:45:07','DD/MM/YYYY HH24:MI:SS'),'A','TIM');
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:45:30','DD/MM/YYYY HH24:MI:SS'),'A',NULL);
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:45:43','DD/MM/YYYY HH24:MI:SS'),'B',NULL);
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:46:13','DD/MM/YYYY HH24:MI:SS'),'B','JOHN');
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:46:28','DD/MM/YYYY HH24:MI:SS'),'B',NULL);
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:58:32','DD/MM/YYYY HH24:MI:SS'),'C',NULL);
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:58:41','DD/MM/YYYY HH24:MI:SS'),'C','MELINDA');
COMMIT;
SELECT * FROM SALES ORDER BY SHOP, ORDER_DATE;
SELECT ORDER_DATE,
SHOP,
SALESPERSON,
/*tried two approaches*/
/*does not produce a correct result set*/
LAST_VALUE(SALESPERSON) IGNORE NULLS OVER (PARTITION BY SHOP
ORDER BY ORDER_DATE RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LAST_VALUE_1,
/*this also does not solve this*/
LAST_VALUE(SALESPERSON) IGNORE NULLS OVER(PARTITION BY SHOP
ORDER BY ORDER_DATE ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS LAST_VALUE_2
FROM SALES ;
正确的结果集是:
+---------------------+------+-------------+--------------------+
| ORDER_DATE | SHOP | SALESPERSON | SALESPERSON_FILLED |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:44:27 | A | MIKE | MIKE |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:44:55 | A | | MIKE |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:45:07 | A | TIM | TIM |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:45:30 | A | | TIM |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:45:43 | B | | |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:46:13 | B | JOHN | JOHN |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:46:28 | B | | JOHN |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:58:32 | C | | |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:58:41 | C | MELINDA | MELINDA |
+---------------------+------+-------------+--------------------+
你非常接近。
试试这个:
SELECT ORDER_DATE,
SHOP,
SALESPERSON,
LAST_VALUE(SALESPERSON) IGNORE NULLS OVER
(PARTITION BY SHOP ORDER BY ORDER_DATE ) AS LAST_VALUE_1
FROM SALES
order by shop, order_date;
ORDER_DA SHOP SALESPERSON LAST_VALUE_1
-------- ------------------------------ ------------------------------ ------------------------------
17/04/14 A MIKE MIKE
17/04/14 A MIKE
17/04/14 A TIM TIM
17/04/14 A TIM
17/04/14 B
17/04/14 B JOHN JOHN
17/04/14 B JOHN
17/04/14 C
17/04/14 C MELINDA MELINDA
9 rows selected.
以下查询在 SQL Server 中有效。我看不出它不应该在 Oracle 中工作的任何理由:
SELECT ORDER_DATE, SHOP,
MAX(SALESPERSON) OVER (PARTITION BY SHOP, grp) AS SALESPERSON
FROM (
SELECT ORDER_DATE, SHOP, SALESPERSON,
SUM(CASE WHEN SALESPERSON IS NOT NULL THEN 1 END)
OVER
(PARTITION BY SHOP ORDER BY ORDER_DATE) AS grp
FROM mytable) AS t
ORDER BY ORDER_DATE
这是内部查询生成的内容:
ORDER_DATE SHOP SALESPERSON grp
---------------------------------------------
2017-04-14 04:44:27.000 A MIKE 1
2017-04-14 04:44:55.000 A NULL 1
2017-04-14 04:45:07.000 A TIM 2
2017-04-14 04:45:30.000 A NULL 2
2017-04-14 04:45:43.000 B NULL NULL
2017-04-14 04:46:13.000 B JOHN 1
2017-04-14 04:46:28.000 B NULL 1
2017-04-14 04:58:32.000 C NULL NULL
2017-04-14 04:58:41.000 C MELINDA 1
因此,使用字段grp
和字段SHOP
,我们可以识别一个应该共享相同SALESPERSON
值的记录"孤岛"。
在 SQL SERVER 2008 中,使用方式 大小写 何时,排序依据 :
法典:
SELECT CONVERT(DATETIME,ORDER_DATE) AS ORDER_DATE,
ISNULL(SHOP,'') AS SHOP,
ISNULL(SALESPERSON,'') AS SALESPERSON,
CASE WHEN SALESPERSON IS NULL OR SALESPERSON = '' THEN
ISNULL((SELECT TOP 1 ISNULL(SALESPERSON,'') FROM SALES_stack WHERE ORDER_DATE < S.ORDER_DATE
AND ISNULL(SHOP,'') = ISNULL(S.SHOP,'')
ORDER BY ORDER_DATE DESC),'')
ELSE ISNULL(SALESPERSON,'') END AS SALESPERSON_FILLED FROM SALES_stack S
输出:
ORDER_DATE SHOP SALESPERSON SALESPERSON_FILLED
2017-04-14 04:44:27.000 A MIKE MIKE
2017-04-14 04:44:55.000 A MIKE
2017-04-14 04:45:07.000 A TIM TIM
2017-04-14 04:45:30.000 A TIM
2017-04-14 04:45:43.000 B
2017-04-14 04:46:13.000 B JOHN JOHN
2017-04-14 04:46:28.000 B JOHN
2017-04-14 04:58:32.000 C
2017-04-14 04:58:41.000 C MELINDA MELINDA