如何使用分析函数填充缺失值



我想从我的数据集中填充缺失的空值。我有这样的数据集

+---------------------+------+-------------+
| ORDER_DATE          | SHOP | SALESPERSON |
+---------------------+------+-------------+
| 14/04/2017 04:44:27 | A    | MIKE        |
+---------------------+------+-------------+
| 14/04/2017 04:44:55 | A    |             |
+---------------------+------+-------------+
| 14/04/2017 04:45:07 | A    | TIM         |
+---------------------+------+-------------+
| 14/04/2017 04:45:30 | A    |             |
+---------------------+------+-------------+
| 14/04/2017 04:45:43 | B    |             |
+---------------------+------+-------------+
| 14/04/2017 04:46:13 | B    | JOHN        |
+---------------------+------+-------------+
| 14/04/2017 04:46:28 | B    |             |
+---------------------+------+-------------+
| 14/04/2017 04:58:32 | C    |             |
+---------------------+------+-------------+
| 14/04/2017 04:58:41 | C    | MELINDA     |
+---------------------+------+-------------+

我喜欢使用商店内空值之前的第一个找到的值来填充按商店划分的销售人员信息。我试过这个,但这没有产生正确的结果(如下(。如何解决这个问题?

CREATE TABLE SALES (
ORDER_DATE DATE, 
SHOP VARCHAR2(30 CHAR), 
SALESPERSON VARCHAR2(30 CHAR)
)
;
REM INSERTING INTO SALES
SET DEFINE OFF;
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:44:27','DD/MM/YYYY HH24:MI:SS'),'A','MIKE');
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:44:55','DD/MM/YYYY HH24:MI:SS'),'A',NULL);
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:45:07','DD/MM/YYYY HH24:MI:SS'),'A','TIM');
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:45:30','DD/MM/YYYY HH24:MI:SS'),'A',NULL);
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:45:43','DD/MM/YYYY HH24:MI:SS'),'B',NULL);
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:46:13','DD/MM/YYYY HH24:MI:SS'),'B','JOHN');
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:46:28','DD/MM/YYYY HH24:MI:SS'),'B',NULL);
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:58:32','DD/MM/YYYY HH24:MI:SS'),'C',NULL);
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:58:41','DD/MM/YYYY HH24:MI:SS'),'C','MELINDA');
COMMIT;
SELECT * FROM SALES ORDER BY SHOP, ORDER_DATE;
SELECT ORDER_DATE,
       SHOP,
       SALESPERSON,
       /*tried two approaches*/
       /*does not produce a correct result set*/
       LAST_VALUE(SALESPERSON) IGNORE NULLS OVER (PARTITION BY SHOP
                   ORDER BY ORDER_DATE RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LAST_VALUE_1,
       /*this also does not solve this*/            
       LAST_VALUE(SALESPERSON) IGNORE NULLS OVER(PARTITION BY SHOP
                  ORDER BY ORDER_DATE ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS LAST_VALUE_2
FROM SALES ;

正确的结果集是:

+---------------------+------+-------------+--------------------+
| ORDER_DATE          | SHOP | SALESPERSON | SALESPERSON_FILLED |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:44:27 | A    | MIKE        |  MIKE              |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:44:55 | A    |             |  MIKE              |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:45:07 | A    | TIM         |  TIM               |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:45:30 | A    |             |  TIM               |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:45:43 | B    |             |                    |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:46:13 | B    | JOHN        |  JOHN              |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:46:28 | B    |             |  JOHN              |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:58:32 | C    |             |                    |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:58:41 | C    | MELINDA     |  MELINDA           |
+---------------------+------+-------------+--------------------+

你非常接近。
试试这个:

SELECT ORDER_DATE,
       SHOP,
       SALESPERSON,
       LAST_VALUE(SALESPERSON) IGNORE NULLS OVER 
            (PARTITION BY SHOP ORDER BY ORDER_DATE ) AS LAST_VALUE_1
FROM SALES
order by shop, order_date;

ORDER_DA SHOP                           SALESPERSON                    LAST_VALUE_1                  
-------- ------------------------------ ------------------------------ ------------------------------
17/04/14 A                              MIKE                           MIKE                          
17/04/14 A                                                             MIKE                          
17/04/14 A                              TIM                            TIM                           
17/04/14 A                                                             TIM                           
17/04/14 B                                                                                           
17/04/14 B                              JOHN                           JOHN                          
17/04/14 B                                                             JOHN                          
17/04/14 C                                                                                           
17/04/14 C                              MELINDA                        MELINDA                       
9 rows selected. 

以下查询在 SQL Server 中有效。我看不出它不应该在 Oracle 中工作的任何理由:

SELECT ORDER_DATE, SHOP, 
       MAX(SALESPERSON) OVER (PARTITION BY SHOP, grp) AS SALESPERSON
FROM (
   SELECT ORDER_DATE, SHOP, SALESPERSON,
          SUM(CASE WHEN SALESPERSON IS NOT NULL THEN 1 END) 
          OVER
          (PARTITION BY SHOP ORDER BY ORDER_DATE) AS grp
   FROM mytable) AS t
ORDER BY ORDER_DATE

这是内部查询生成的内容:

ORDER_DATE              SHOP SALESPERSON  grp
---------------------------------------------
2017-04-14 04:44:27.000 A    MIKE         1
2017-04-14 04:44:55.000 A    NULL         1
2017-04-14 04:45:07.000 A    TIM          2
2017-04-14 04:45:30.000 A    NULL         2
2017-04-14 04:45:43.000 B    NULL         NULL
2017-04-14 04:46:13.000 B    JOHN         1
2017-04-14 04:46:28.000 B    NULL         1
2017-04-14 04:58:32.000 C    NULL         NULL
2017-04-14 04:58:41.000 C    MELINDA      1

因此,使用字段grp和字段SHOP,我们可以识别一个应该共享相同SALESPERSON值的记录"孤岛"。

在 SQL SERVER 2008 中,使用方式 大小写 何时,排序依据 :

法典:

 SELECT CONVERT(DATETIME,ORDER_DATE) AS ORDER_DATE,
    ISNULL(SHOP,'') AS SHOP,
    ISNULL(SALESPERSON,'') AS SALESPERSON,
    CASE WHEN SALESPERSON IS NULL OR SALESPERSON = '' THEN 
    ISNULL((SELECT TOP 1 ISNULL(SALESPERSON,'')  FROM SALES_stack WHERE ORDER_DATE < S.ORDER_DATE
    AND ISNULL(SHOP,'') = ISNULL(S.SHOP,'') 
    ORDER BY ORDER_DATE DESC),'')
    ELSE  ISNULL(SALESPERSON,'') END AS SALESPERSON_FILLED FROM SALES_stack S

输出:

    ORDER_DATE              SHOP   SALESPERSON  SALESPERSON_FILLED
    2017-04-14 04:44:27.000 A      MIKE         MIKE
    2017-04-14 04:44:55.000 A                   MIKE
    2017-04-14 04:45:07.000 A      TIM          TIM
    2017-04-14 04:45:30.000 A                   TIM
    2017-04-14 04:45:43.000 B       
    2017-04-14 04:46:13.000 B      JOHN         JOHN
    2017-04-14 04:46:28.000 B                   JOHN
    2017-04-14 04:58:32.000 C       
    2017-04-14 04:58:41.000 C      MELINDA      MELINDA

相关内容

  • 没有找到相关文章

最新更新