Need to calculate some metrics from a dataset using standalone SQL queries



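The dataset is shown below. It is a sample dataset of employee login activity counts, stored in a table named activity.

I need to calculate a few metrics. I can do this with a Python DataFrame, but I am new to MySQL.

  1. What is the average number of employees active per day, per department, in January 2018? (I got halfway there, but the result is incorrect.)

  2. The number of unique active employees (logins > 0) per dept_id for the month of January 2018. (I was able to do this one.)

  3. Month-over-month growth from December 2017 to January 2018 for all dept_ids that have at least one active employee (logins > 0). I don't know how to do this in SQL.

  4. The proportion of users who were active in each dept_id in December 2017 and were also active in the same dept_id in January 2018.

  5. How many employees logged in on 3 or more consecutive days in January 2018?

Any help would be much appreciated.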

The query I wrote for case 1:

select dept_id,
       DAU
from (
    select dept_id,
           month(date),
           year(date),
           avg(logins) as DAU
    from (select * from activity where login > 0)
    where year(date) = 2018
      and month(date) = 1
    group by dept_id, month(date), year(date)
)

The dataset in text format:

date    dept_id emp_id  logins
29-11-2017  ABC001  A1  1
30-11-2017  ABC002  A2  2
01-12-2017  XYZ001  A3  0
01-12-2017  XYZ002  A4  1
03-12-2017  ABC001  D2  4
04-12-2017  ABC002  D1  1
05-12-2017  XYZ001  A6  2
05-12-2017  XYZ002  A7  3
30-12-2017  ABC001  A8  0
01-01-2018  ABC002  A2  6
02-01-2018  XYZ001  A10 4
03-01-2018  XYZ002  A11 2
04-01-2018  ABC001  A1  2
04-01-2018  ABC002  A2  0
05-01-2018  XYZ001  A13 4
05-01-2018  XYZ001  A6  2
05-01-2018  XYZ002  A7  1
06-01-2018  XYZ001  A6  2
06-01-2018  XYZ002  A7  3
07-01-2018  XYZ001  A6  3
07-01-2018  XYZ002  A7  4
06-01-2018  XYZ002  A14 3
30-01-2018  ABC001  A15 2

Let me know whether this works; if not, I will update the answer. I don't have MySQL installed, so I could not test it.

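DATE is a keyword in Oracle. I am not sure about MySQL, so quote the column name there, like "date" (MySQL normally quotes identifiers with backticks, as in `date`). The queries below use Oracle syntax, with a table named mytable and a date column named DATES.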

Case 1:

SELECT dept_id,
       AVG(cnt) average_emp
FROM (SELECT dept_id,
             days,
             COUNT(emp_id) cnt
      FROM (SELECT dept_id,
                   emp_id,
                   SUM(logins) logins,
                   to_char(DATES, 'dd') days
            FROM mytable
            WHERE to_char(DATES, 'mmyyyy') = '012018'
            GROUP BY dept_id,
                     emp_id,
                     to_char(DATES, 'dd'))
      WHERE logins > 0
      GROUP BY dept_id,
               days)
GROUP BY dept_id;
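Since the question targets MySQL and the query above is Oracle-flavoured, here is a minimal sketch of the same idea written with plain range comparisons instead of to_char. It runs against SQLite through Python's sqlite3 module purely so it can be executed without a MySQL server; the table and column names (activity, date, dept_id, emp_id, logins) come from the question, and storing dates as ISO 'YYYY-MM-DD' strings is an assumption. The SQL text itself should carry over to a MySQL DATE column largely unchanged, but that has not been tested here.

```python
import sqlite3

# A few rows taken from the question's sample data, dates rewritten as ISO strings.
rows = [
    ("2017-12-05", "XYZ001", "A6", 2),
    ("2018-01-01", "ABC002", "A2", 6),
    ("2018-01-02", "XYZ001", "A10", 4),
    ("2018-01-04", "ABC002", "A2", 0),
    ("2018-01-05", "XYZ001", "A13", 4),
    ("2018-01-05", "XYZ001", "A6", 2),
    ("2018-01-06", "XYZ001", "A6", 2),
    ("2018-01-07", "XYZ001", "A6", 3),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity (date TEXT, dept_id TEXT, emp_id TEXT, logins INTEGER)"
)
conn.executemany("INSERT INTO activity VALUES (?, ?, ?, ?)", rows)

query = """
SELECT dept_id, AVG(active_emps) AS avg_daily_active
FROM (
    -- one row per department per day: distinct employees with logins > 0
    SELECT dept_id, date, COUNT(DISTINCT emp_id) AS active_emps
    FROM activity
    WHERE logins > 0
      AND date >= '2018-01-01' AND date < '2018-02-01'
    GROUP BY dept_id, date
) AS per_day
GROUP BY dept_id
ORDER BY dept_id
"""
result = conn.execute(query).fetchall()
print(result)
```

Note that this averages only over days on which a department had at least one active employee. Dividing the monthly total by 31 instead would be a different definition of "average per day", and the question does not say which one is intended.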

Case 2:

SELECT dept_id,
       COUNT(emp_id)
FROM (SELECT dept_id,
             emp_id,
             SUM(logins) logins
      FROM mytable
      WHERE to_char(DATES, 'mmyyyy') = '012018'
      GROUP BY dept_id,
               emp_id)
WHERE logins > 0
GROUP BY dept_id;

Case 3:

SELECT months,
       users,
       ROUND((users - nvl(LAG(users) OVER (ORDER BY rownum), users))
             / nvl(LAG(users) OVER (ORDER BY rownum), 1), 2) growth_rate
FROM (SELECT to_char(mt.DATES, 'MON-YYYY') months,
             count(mt.EMP_ID) users
      FROM (SELECT *
            FROM MYTABLE
            ORDER BY DATES) mt
      WHERE mt.DATES >= to_date('DEC-2017', 'MON-YYYY')
        -- exclusive upper bound, so that all of January 2018 is included
        AND mt.DATES < to_date('FEB-2018', 'MON-YYYY')
      GROUP BY to_char(mt.DATES, 'MON-YYYY')
      ORDER BY to_date(months, 'MON-YYYY')) oq
WHERE exists (SELECT 1
              FROM MYTABLE iq
              WHERE to_char(iq.DATES, 'MON-YYYY') = oq.months
                AND iq.EMP_ID IN (SELECT EMP_ID
                                  FROM MYTABLE
                                  WHERE iq.LOGINS > 0));
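The question asks for growth per dept_id, so a per-department variant may be closer to what is wanted. The sketch below uses conditional aggregation (COUNT(DISTINCT CASE ...)) to put the December and January active-employee counts side by side, then computes (January - December) / December. It is run through SQLite only so it can be executed locally; the ISO date strings and the output column names dec_users, jan_users, and growth are assumptions for illustration.

```python
import sqlite3

# A few rows taken from the question's sample data, dates rewritten as ISO strings.
rows = [
    ("2017-11-29", "ABC001", "A1", 1),
    ("2017-12-01", "XYZ002", "A4", 1),
    ("2017-12-03", "ABC001", "D2", 4),
    ("2017-12-05", "XYZ002", "A7", 3),
    ("2017-12-30", "ABC001", "A8", 0),
    ("2018-01-03", "XYZ002", "A11", 2),
    ("2018-01-04", "ABC001", "A1", 2),
    ("2018-01-05", "XYZ002", "A7", 1),
    ("2018-01-06", "XYZ002", "A7", 3),
    ("2018-01-06", "XYZ002", "A14", 3),
    ("2018-01-30", "ABC001", "A15", 2),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity (date TEXT, dept_id TEXT, emp_id TEXT, logins INTEGER)"
)
conn.executemany("INSERT INTO activity VALUES (?, ?, ?, ?)", rows)

query = """
SELECT dept_id,
       dec_users,
       jan_users,
       (jan_users - dec_users) * 1.0 / dec_users AS growth
FROM (
    -- distinct active employees per department, split by month
    SELECT dept_id,
           COUNT(DISTINCT CASE WHEN date <  '2018-01-01' THEN emp_id END) AS dec_users,
           COUNT(DISTINCT CASE WHEN date >= '2018-01-01' THEN emp_id END) AS jan_users
    FROM activity
    WHERE logins > 0
      AND date >= '2017-12-01' AND date < '2018-02-01'
    GROUP BY dept_id
) AS monthly
WHERE dec_users > 0   -- growth is undefined without a December baseline
ORDER BY dept_id
"""
result = conn.execute(query).fetchall()
print(result)
```

Departments with no active employee in December are dropped here because the ratio would divide by zero. Whether they should instead be reported with a NULL growth value is a judgment call the question leaves open.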

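Case 4:

-- Lists the (dept_id, emp_id) pairs that were active in both months.
-- The proportion is the count of these pairs divided by the count of
-- pairs active in December 2017, per dept_id.
SELECT dept_id,
       emp_id
FROM (SELECT dept_id,
             emp_id
      FROM mytable
      WHERE to_char(DATES, 'mmyyyy') = '122017'
        AND logins > 0
      GROUP BY dept_id,
               emp_id)
INTERSECT
SELECT dept_id,
       emp_id
FROM (SELECT dept_id,
             emp_id
      FROM mytable
      WHERE to_char(DATES, 'mmyyyy') = '012018'
        AND logins > 0
      GROUP BY dept_id,
               emp_id);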
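To turn the set of employees active in both months into the proportion the question asks for, one option is a LEFT JOIN from the December rows to the January rows and a ratio of distinct counts. This is a sketch under the same assumptions as before (SQLite as a local stand-in, ISO date strings); the aliases d and j and the column name retained_share are made up for the example.

```python
import sqlite3

# A few rows taken from the question's sample data, dates rewritten as ISO strings.
rows = [
    ("2017-12-01", "XYZ001", "A3", 0),
    ("2017-12-01", "XYZ002", "A4", 1),
    ("2017-12-04", "ABC002", "D1", 1),
    ("2017-12-05", "XYZ001", "A6", 2),
    ("2017-12-05", "XYZ002", "A7", 3),
    ("2018-01-01", "ABC002", "A2", 6),
    ("2018-01-05", "XYZ001", "A6", 2),
    ("2018-01-05", "XYZ002", "A7", 1),
    ("2018-01-06", "XYZ001", "A6", 2),
    ("2018-01-06", "XYZ002", "A14", 3),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity (date TEXT, dept_id TEXT, emp_id TEXT, logins INTEGER)"
)
conn.executemany("INSERT INTO activity VALUES (?, ?, ?, ?)", rows)

query = """
SELECT d.dept_id,
       COUNT(DISTINCT j.emp_id) * 1.0 / COUNT(DISTINCT d.emp_id) AS retained_share
FROM activity AS d
-- j matches only if the same employee was active in the same dept in January
LEFT JOIN activity AS j
       ON j.dept_id = d.dept_id
      AND j.emp_id  = d.emp_id
      AND j.logins  > 0
      AND j.date >= '2018-01-01' AND j.date < '2018-02-01'
WHERE d.logins > 0
  AND d.date >= '2017-12-01' AND d.date < '2018-01-01'
GROUP BY d.dept_id
ORDER BY d.dept_id
"""
result = conn.execute(query).fetchall()
print(result)
```

The denominator is the set of employees active in December, so a department with no December activity does not appear in the output at all.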

Case 5:

-- not foolproof
SELECT COUNT(*) emp_cnt
FROM (SELECT emp_id,
             DENSE_RANK() OVER (ORDER BY DATES) rn,
             COUNT(*) OVER (PARTITION BY emp_id ORDER BY DATES) cnt
      FROM mytable
      WHERE to_char(DATES, 'mmyyyy') = '012018'
        AND logins > 0
      ORDER BY rn,
               cnt)
WHERE rn = cnt
  AND rn >= 3;
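Because the query above is flagged as not foolproof, here is an alternative sketch that avoids window functions entirely. It swaps in a different technique, a double self-join: an employee has a streak of 3 or more days exactly when some active day is followed by an active next day and an active day after that. The example runs on SQLite, whose date(x, '+1 day') modifier is used for the date arithmetic; in MySQL the corresponding expression would be DATE_ADD(a.date, INTERVAL 1 DAY). The ISO date strings and the aliases a, b, and c are assumptions for illustration.

```python
import sqlite3

# A few rows taken from the question's sample data, dates rewritten as ISO strings.
rows = [
    ("2018-01-01", "ABC002", "A2", 6),
    ("2018-01-04", "ABC002", "A2", 0),
    ("2018-01-05", "XYZ001", "A13", 4),
    ("2018-01-05", "XYZ001", "A6", 2),
    ("2018-01-05", "XYZ002", "A7", 1),
    ("2018-01-06", "XYZ001", "A6", 2),
    ("2018-01-06", "XYZ002", "A7", 3),
    ("2018-01-06", "XYZ002", "A14", 3),
    ("2018-01-07", "XYZ001", "A6", 3),
    ("2018-01-07", "XYZ002", "A7", 4),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity (date TEXT, dept_id TEXT, emp_id TEXT, logins INTEGER)"
)
conn.executemany("INSERT INTO activity VALUES (?, ?, ?, ?)", rows)

query = """
SELECT COUNT(DISTINCT a.emp_id) AS emp_cnt
FROM activity AS a
-- b: the same employee was active on the following day
JOIN activity AS b
  ON b.emp_id = a.emp_id
 AND b.logins > 0
 AND b.date = date(a.date, '+1 day')
-- c: the same employee was active two days later
JOIN activity AS c
  ON c.emp_id = a.emp_id
 AND c.logins > 0
 AND c.date = date(a.date, '+2 day')
WHERE a.logins > 0
  AND a.date >= '2018-01-01'
  AND c.date <  '2018-02-01'   -- keep the whole streak inside January 2018
"""
emp_cnt = conn.execute(query).fetchone()[0]
print(emp_cnt)
```

Any run longer than three days contains a run of exactly three, so checking for three consecutive days is enough to answer "3 or more". The join matches on emp_id only, so a streak that spans two departments still counts; add a dept_id condition to the joins if streaks should be counted per department.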
