我有一个如下所示的员工数据集。开始日期是雇员的雇用日期,结束日期是雇员离开雇主的日期。两者都是日期数据类型。
Gender StartDate EndDate
M 2010-04-30 2013-06-18
F 2010-01-09 2015-06-19
M 2009-09-08 2014-08-13
我想根据平均就业月数将员工数据分为四分位数。结果还包括员工总数(员工列(、男性员工百分比、女性员工百分比和平均就业月数。请在下面找到预期结果:
Quartile Employee %Male %Female AvgMonths
1 20 60.00 40.00 8.75
2 25 50.00 50.00 28.5
3 10 40.00 60.00 41.25
我想根据工作的月数获得 25%、50% 和 75% 的四分位数,并取平均值以获得平均月数。
下面请找到我的查询,我不知道在哪里可以将四分位数计算添加到查询中。
declare @current date;
set @current='2012-12-31';
select count(*) as Employees,
cast(cast(AVG(CASE WHEN Gender = 'M' THEN 1.0 ELSE 0 END)*100 as decimal(18,2)) as nvarchar(5)) +'%' as Male,
cast(cast(AVG(CASE WHEN Gender = 'F' THEN 1.0 ELSE 0 END)*100 as decimal(18,2)) as nvarchar(5)) +'%' as Female,
AVG(CASE WHEN EndDate is null then DATEDIFF(MONTH, StartDate, @current)
when EndDate is not null then DATEDIFF(MONTH, StartDate, EndDate)
end) as AvgMonths
from dbo.DimEmployee
-----------------------更新-------------------------------------
我自己想通了。 请在下面找到代码:
declare @current date;
set @current='2012-12-31';
select count(*) as Employees,
cast(cast(AVG(CASE WHEN Gender = 'M' THEN 1.0 ELSE 0 END)*100 as decimal(18,2)) as nvarchar(10)) +'%' as Male,
cast(cast(AVG(CASE WHEN Gender = 'F' THEN 1.0 ELSE 0 END)*100 as decimal(18,2)) as nvarchar(10)) +'%' as Female,
AVG(t.EmployedMonths) as AvgMonths,
Ntile(3) over (order by t.EmployedMonths asc) as Quartiles
from
(select EmployeeKey, Gender,
CASE WHEN EndDate is null then abs(DATEDIFF(MONTH, StartDate, @current))
when EndDate is not null then abs(DATEDIFF(MONTH, StartDate, EndDate))
end as EmployedMonths
from dbo.DimEmployee)t
group by t.EmployedMonths
您可以使用rank()
和count(*)
作为窗口函数来计算四分位数:
select floor( rnk * 4.0 / cnt ) as quartile,
count(*) as Employees,
cast(cast(AVG(CASE WHEN Gender = 'M' THEN 1.0 ELSE 0 END)*100 as decimal(18,2)) as nvarchar(5)) + '%' as Male,
cast(cast(AVG(CASE WHEN Gender = 'F' THEN 1.0 ELSE 0 END)*100 as decimal(18,2)) as nvarchar(5)) + '%' as Female,
AVG(num_months * 1.0) as AvgMonths
from (select e.*,
rank() over (partition by datediff(month, startdate, coalesce(enddate, @current))) - 1 as rnk,
count(*) over () as cnt,
datediff(month, startdate, coalesce(enddate, @current)) as num_months
from dbo.DimEmployee e
) e
group by floor( rnk * 4.0 / cnt )