How to create observations with a column value of 0



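I use the code below to get individuals' quarterly wages for 2010Q1-2020Q4. If a person did not work in a particular quarter, they have no observation for that quarter. Instead, I would like there to be an observation with a quarterly wage of 0. For example,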

What is currently happening:
|  MPI  | Quarter| Wage|
|PersonA|2010Q1  | 100 |
|PersonA|2010Q2  | 100 |
|PersonA|2010Q3  | 100 | 
|PersonB|2010Q1  | 100 |

Desired output:
|  MPI  | Quarter| Wage|
|PersonA|2010Q1  | 100 |
|PersonA|2010Q2  | 100 |
|PersonA|2010Q3  | 100 | 
|PersonA|2010Q4  |  0  |
|PersonB|2010Q1  | 100 |
|PersonB|2010Q2  |  0  |
|PersonB|2010Q3  |  0  | 
|PersonB|2010Q4  |  0  |
ws_data AS (
SELECT
MASTER_PERSON_INDEX AS mpi
,SUBSTR(cast(wg.naics as string), 1, 2) AS NAICS_2
,SUBSTR(cast(wg.yrqtr as string), 0,5) AS quarter
,wg.yrqtr
,wg.employer
,wg.wages
,SUBSTR(cast(wg.yrqtr as string), 0,4) AS YEAR
FROM
(   SELECT
*
FROM
`ws.ws_ui_wage_records_di` wsui
WHERE  
wsui.MASTER_PERSON_INDEX IN (SELECT mpi FROM rc_table_ra16_all_grads_1b)
AND
wsui.yrqtr IN (20101, 20102, 20103, 20104,
20111, 20112, 20113, 20114,
20121, 20122, 20123, 20124,
20131, 20132, 20133, 20134,
20141, 20142, 20143, 20144,
20151, 20152, 20153, 20154,
20161, 20162, 20163, 20164,
20171, 20172, 20173, 20174,
20181, 20182, 20183, 20184,
20191, 20192, 20193, 20194,
20201, 20202, 20203, 20204)
)wg

),
ws_agg AS (
SELECT
mpi
--      ,STATS_MODE(NAICS_2) AS NAICS_2 
--   ,STATS_MODE(NAICS_DESC) AS NAICS_DESC
,quarter
,SUM(wages) AS wages_quart
FROM
ws_data
GROUP BY
mpi, quarter
),
ws_annot AS (
SELECT
dagg.*
,row_number() OVER(PARTITION BY dagg.mpi, cast(wages_quart as string) ORDER BY dagg.wages_quart  DESC)AS rn
FROM
ws_agg dagg
)

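Try using this data to create a CTE at the top that serves as a Quarter table, then use it as the starting point of your main FROM clause. You should be able to replace the original code I copied this from (the wg WHERE clause) with the CTE at the top.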

(20101, 20102, 20103, 20104,
20111, 20112, 20113, 20114,
20121, 20122, 20123, 20124,
20131, 20132, 20133, 20134,
20141, 20142, 20143, 20144,
20151, 20152, 20153, 20154,
20161, 20162, 20163, 20164,
20171, 20172, 20173, 20174,
20181, 20182, 20183, 20184,
20191, 20192, 20193, 20194,
20201, 20202, 20203, 20204)

Your database may also have a DateDimension table containing quarters, which you could use instead.
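A minimal sketch of that idea, assuming BigQuery Standard SQL (which the backtick-quoted table name and `CAST(... AS STRING)` in the question suggest). The CTE names `quarters` and `persons` are hypothetical, the quarter list is shortened to 2010, and the inline rows in `ws_agg` only stand in for the output of the question's real `ws_agg` CTE:

```sql
WITH quarters AS (
  -- Hard-coded quarter list (assumption: shortened to 2010 for the example),
  -- cast to STRING so it matches the STRING quarter column in ws_agg
  SELECT CAST(yrqtr AS STRING) AS quarter
  FROM UNNEST([20101, 20102, 20103, 20104]) AS yrqtr
),
ws_agg AS (
  -- Stand-in sample rows; in the real query this is the existing ws_agg CTE
  SELECT 'PersonA' AS mpi, '20101' AS quarter, 100 AS wages_quart UNION ALL
  SELECT 'PersonA', '20102', 100 UNION ALL
  SELECT 'PersonA', '20103', 100 UNION ALL
  SELECT 'PersonB', '20101', 100
),
persons AS (
  -- One row per person
  SELECT DISTINCT mpi FROM ws_agg
)
SELECT
  p.mpi,
  q.quarter,
  COALESCE(a.wages_quart, 0) AS wages_quart  -- missing quarters become 0
FROM persons p
CROSS JOIN quarters q          -- every person paired with every quarter
LEFT JOIN ws_agg a
  ON a.mpi = p.mpi
 AND a.quarter = q.quarter
ORDER BY p.mpi, q.quarter;
```

With the sample rows this returns eight rows, four per person, with 0 for every quarter that has no wage record. In the real query it may be preferable to build `persons` from `rc_table_ra16_all_grads_1b`, so that people with no wage records at all still appear; that is a guess about the intent, not something the question states.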

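Since you want all quarters and all individuals, one way to achieve this is to start by building every person-quarter combination in the data and use that as the "driver" in a left join, like this: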

select
Pers.MPI
, Qtr.Quarter
, coalesce(W.Wage, 0) as Wage
, ...
from
(select distinct MPI from YourTable) Pers
cross join
(select distinct Quarter from DateDimensionTable) Qtr
left join
YourTable W
on W.MPI = Pers.MPI
and W.Quarter = Qtr.Quarter

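If your table contains all the periods you are interested in, you can use YourTable instead of DateDimensionTable. If it does not, and I suspect that is not guaranteed, you can use a date/calendar table here if you have one, or you can dynamically generate the quarters between the minimum and maximum quarter in YourTable (just search for those terms). You can also hard-code them in your query (as JBontje recommended).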
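Generating the quarters dynamically could look roughly like the following, again assuming BigQuery Standard SQL and the `yrqtr` integer encoding from the question (year followed by quarter digit, e.g. 20103). The `your_table` rows are made-up sample data, and `bounds` and `quarters` are hypothetical names:

```sql
WITH your_table AS (
  -- Stand-in sample rows; replace with the real table
  SELECT 20103 AS yrqtr UNION ALL
  SELECT 20112
),
bounds AS (
  -- Smallest and largest quarter present in the data
  SELECT MIN(yrqtr) AS min_q, MAX(yrqtr) AS max_q
  FROM your_table
),
quarters AS (
  -- Build year * 10 + quarter for every year in range,
  -- then trim the quarters that fall outside the bounds
  SELECT yr * 10 + qtr AS yrqtr
  FROM bounds
  CROSS JOIN UNNEST(GENERATE_ARRAY(DIV(min_q, 10), DIV(max_q, 10))) AS yr
  CROSS JOIN UNNEST([1, 2, 3, 4]) AS qtr
  WHERE yr * 10 + qtr BETWEEN min_q AND max_q
)
SELECT yrqtr
FROM quarters
ORDER BY yrqtr;
```

For the sample rows this yields 20103, 20104, 20111 and 20112. The resulting `quarters` CTE can take the place of DateDimensionTable in the cross join.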

If a combination is missing from YourTable, the wage for that combination will be NULL, and you can use coalesce to treat it as zero.
