Filtering specific data in Stata

I am using Stata 13 and have to clean a dataset in panel format, with different ids, for the period 2000 to 2003. My data look like this:

id   year    ln_wage
1    2000     2.30
1    2001     2.31
1    2002     2.31
2    2001     1.89
2    2002     1.89
2    2003     2.10
3    2002     1.60
4    2002     2.46
4    2003     2.47
5    2000     2.10
5    2001     2.10
5    2003     2.12

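For each year, I want to keep only the individuals who also appear in year t-1. As a result, the first year of my sample (2000) will be dropped. I am looking for output like this: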

Year 2001

id   year    ln_wage
1    2001     2.31
5    2001     2.10

Year 2002

id   year    ln_wage
1    2002     2.31
2    2002     1.89

Year 2003

id   year    ln_wage
2    2003     2.10
4    2003     2.47

Regards,

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id int year float ln_wage
1 2000  2.3
1 2001 2.31
1 2002 2.31
2 2001 1.89
2 2002 1.89
2 2003  2.1
3 2002  1.6
4 2002 2.46
4 2003 2.47
5 2000  2.1
5 2001  2.1
5 2003 2.12
end
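* Declare the panel structure so that time-series operators such as L. work
xtset id year
* L.ln_wage is missing whenever id has no observation in year - 1
* (it is also missing if the previous year's ln_wage is itself missing)
drop if missing(L.ln_wage)
sort year id
list, noobs sepby(year)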
+---------------------+
| id   year   ln_wage |
|---------------------|
|  1   2001      2.31 |
|  5   2001       2.1 |
|---------------------|
|  1   2002      2.31 |
|  2   2002      1.89 |
|---------------------|
|  2   2003       2.1 |
|  4   2003      2.47 |
+---------------------+
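// Alternatively, assuming no duplicate years within id exist.
// Run this on the original data, not after the drop above.
// For the first observation of each id, year[_n-1] is missing, so todrop is 1.
bysort id (year): gen todrop = year[_n-1] != year - 1
drop if todrop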
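
The alternative above can also be written with keep if, which avoids the helper variable and does not depend on ln_wage being nonmissing in the previous year. What follows is a minimal sketch, not a definitive implementation: it assumes there are no duplicate years within id, it is run on freshly loaded data, and the assert line is only an illustrative sanity check added here.

```stata
* Sketch only: assumes no duplicate years within id
clear
input byte id int year float ln_wage
1 2000  2.3
1 2001 2.31
1 2002 2.31
2 2001 1.89
2 2002 1.89
2 2003  2.1
3 2002  1.6
4 2002 2.46
4 2003 2.47
5 2000  2.1
5 2001  2.1
5 2003 2.12
end

* Within each id, sorted by year, keep an observation only if the
* immediately preceding observation is exactly one year earlier.
* For the first observation of an id, year[_n-1] is missing, so the
* comparison is false and that observation is dropped.
bysort id (year): keep if year[_n-1] == year - 1

* Illustrative check (an assumption about this sample): 2000 should be gone
assert inrange(year, 2001, 2003)

sort year id
list, noobs sepby(year)
```

Because the condition compares year values directly, a gap such as id 5 going from 2001 to 2003 is treated as "not present in t-1", which matches the desired output in the question.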
