Mutual exclusivity across multiple observations (panel data) in Stata



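I am using Stata/MP 16.1 and have multiple observations per id of different exposures. I want to group the ids according to whether their exposures are mutually exclusive. Please see the data example.

The desired variable is groups, which I created by hand. Since the dataset contains 100,000 observations, how can I create the variable groups with code?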

* Example generated by -dataex-. To install: ssc install dataex
clear
input float id str1 exposure long groups
1 "." 2
1 "a" 2
1 "a" 2
2 "a" 1
2 "." 1
2 "b" 1
2 "c" 1
3 "a" 1
3 "c" 1
3 "c" 1
4 "b" 3
4 "b" 3
4 "b" 3
end
label values groups groups
label def groups 1 "not mutually exclusive", modify
label def groups 2 "only a", modify
label def groups 3 "only b", modify

One solution, shown on a version of the data example without the "." values:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float id str1 exposure long groups
1 "a" 2
1 "a" 2
1 "a" 2
2 "a" 1
2 "a" 1
2 "b" 1
2 "c" 1
3 "a" 1
3 "c" 1
3 "c" 1
4 "b" 3
4 "b" 3
4 "b" 3
end
label values groups groups
label def groups 1 "not mutually exclusive", modify
label def groups 2 "only a", modify
label def groups 3 "only b", modify

bysort id (exposure) : gen wanted = cond(exposure[1] != exposure[_N], 1, cond(exposure[1] == "a", 2, cond(exposure[1] == "b", 3, .)))
label val wanted groups 
assert wanted == groups 

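The logic is:

If the values within an id differ, assign 1. Because the data are sorted on exposure within id, the first value is the smallest and the last is the largest, so the two differ exactly when the id has more than one distinct value.

Otherwise the values are all the same, so:

If the first value is a, assign 2 (equivalent to all values being a).

If the first value is b, assign 3 (equivalent to all values being b).

Otherwise assign missing. Judging by your example there should be no such values, but it is a good idea to check.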
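
If exposures other than a and b can occur, one option (a sketch, not part of the original answer) is to build a string variable from the same first-versus-last comparison, so that no value has to be hard-coded. The variable name wanted_str is hypothetical; the data are the second data example above, without the groups variable.

```stata
* Sketch: label each id without hard-coding the exposure values
* (wanted_str is a hypothetical variable name)
clear
input float id str1 exposure
1 "a"
1 "a"
1 "a"
2 "a"
2 "a"
2 "b"
2 "c"
3 "a"
3 "c"
3 "c"
4 "b"
4 "b"
4 "b"
end
* after sorting on exposure within id, first != last means more than one distinct value
bysort id (exposure) : gen wanted_str = cond(exposure[1] != exposure[_N], "not mutually exclusive", "only " + exposure[1])
list, sepby(id)
```

Because cond() accepts string arguments, an id whose only exposure is, say, c would get "only c" without any extra line of code.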

Of course, you can break the one-line cond() statement into shorter statements:

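* missing(wanted) stops ids already coded 1 from being overwritten
bysort id (exposure) : gen wanted = 1 if exposure[1] != exposure[_N]
by id: replace wanted = 2 if missing(wanted) & exposure[1] == "a"
by id: replace wanted = 3 if missing(wanted) & exposure[1] == "b"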
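
Another common way to reach the same grouping, sketched here as an alternative rather than the answer's method, is to count the distinct exposures within each id: tag the first observation of each id-exposure combination and total the tags. The names first and ndistinct are hypothetical, and the data are again the second data example.

```stata
* Sketch: count distinct exposures per id (first and ndistinct are hypothetical names)
clear
input float id str1 exposure
1 "a"
1 "a"
1 "a"
2 "a"
2 "a"
2 "b"
2 "c"
3 "a"
3 "c"
3 "c"
4 "b"
4 "b"
4 "b"
end
* 1 for the first observation of each distinct exposure within an id
bysort id exposure : gen byte first = _n == 1
* number of distinct exposures in each id (data are still sorted on id)
by id : egen ndistinct = total(first)
* more than one distinct exposure means not mutually exclusive
gen wanted = 1 if ndistinct > 1
* with a single distinct exposure, every observation in the id carries that value
replace wanted = 2 if ndistinct == 1 & exposure == "a"
replace wanted = 3 if ndistinct == 1 & exposure == "b"
list, sepby(id)
```

Keeping ndistinct around can also be handy on its own, for example to tabulate how many ids have one, two or three different exposures.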
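
Here is a more complicated technique for the setup in your data example, which includes "." values. Note that Stata attaches no special meaning to "." in a string variable: it is an ordinary one-character string, not a missing value.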
* Example generated by -dataex-. To install: ssc install dataex
clear
input float id str1 exposure long groups
1 "." 2
1 "a" 2
1 "a" 2
2 "a" 1
2 "." 1
2 "b" 1
2 "c" 1
3 "a" 1
3 "c" 1
3 "c" 1
4 "b" 3
4 "b" 3
4 "b" 3
end
label values groups groups
label def groups 1 "not mutually exclusive", modify
label def groups 2 "only a", modify
label def groups 3 "only b", modify
label def groups 4 "only c", modify
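* OK = 1 for observations with a real exposure ("." is just an ordinary string)
gen OK = exposure != "."
sort OK id exposure
* within the OK == 1 observations of each id, compare first and last values
by OK id: gen wanted = 1 if OK & exposure[1] != exposure[_N]
by OK id: replace wanted = 2 if wanted == . & OK & exposure[1] == "a"
by OK id: replace wanted = 3 if wanted == . & OK & exposure[1] == "b"
by OK id: replace wanted = 4 if wanted == . & OK & exposure[1] == "c"
* put the OK == 1 observations last within each id, then copy their code
* to every observation of that id, including the "." ones
bysort id (OK exposure) : replace wanted = wanted[_N]
drop OK
label val wanted groups
list, sepby(id)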

     +-----------------------------------------------------------------+
     | id   exposure                   groups                   wanted |
     |-----------------------------------------------------------------|
  1. |  1          .                   only a                   only a |
  2. |  1          a                   only a                   only a |
  3. |  1          a                   only a                   only a |
     |-----------------------------------------------------------------|
  4. |  2          .   not mutually exclusive   not mutually exclusive |
  5. |  2          a   not mutually exclusive   not mutually exclusive |
  6. |  2          b   not mutually exclusive   not mutually exclusive |
  7. |  2          c   not mutually exclusive   not mutually exclusive |
     |-----------------------------------------------------------------|
  8. |  3          a   not mutually exclusive   not mutually exclusive |
  9. |  3          c   not mutually exclusive   not mutually exclusive |
 10. |  3          c   not mutually exclusive   not mutually exclusive |
     |-----------------------------------------------------------------|
 11. |  4          b                   only b                   only b |
 12. |  4          b                   only b                   only b |
 13. |  4          b                   only b                   only b |
     +-----------------------------------------------------------------+
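
The point that "." has no special meaning in a string variable can be checked directly: for strings, Stata's missing() function treats only the empty string "" as missing. A minimal sketch with toy values (is_missing is a hypothetical name):

```stata
* Sketch: "." is an ordinary string; only "" counts as missing in a string variable
clear
input str1 exposure
"."
"a"
""
end
gen byte is_missing = missing(exposure)
list
```

Here is_missing is 1 only for the empty string, which is why the technique above flags the "." observations with a separate indicator (OK) instead of relying on missing().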
