我知道有一个Stata论坛有这个确切的标题,但我觉得它的语法没有那么有用,尤其是因为我的数据集有点不同。我有两个数据集。一个是一个人在设施中的停留时间,包括设施名称。它看起来像这样:
+---+-------------+---------------+-----------------------+
|ID#|Entrance Date| Exit Date | Facility Name |
|1 | 7/22/2009 | 2/24/2010 | Facility 1 |
|1 | 7/10/2010 | 11/21/2010 | Facility 2 |
|2 | 3/31/2010 | 9/23/2010 | Facility 1 |
|3 | 11/24/2010 | 7/5/2011 | Facility 3 |
|4 | 3/7/2007 | 4/19/2010 | Facility 2 |
+---+-------------+---------------+-----------------------+
下一个数据集显示了他们被访问的日期。里面只有ID和访问日期:
+---+-------------+
|ID#|Visit Date |
| 1 | 08/21/2009 |
| 1 | 09/02/2009 |
| 1 | 09/23/2009 |
| 3 | 04/22/2011 |
| 3 | 05/05/2011 |
+---+-------------+
我想在ID#
上将这两个文件合并在一起,其中VisitDate
位于Entrance Date
和Exit Date
之间,这样我就可以看到1。谁有访客,2。他们在什么设施中。
SSC上有一个名为rangejoin
的新用户编写程序,它是为这类问题量身定制的。要安装它,请在Stata的命令窗口中键入:
ssc install rangejoin
rangejoin
将根据入住和退房日期(所需间隔的界限)和就诊日期对每次入住进行配对。所有日期都必须是数字,所以我在下面的例子中预先将所有日期转换为Stata日期。
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id str10 visit int nvisit
1 "08/21/2009" 18130
1 "09/02/2009" 18142
1 "09/23/2009" 18163
3 "04/22/2011" 18739
3 "05/05/2011" 18752
end
format %td nvisit
save "visits.dta", replace
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id str10(Entrance Exit Name) int(datein dateout)
1 "7/22/2009" "2/24/2010" "Facility 1" 18100 18317
1 "7/10/2010" "11/21/2010" "Facility 2" 18453 18587
2 "3/31/2010" "9/23/2010" "Facility 1" 18352 18528
3 "11/24/2010" "7/5/2011" "Facility 3" 18590 18813
4 "3/7/2007" "4/19/2010" "Facility 2" 17232 18371
end
format %td datein
format %td dateout
rangejoin nvisit datein dateout using "visits.dta", by(id)
bysort id datein: egen visit_count = total(!mi(nvisit))
list, sepby(id)
+-------------------------------------------------------------------------------------------------------+
| id Entrance Exit Name datein dateout visit nvisit visit_~t |
|-------------------------------------------------------------------------------------------------------|
1. | 1 7/22/2009 2/24/2010 Facility 1 22jul2009 24feb2010 08/21/2009 21aug2009 3 |
2. | 1 7/22/2009 2/24/2010 Facility 1 22jul2009 24feb2010 09/02/2009 02sep2009 3 |
3. | 1 7/22/2009 2/24/2010 Facility 1 22jul2009 24feb2010 09/23/2009 23sep2009 3 |
4. | 1 7/10/2010 11/21/2010 Facility 2 10jul2010 21nov2010 . 0 |
|-------------------------------------------------------------------------------------------------------|
5. | 2 3/31/2010 9/23/2010 Facility 1 31mar2010 23sep2010 . 0 |
|-------------------------------------------------------------------------------------------------------|
6. | 3 11/24/2010 7/5/2011 Facility 3 24nov2010 05jul2011 04/22/2011 22apr2011 2 |
7. | 3 11/24/2010 7/5/2011 Facility 3 24nov2010 05jul2011 05/05/2011 05may2011 2 |
|-------------------------------------------------------------------------------------------------------|
8. | 4 3/7/2007 4/19/2010 Facility 2 07mar2007 19apr2010 . 0 |
+-------------------------------------------------------------------------------------------------------+
如果需要,您可以使用恢复到原始观测值
by id datein: keep if _n == 1
keep id Entrance Exit Name datein dateout visit_count
list
+------------------------------------------------------------------------------+
| id Entrance Exit Name datein dateout visit_~t |
|------------------------------------------------------------------------------|
1. | 1 7/22/2009 2/24/2010 Facility 1 22jul2009 24feb2010 3 |
2. | 1 7/10/2010 11/21/2010 Facility 2 10jul2010 21nov2010 0 |
3. | 2 3/31/2010 9/23/2010 Facility 1 31mar2010 23sep2010 0 |
4. | 3 11/24/2010 7/5/2011 Facility 3 24nov2010 05jul2011 2 |
5. | 4 3/7/2007 4/19/2010 Facility 2 07mar2007 19apr2010 0 |
+------------------------------------------------------------------------------+
任何类型的merge
在这里似乎都没有帮助,也不合适,因为您只能在标识符上匹配。我会使用append
。
clear
input ID str10 (Entrance Exit) Name
1 "7/22/2009" "2/24/2010" 1
1 "7/10/2010" "11/21/2010" 2
2 "3/31/2010" "9/23/2010" 1
3 "11/24/2010" "7/5/2011" 3
4 "3/7/2007" "4/19/2010" 2
end
gen DateEntrance = daily(Entrance, "MDY")
gen DateExit = daily(Exit, "MDY")
drop Entrance Exit
sort ID, stable
by ID : gen T = _n
reshape long Date, i(ID T) j(Event) string
drop T
save Master, replace
clear
input ID str10 Visit
1 "08/21/2009"
1 "09/02/2009"
1 "09/23/2009"
3 "04/22/2011"
3 "05/05/2011"
end
gen Date = daily(Visit, "MDY")
drop Visit
gen Event = "Visit"
append using Master
sort ID Date
format Date %td
list, sepby(ID)
+----------------------------------+
| ID Date Event Name |
|----------------------------------|
1. | 1 22jul2009 Entrance 1 |
2. | 1 21aug2009 Visit . |
3. | 1 02sep2009 Visit . |
4. | 1 23sep2009 Visit . |
5. | 1 24feb2010 Exit 1 |
6. | 1 10jul2010 Entrance 2 |
7. | 1 21nov2010 Exit 2 |
|----------------------------------|
8. | 2 31mar2010 Entrance 1 |
9. | 2 23sep2010 Exit 1 |
|----------------------------------|
10. | 3 24nov2010 Entrance 3 |
11. | 3 22apr2011 Visit . |
12. | 3 05may2011 Visit . |
13. | 3 05jul2011 Exit 3 |
|----------------------------------|
14. | 4 07mar2007 Entrance 2 |
15. | 4 19apr2010 Exit 2 |
+----------------------------------+
现在请参阅此处了解如何填写缺失
另一种方法使用joinby
:
/* Set up Visits Data */
clear
input ID str10 Visit
1 "08/21/2009"
1 "09/02/2009"
1 "09/23/2009"
3 "04/22/2011"
3 "05/05/2011"
end
gen DateVisit = daily(Visit, "MDY")
drop Visit
tempfile Visits
save `Visits'
/* Set up Facilities Data */
clear
input ID str10 (Entrance Exit Name)
1 "7/22/2009" "2/24/2010" "Facility 1"
1 "7/10/2010" "11/21/2010" "Facility 2"
2 "3/31/2010" "9/23/2010" "Facility 1"
3 "11/24/2010" "7/5/2011" "Facility 3"
4 "3/7/2007" "4/19/2010" "Facility 2"
end
gen DateEntrance = daily(Entrance, "MDY")
gen DateExit = daily(Exit, "MDY")
drop Entrance Exit
/* Create pairwise combinations within ID using -joinby- */
joinby ID using `Visits', unmatched(both)
drop _merge
format Date* %td
/* Whatever else you want now... */
gen Visitor = 0
replace Visitor = 1 if DateEntrance <= DateVisit & DateVisit <= DateExit
* or...
collapse (sum) countVisits = Visitor, by(ID Name DateEntrance DateExit)
* or...
replace DateVisit = . if !Visitor
by ID Name (DateVisit), sort : gen VisitNumber = _n * Visitor
collapse (sum) Visitor, by(ID Name DateEntrance DateExit DateVisit VisitNumber)
drop VisitNumber
list, sepby(ID)
+---------------------------------------------------------------+
| ID Name DateEnt~e DateExit DateVisit Visitor |
|---------------------------------------------------------------|
1. | 1 Facility 1 22jul2009 24feb2010 21aug2009 1 |
2. | 1 Facility 1 22jul2009 24feb2010 02sep2009 1 |
3. | 1 Facility 1 22jul2009 24feb2010 23sep2009 1 |
4. | 1 Facility 2 10jul2010 21nov2010 . 0 |
|---------------------------------------------------------------|
5. | 2 Facility 1 31mar2010 23sep2010 . 0 |
|---------------------------------------------------------------|
6. | 3 Facility 3 24nov2010 05jul2011 22apr2011 1 |
7. | 3 Facility 3 24nov2010 05jul2011 05may2011 1 |
|---------------------------------------------------------------|
8. | 4 Facility 2 07mar2007 19apr2010 . 0 |
+---------------------------------------------------------------+