Stata基于一系列日期合并数据集



我知道有一个Stata论坛有这个确切的标题,但我觉得它的语法没有那么有用,尤其是因为我的数据集有点不同。我有两个数据集。一个是一个人在设施中的停留时间,包括设施名称。它看起来像这样:

+---+-------------+---------------+-----------------------+
|ID#|Entrance Date|  Exit Date    | Facility Name         |
|1  | 7/22/2009   | 2/24/2010     | Facility 1            |
|1  | 7/10/2010   | 11/21/2010    | Facility 2            |
|2  | 3/31/2010   | 9/23/2010     | Facility 1            |
|3  | 11/24/2010  | 7/5/2011      | Facility 3            |
|4  | 3/7/2007    | 4/19/2010     | Facility 2            |
+---+-------------+---------------+-----------------------+

下一个数据集显示了他们被访问的日期。里面只有ID和访问日期:

+---+-------------+
|ID#|Visit Date   |
| 1 | 08/21/2009  |
| 1 | 09/02/2009  |
| 1 | 09/23/2009 |
| 3 | 04/22/2011  |
| 3 | 05/05/2011  |
+---+-------------+

我想在ID#上将这两个文件合并在一起,其中VisitDate位于Entrance DateExit Date之间,这样我就可以看到1。谁有访客,2。他们在什么设施中。

SSC上有一个名为rangejoin的新用户编写程序,它是为这类问题量身定制的。要安装它,请在Stata的命令窗口中键入:

ssc install rangejoin

rangejoin将根据入住和退房日期(所需间隔的界限)和就诊日期对每次入住进行配对。所有日期都必须是数字,所以我在下面的例子中预先将所有日期转换为Stata日期。

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id str10 visit int nvisit
1 "08/21/2009" 18130
1 "09/02/2009" 18142
1 "09/23/2009" 18163
3 "04/22/2011" 18739
3 "05/05/2011" 18752
end
format %td nvisit
save "visits.dta", replace
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id str10(Entrance Exit Name) int(datein dateout)
1 "7/22/2009"  "2/24/2010"  "Facility 1" 18100 18317
1 "7/10/2010"  "11/21/2010" "Facility 2" 18453 18587
2 "3/31/2010"  "9/23/2010"  "Facility 1" 18352 18528
3 "11/24/2010" "7/5/2011"   "Facility 3" 18590 18813
4 "3/7/2007"   "4/19/2010"  "Facility 2" 17232 18371
end
format %td datein
format %td dateout
rangejoin nvisit datein dateout using "visits.dta", by(id)
bysort id datein: egen visit_count = total(!mi(nvisit))
list, sepby(id)
     +-------------------------------------------------------------------------------------------------------+
     | id     Entrance         Exit         Name      datein     dateout        visit      nvisit   visit_~t |
     |-------------------------------------------------------------------------------------------------------|
  1. |  1    7/22/2009    2/24/2010   Facility 1   22jul2009   24feb2010   08/21/2009   21aug2009          3 |
  2. |  1    7/22/2009    2/24/2010   Facility 1   22jul2009   24feb2010   09/02/2009   02sep2009          3 |
  3. |  1    7/22/2009    2/24/2010   Facility 1   22jul2009   24feb2010   09/23/2009   23sep2009          3 |
  4. |  1    7/10/2010   11/21/2010   Facility 2   10jul2010   21nov2010                        .          0 |
     |-------------------------------------------------------------------------------------------------------|
  5. |  2    3/31/2010    9/23/2010   Facility 1   31mar2010   23sep2010                        .          0 |
     |-------------------------------------------------------------------------------------------------------|
  6. |  3   11/24/2010     7/5/2011   Facility 3   24nov2010   05jul2011   04/22/2011   22apr2011          2 |
  7. |  3   11/24/2010     7/5/2011   Facility 3   24nov2010   05jul2011   05/05/2011   05may2011          2 |
     |-------------------------------------------------------------------------------------------------------|
  8. |  4     3/7/2007    4/19/2010   Facility 2   07mar2007   19apr2010                        .          0 |
     +-------------------------------------------------------------------------------------------------------+

如果需要,您可以使用恢复到原始观测值

by id datein: keep if _n == 1
keep id Entrance Exit Name datein dateout visit_count
list
     +------------------------------------------------------------------------------+
     | id     Entrance         Exit         Name      datein     dateout   visit_~t |
     |------------------------------------------------------------------------------|
  1. |  1    7/22/2009    2/24/2010   Facility 1   22jul2009   24feb2010          3 |
  2. |  1    7/10/2010   11/21/2010   Facility 2   10jul2010   21nov2010          0 |
  3. |  2    3/31/2010    9/23/2010   Facility 1   31mar2010   23sep2010          0 |
  4. |  3   11/24/2010     7/5/2011   Facility 3   24nov2010   05jul2011          2 |
  5. |  4     3/7/2007    4/19/2010   Facility 2   07mar2007   19apr2010          0 |
     +------------------------------------------------------------------------------+

任何类型的merge在这里似乎都没有帮助,也不合适,因为您只能在标识符上匹配。我会使用append

clear 
input ID str10 (Entrance Exit) Name  
1 "7/22/2009"  "2/24/2010"  1  
1 "7/10/2010"  "11/21/2010" 2  
2 "3/31/2010"  "9/23/2010"  1  
3 "11/24/2010" "7/5/2011"   3  
4 "3/7/2007"   "4/19/2010"  2 
end 
gen DateEntrance = daily(Entrance, "MDY") 
gen DateExit = daily(Exit, "MDY") 
drop Entrance Exit 
sort ID, stable 
by ID : gen T = _n 
reshape long Date, i(ID T) j(Event) string 
drop T 
save Master, replace 
clear 
input ID str10 Visit 
1 "08/21/2009"  
1 "09/02/2009"  
1 "09/23/2009" 
3 "04/22/2011"  
3 "05/05/2011"  
end 
gen Date = daily(Visit, "MDY") 
drop Visit 
gen Event = "Visit" 
append using Master 
sort ID Date 
format Date %td 
list, sepby(ID)  
    +----------------------------------+
     | ID        Date      Event   Name |
     |----------------------------------|
  1. |  1   22jul2009   Entrance      1 |
  2. |  1   21aug2009      Visit      . |
  3. |  1   02sep2009      Visit      . |
  4. |  1   23sep2009      Visit      . |
  5. |  1   24feb2010       Exit      1 |
  6. |  1   10jul2010   Entrance      2 |
  7. |  1   21nov2010       Exit      2 |
     |----------------------------------|
  8. |  2   31mar2010   Entrance      1 |
  9. |  2   23sep2010       Exit      1 |
     |----------------------------------|
 10. |  3   24nov2010   Entrance      3 |
 11. |  3   22apr2011      Visit      . |
 12. |  3   05may2011      Visit      . |
 13. |  3   05jul2011       Exit      3 |
     |----------------------------------|
 14. |  4   07mar2007   Entrance      2 |
 15. |  4   19apr2010       Exit      2 |
     +----------------------------------+

现在请参阅此处了解如何填写缺失

另一种方法使用joinby:

/* Set up Visits Data */
clear 
input ID str10 Visit 
1 "08/21/2009"  
1 "09/02/2009"  
1 "09/23/2009" 
3 "04/22/2011"  
3 "05/05/2011"  
end 
gen DateVisit = daily(Visit, "MDY") 
drop Visit 
tempfile Visits
save `Visits'
/* Set up Facilities Data */
clear 
input ID str10 (Entrance Exit Name)  
1 "7/22/2009"  "2/24/2010"  "Facility 1"  
1 "7/10/2010"  "11/21/2010" "Facility 2"  
2 "3/31/2010"  "9/23/2010"  "Facility 1" 
3 "11/24/2010" "7/5/2011"   "Facility 3"
4 "3/7/2007"   "4/19/2010"  "Facility 2" 
end 
gen DateEntrance = daily(Entrance, "MDY") 
gen DateExit = daily(Exit, "MDY") 
drop Entrance Exit 
/* Create pairwise combinations within ID using -joinby- */
joinby ID using `Visits', unmatched(both)
drop _merge
format Date* %td
/* Whatever else you want now... */
gen Visitor = 0
replace Visitor = 1 if DateEntrance <= DateVisit & DateVisit <= DateExit
* or... 
collapse (sum) countVisits = Visitor, by(ID Name DateEntrance DateExit)
* or...
replace DateVisit = . if !Visitor
by ID Name (DateVisit), sort : gen VisitNumber = _n * Visitor
collapse (sum) Visitor, by(ID Name DateEntrance DateExit DateVisit VisitNumber)
drop VisitNumber
list, sepby(ID)
     +---------------------------------------------------------------+
     | ID         Name   DateEnt~e    DateExit   DateVisit   Visitor |
     |---------------------------------------------------------------|
  1. |  1   Facility 1   22jul2009   24feb2010   21aug2009         1 |
  2. |  1   Facility 1   22jul2009   24feb2010   02sep2009         1 |
  3. |  1   Facility 1   22jul2009   24feb2010   23sep2009         1 |
  4. |  1   Facility 2   10jul2010   21nov2010           .         0 |
     |---------------------------------------------------------------|
  5. |  2   Facility 1   31mar2010   23sep2010           .         0 |
     |---------------------------------------------------------------|
  6. |  3   Facility 3   24nov2010   05jul2011   22apr2011         1 |
  7. |  3   Facility 3   24nov2010   05jul2011   05may2011         1 |
     |---------------------------------------------------------------|
  8. |  4   Facility 2   07mar2007   19apr2010           .         0 |
     +---------------------------------------------------------------+

最新更新