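I would like to know how to count the number of patients on each day. The count should exclude patients discharged on that day but include patients admitted on that day. So patient c should not be counted on 7/17.
I have a much larger dataset; this is just an example.
Thank you for your help.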
install.packages("lubridate")
library(lubridate)
admission <- c("06/23/2013", "06/30/2013", "07/12/2014", "06/24/2013", "06/28/2013", "06/29/2013", "06/23/2013", "06/24/2013", "06/24/2013")
discharge <- c("06/25/2013", "07/03/2013", "07/17/2014", "06/30/2013", "06/30/2013", "07/02/2013", "06/29/2013", "06/29/2013", "06/27/2013")
patient <- c("a", "b", "c", "d", "e", "f", "g", "h", "j")
admission.date <- mdy(admission)
discharge.date <- mdy(discharge)
df <- data.frame(patient, admission.date, discharge.date)
df
  patient admission.date discharge.date
1       a     2013-06-23     2013-06-25
2       b     2013-06-30     2013-07-03
3       c     2014-07-12     2014-07-17
4       d     2013-06-24     2013-06-30
5       e     2013-06-28     2013-06-30
6       f     2013-06-29     2013-07-02
7       g     2013-06-23     2013-06-29
8       h     2013-06-24     2013-06-29
9       j     2013-06-24     2013-06-27
Here is an approach using data.table:
library(data.table)
# set df to data.table format
setDT(df)
# Create a table with all dates
dt.dates <- data.table(date = seq(min(df$admission.date), max(df$discharge.date), by = "1 days"))
# perform overlap join: keep a patient on a date when admission <= date < discharge
answer <- df[dt.dates, .(date, patient), on = .(admission.date <= date, discharge.date > date), nomatch = 0L]
# get unique patients by date
answer[, .(patients = uniqueN(patient)), by = date]
#          date patients
# 1: 2013-06-23        2
# 2: 2013-06-24        5
# 3: 2013-06-25        4
# 4: 2013-06-26        4
# 5: 2013-06-27        3
# 6: 2013-06-28        4
# 7: 2013-06-29        3
# 8: 2013-06-30        2
# 9: 2013-07-01        2
#10: 2013-07-02        1
#11: 2014-07-12        1
#12: 2014-07-13        1
#13: 2014-07-14        1
#14: 2014-07-15        1
#15: 2014-07-16        1
Here is an approach using dplyr:
library(dplyr)
df %>%
  rowwise() %>%
  # one row per patient per day in the stay, excluding the discharge day
  do(data.frame(patient = .$patient, Date = seq(.$admission.date, .$discharge.date - 1, by = "day"))) %>%
  group_by(Date) %>%
  summarize(patients = n())
with the output:
# A tibble: 15 x 2
   Date       patients
   <date>        <int>
 1 2013-06-23        2
 2 2013-06-24        5
 3 2013-06-25        4
 4 2013-06-26        4
 5 2013-06-27        3
 6 2013-06-28        4
 7 2013-06-29        3
 8 2013-06-30        2
 9 2013-07-01        2
10 2013-07-02        1
11 2014-07-12        1
12 2014-07-13        1
13 2014-07-14        1
14 2014-07-15        1
15 2014-07-16        1
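
Both approaches above apply the same rule: a patient counts on a given day when admission <= day < discharge. As a cross-check, here is a minimal base R sketch of that rule with no extra packages. It is not taken from either answer; the names adm, dis, days and census are made up for illustration, and the example data is rebuilt so the block runs on its own. It loops over every calendar day, so on a much larger dataset the data.table join is probably the better choice.

```r
# Example data from the question, parsed with base R instead of lubridate.
# adm, dis, days and census are hypothetical names used only in this sketch.
adm <- as.Date(c("06/23/2013", "06/30/2013", "07/12/2014", "06/24/2013", "06/28/2013",
                 "06/29/2013", "06/23/2013", "06/24/2013", "06/24/2013"), format = "%m/%d/%Y")
dis <- as.Date(c("06/25/2013", "07/03/2013", "07/17/2014", "06/30/2013", "06/30/2013",
                 "07/02/2013", "06/29/2013", "06/29/2013", "06/27/2013"), format = "%m/%d/%Y")

# Every calendar day between the first admission and the last discharge.
days <- seq(min(adm), max(dis), by = "day")

# A patient is present on days[i] when adm <= days[i] and days[i] < dis,
# so the admission day is counted and the discharge day is not.
patients <- vapply(
  seq_along(days),
  function(i) sum(adm <= days[i] & days[i] < dis),
  integer(1)
)

# Keep only days with at least one patient, as in the outputs above.
census <- data.frame(date = days, patients = patients)
census <- census[census$patients > 0, ]
census
```

With the example data, census should contain the same 15 dates and counts as the two outputs above, and 2014-07-17 should be absent because patient c is discharged that day.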