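I have a large dataset of hospitalizations with admission and discharge dates.
For each hospitalization, I want to find the calendar month that contains the largest share of the stay. If the days between admission and discharge are split evenly across months, I want to attribute the stay to the admission month.
Here is a simple example: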
library(data.table)
library(dplyr)

hosp_dates <- data.table(id = 1:5,
                         dt_admission = as.Date(c("2000-01-01", "2000-01-10", "2002-01-16", "2005-01-17", "2010-01-20")),
                         dt_discharge = as.Date(c("2000-01-20", "2000-02-02", "2002-02-16", "2005-02-16", "2010-03-31")))

# Desired result, filled in by hand for illustration
hosp_dates %>%
  mutate(month = c(1, 1, 1, 2, 3),
         year = c(2000, 2000, 2002, 2005, 2010))
Output
id dt_admission dt_discharge month year
1: 1 2000-01-01 2000-01-20 1 2000
2: 2 2000-01-10 2000-02-02 1 2000
3: 3 2002-01-16 2002-02-16 1 2002
4: 4 2005-01-17 2005-02-16 2 2005
5: 5 2010-01-20 2010-03-31 3 2010
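To make the rule concrete, here is a minimal base R sketch that tabulates the days spent in each calendar month for a single stay. The helper name `days_per_month` is hypothetical, and the sketch assumes that both the admission day and the discharge day count as days of the stay.

```r
# Hypothetical helper: number of days a stay spends in each calendar month.
# Assumption: admission and discharge days are both counted.
days_per_month <- function(admission, discharge) {
  days <- seq(admission, discharge, by = "day")
  ym <- format(days, "%Y-%m")
  # Fix the factor levels in order of appearance so months stay chronological
  table(factor(ym, levels = unique(ym)))
}

# id 3: January and February tie, so the admission month (1) is wanted
tie <- days_per_month(as.Date("2002-01-16"), as.Date("2002-02-16"))
tie

# id 5: March holds the largest share, so month 3 is wanted
long <- days_per_month(as.Date("2010-01-20"), as.Date("2010-03-31"))
long
```

Under that counting convention, id 3 has 16 days in each of January and February, and id 5 has 12, 28 and 31 days in January, February and March.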
A brute-force approach (assuming distinct ids):
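library(data.table)
library(collapse)

hosp_dates[,
  c('m', 'y') := {
    # Expand the stay into one date per day, then take the most frequent month and year.
    # fmode() resolves ties with the first value by default, i.e. the admission month.
    sd = seq.Date(dt_admission, dt_discharge, 1L)
    .(fmode(month(sd)), fmode(year(sd)))
  },
  by = id]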
# id dt_admission dt_discharge m y
# <int> <Date> <Date> <int> <int>
# 1: 1 2000-01-01 2000-01-20 1 2000
# 2: 2 2000-01-10 2000-02-02 1 2000
# 3: 3 2002-01-16 2002-02-16 1 2002
# 4: 4 2005-01-17 2005-02-16 2 2005
# 5: 5 2010-01-20 2010-03-31 3 2010
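If you have stays longer than 11 months, taking the mode of the month and of the year separately can give results different from what you expect, so you might prefer something like this: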
hosp_dates[,
  ym := format(seq.Date(dt_admission, dt_discharge, 1L), "%Y-%m") |>
    fmode(),
  by = id]
# id dt_admission dt_discharge ym
# <int> <Date> <Date> <char>
# 1: 1 2000-01-01 2000-01-20 2000-01
# 2: 2 2000-01-10 2000-02-02 2000-01
# 3: 3 2002-01-16 2002-02-16 2002-01
# 4: 4 2005-01-17 2005-02-16 2005-02
# 5: 5 2010-01-20 2010-03-31 2010-03
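To see why the year-month version is safer for long stays, here is a small base R sketch on a made-up stay of a bit more than a year. The dates and the helper `first_mode` are assumptions for illustration only; `first_mode` is a stand-in for a mode that resolves ties by first occurrence and is not part of collapse.

```r
# Hypothetical helper: most frequent value, ties resolved by first occurrence
first_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Made-up stay spanning two Decembers and two Januaries
days <- seq(as.Date("2009-12-20"), as.Date("2011-01-10"), by = "day")

# Month and year taken separately: December days from 2009 and 2010 are pooled
separate <- c(month = first_mode(as.integer(format(days, "%m"))),
              year  = first_mode(as.integer(format(days, "%Y"))))

# Year and month taken together: each calendar month is counted on its own
combined <- first_mode(format(days, "%Y-%m"))

separate  # month 12, year 2010
combined  # "2010-01"
```

Here the separate modes point to December 2010 because the 12 December days of 2009 are added to the 31 of 2010, while the combined key returns the first full 31-day month, January 2010.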