Count encounters per person_id by date in R, tidyverse



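I am trying to get the number of encounters for each patient based on date. This needs to be done in R with the tidyverse library. An example of the dataset is below: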

df1 <- structure(list(person_id = c(1, 2, 2, 3, 3, 3), arrival = c("2020-01-01 08:00:00", 
"2020-01-01 09:00:00", NA, "2020-01-01 10:00:00", NA, NA), completed = c("2020-01-01 9:00:00", 
"2020-01-01 11:00:00", NA, "2020-01-01 11:00:00", NA, NA), admitted = c(NA, 
NA, "2020-01-01 11:00:00", NA, "2020-01-01 11:00:00", "2020-01-09 11:00:00"
), discharged = c(NA, NA, NA, NA, "2020/01/02 12:00:00", "2020-01-13 12:00:00"
), encounter_number = c(1, 2, 3, 4, 5, 6)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

As an example, the dataset with the new column I want as output should look like this:

structure(list(person_id = c(1, 2, 2, 3, 3, 3), arrival = c("2020-01-01 08:00:00", 
"2020-01-01 09:00:00", NA, "2020-01-01 10:00:00", NA, NA), completed = c("2020-01-01 9:00:00", 
"2020-01-01 11:00:00", NA, "2020-01-01 11:00:00", NA, NA), admitted = c(NA, 
NA, "2020-01-01 11:00:00", NA, "2020-01-01 11:00:00", "2020-01-09 11:00:00"
), discharged = c(NA, NA, NA, NA, "2020/01/02 12:00:00", "2020-01-13 12:00:00"
), encounter_number = c(1, 2, 3, 4, 5, 6), person_total_encounter = c(1, 
1, 2, 1, 2, 1)), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

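Note:

As you can see, person 2 arrived at AE and completed AE, and was then admitted straight away but has no discharge date yet. I still count this as a total of 2 encounters, one for AE and one as an inpatient, even though there is no discharge date. In addition, person_id 3 was admitted twice on different dates; that gives a total of 2 encounters on the first date, and the last admission counts as encounter 1. Can anyone help?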

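Here is one idea. Let me know if this is not what you had in mind.

First, you can put the data into long format, with event in one column (arrival, completed, admitted or discharged) and date in a second column, and remove the NA values that do not contribute to the result.

Then you can filter for the events you want to count. In this case, I chose completed and admitted.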
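To make these first two steps concrete, here is a minimal sketch on a cut-down copy of the sample data (persons 1 and 2 only). The names df_demo and long_demo are made up for illustration, and only dplyr and tidyr (both part of the tidyverse) are loaded.

```r
library(dplyr)
library(tidyr)

# Hypothetical cut-down copy of the question's data: persons 1 and 2 only
df_demo <- tibble(
  person_id        = c(1, 2, 2),
  arrival          = c("2020-01-01 08:00:00", "2020-01-01 09:00:00", NA),
  completed        = c("2020-01-01 9:00:00", "2020-01-01 11:00:00", NA),
  admitted         = c(NA, NA, "2020-01-01 11:00:00"),
  discharged       = c(NA_character_, NA_character_, NA_character_),
  encounter_number = c(1, 2, 3)
)

long_demo <- df_demo %>%
  # one row per (encounter, event) pair, timestamp in `date`
  pivot_longer(cols = c(arrival, completed, admitted, discharged),
               names_to = "event", values_to = "date") %>%
  # events that never happened carry no information
  drop_na(date) %>%
  # keep only the events that should count as an encounter
  filter(event %in% c("completed", "admitted"))

long_demo
```

With this sample, each encounter ends up as exactly one row (completed for encounters 1 and 2, admitted for encounter 3), which is what makes a simple running count per person and day possible in the next step.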

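Next, you can group_by person_id and date (just the date, not the time). person_total_encounter is then row_number(), which is simply a running count, or sequence number, of the events for that date and person_id.

Edit: added a select at the start, since the original dataset described by the OP may have extra columns.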

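library(tidyverse)

df1 %>%
  # keep only the columns needed, in case the real data has extras
  select(person_id, encounter_number, arrival, completed, admitted, discharged) %>%
  # long format: one row per (encounter, event) with its timestamp
  pivot_longer(cols = c(arrival, completed, admitted, discharged), names_to = "event", values_to = "date") %>%
  drop_na() %>%
  # count only completed visits and admissions
  filter(event == "completed" | event == "admitted") %>%
  # group by person and calendar day (time of day is dropped)
  group_by(person_id, date = as.Date(date)) %>%
  mutate(person_total_encounter = row_number()) %>%
  ungroup() %>%
  select(-c(event, date)) %>%
  # join back to recover the original columns
  right_join(df1, by = c("person_id", "encounter_number"))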

Output

# A tibble: 6 x 7
  person_id encounter_number person_total_encounter arrival             completed           admitted            discharged
      <dbl>            <dbl>                  <int> <chr>               <chr>               <chr>               <chr>
1         1                1                      1 2020-01-01 08:00:00 2020-01-01 09:00:00 NA                  NA                 
2         2                2                      1 2020-01-01 09:00:00 2020-01-01 11:00:00 NA                  NA                 
3         2                3                      2 NA                  NA                  2020-01-01 11:00:00 NA                 
4         3                4                      1 2020-01-01 10:00:00 2020-01-01 11:00:00 NA                  NA                 
5         3                5                      2 NA                  NA                  2020-01-01 11:00:00 2020-01-02 12:00:00
6         3                6                      1 NA                  NA                  2020-01-09 11:00:00 2020-01-13 12:00:00
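As a cross-check, the same per-person, per-day running count can be sketched without pivoting. This is a different technique from the pipeline above, not a restatement of it: it assumes every row has either a completed or an admitted timestamp (true for the sample data, not guaranteed for real data), and the names alt and day are made up for illustration.

```r
library(dplyr)

# Sample data from the question, rebuilt as a tibble
df1 <- tibble(
  person_id = c(1, 2, 2, 3, 3, 3),
  arrival = c("2020-01-01 08:00:00", "2020-01-01 09:00:00", NA,
              "2020-01-01 10:00:00", NA, NA),
  completed = c("2020-01-01 9:00:00", "2020-01-01 11:00:00", NA,
                "2020-01-01 11:00:00", NA, NA),
  admitted = c(NA, NA, "2020-01-01 11:00:00", NA,
               "2020-01-01 11:00:00", "2020-01-09 11:00:00"),
  discharged = c(NA, NA, NA, NA, "2020/01/02 12:00:00", "2020-01-13 12:00:00"),
  encounter_number = c(1, 2, 3, 4, 5, 6)
)

alt <- df1 %>%
  # assumption: each row has completed OR admitted; take whichever is present
  mutate(day = as.Date(coalesce(completed, admitted))) %>%
  group_by(person_id, day) %>%
  # running count of encounters for that person on that calendar day
  mutate(person_total_encounter = row_number()) %>%
  ungroup() %>%
  select(-day)

# compare with the column the question asks for: 1 1 2 1 2 1
all(alt$person_total_encounter == c(1, 1, 2, 1, 2, 1))
```

One difference to keep in mind: a row with neither timestamp would get NA from the pivot-and-join pipeline, whereas here it would be counted within an NA day group, so the two approaches only agree when the assumption holds.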
