I am trying to get the number of encounters per patient by date. This needs to be done in R with the tidyverse. A sample of the dataset:
structure(list(person_id = c(1, 2, 2, 3, 3, 3), arrival = c("2020-01-01 08:00:00",
"2020-01-01 09:00:00", NA, "2020-01-01 10:00:00", NA, NA), completed = c("2020-01-01 9:00:00",
"2020-01-01 11:00:00", NA, "2020-01-01 11:00:00", NA, NA), admitted = c(NA,
NA, "2020-01-01 11:00:00", NA, "2020-01-01 11:00:00", "2020-01-09 11:00:00"
), discharged = c(NA, NA, NA, NA, "2020/01/02 12:00:00", "2020-01-13 12:00:00"
), encounter_number = c(1, 2, 3, 4, 5, 6)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
An example of the output I want, with the new column, looks like this:
structure(list(person_id = c(1, 2, 2, 3, 3, 3), arrival = c("2020-01-01 08:00:00",
"2020-01-01 09:00:00", NA, "2020-01-01 10:00:00", NA, NA), completed = c("2020-01-01 9:00:00",
"2020-01-01 11:00:00", NA, "2020-01-01 11:00:00", NA, NA), admitted = c(NA,
NA, "2020-01-01 11:00:00", NA, "2020-01-01 11:00:00", "2020-01-09 11:00:00"
), discharged = c(NA, NA, NA, NA, "2020/01/02 12:00:00", "2020-01-13 12:00:00"
), encounter_number = c(1, 2, 3, 4, 5, 6), person_total_encounter = c(1,
1, 2, 1, 2, 1)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
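Note!!
As you can see, the person with person_id 2 arrived at AE and completed AE, and was then immediately admitted, but has no discharge date yet. I still count a total of 2 encounters for them: one for AE and another as an inpatient, even without a discharge date. In addition, the person with person_id 3 was admitted twice on different dates; this gives a total of 2 encounters on the first date and 1 for the last admission. Can anyone help me?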
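Here is one idea. Let me know if this is not what you had in mind.
First, you can put the data into long format, with event in one column (indicating arrival, completed, admitted or discharged) and date in a second column, and remove the NA values, which do not contribute to the result.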
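As a rough illustration of the long format described above, here is a minimal sketch on a hypothetical two-row sample (the name toy and its values are made up for illustration and are not the full dataset from the question):

```r
library(tidyverse)

# Hypothetical two-row sample: one AE visit and one admission for the same person
toy <- tibble(
  person_id        = c(2, 2),
  encounter_number = c(2, 3),
  arrival          = c("2020-01-01 09:00:00", NA),
  completed        = c("2020-01-01 11:00:00", NA),
  admitted         = c(NA, "2020-01-01 11:00:00")
)

# One row per (encounter, event) pair; rows with a missing date are dropped
toy_long <- toy %>%
  pivot_longer(cols = c(arrival, completed, admitted),
               names_to = "event", values_to = "date") %>%
  drop_na(date)

toy_long
```

Each remaining row is a single dated event, which is what makes the later filtering and counting straightforward.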
Then you can filter for the events you want to count. In this case, I chose completed and admitted.
Next, you can group_by person_id and date (just the date, not the time). person_total_encounter is then row_number, which is simply a running count, or sequence, of events within each person_id and date.
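To make the counting step concrete, here is a small sketch on hypothetical long-format rows that are assumed to be already filtered to completed and admitted events (the names events, day and counted are illustrative only):

```r
library(tidyverse)

# Hypothetical events for one person: two on the same day, one on a later day
events <- tibble(
  person_id = c(3, 3, 3),
  event     = c("completed", "admitted", "admitted"),
  date      = c("2020-01-01 11:00:00", "2020-01-01 11:00:00", "2020-01-09 11:00:00")
)

counted <- events %>%
  # as.Date() keeps only the calendar date, so both 2020-01-01 rows share a group
  group_by(person_id, day = as.Date(date)) %>%
  # running count of events within each person and day
  mutate(person_total_encounter = row_number()) %>%
  ungroup()

counted$person_total_encounter
```

The count restarts whenever the person or the calendar date changes, which is why the later admission starts again at 1.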
Edit: added a select at the start, since the original dataset described by the OP may have extra columns.
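library(tidyverse)

# df1 is the sample data frame from the question
df1 %>%
  # keep only the columns needed, in case the real data has extras
  select(person_id, encounter_number, arrival, completed, admitted, discharged) %>%
  # long format: one row per (encounter, event) pair
  pivot_longer(cols = c(arrival, completed, admitted, discharged), names_to = "event", values_to = "date") %>%
  drop_na() %>%
  # count only completed and admitted events
  filter(event == "completed" | event == "admitted") %>%
  # group by person and calendar date (as.Date() drops the time)
  group_by(person_id, date = as.Date(date)) %>%
  mutate(person_total_encounter = row_number()) %>%
  ungroup() %>%
  select(-c(event, date)) %>%
  # join the count back onto the original rows
  right_join(df1, by = c("person_id", "encounter_number"))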
Output
# A tibble: 6 x 7
  person_id encounter_number person_total_encounter arrival             completed           admitted            discharged
      <dbl>            <dbl>                  <int> <chr>               <chr>               <chr>               <chr>
1         1                1                      1 2020-01-01 08:00:00 2020-01-01 09:00:00 NA                  NA
2         2                2                      1 2020-01-01 09:00:00 2020-01-01 11:00:00 NA                  NA
3         2                3                      2 NA                  NA                  2020-01-01 11:00:00 NA
4         3                4                      1 2020-01-01 10:00:00 2020-01-01 11:00:00 NA                  NA
5         3                5                      2 NA                  NA                  2020-01-01 11:00:00 2020-01-02 12:00:00
6         3                6                      1 NA                  NA                  2020-01-09 11:00:00 2020-01-13 12:00:00