I have repeated glucose measurements in long format, like this:
mydata <-
  structure(list(
    ID = c(4, 12, 24, 24, 24, 24, 24, 43, 50, 51, 52, 61, 67, 81, 82, 83, 88, 93, 93, 94, 100, 103, 105, 106, 107, 115, 117, 130, 130, 130, 130, 130, 130, 132, 136, 157, 173, 180, 194, 196, 230, 244, 245, 269, 288, 304, 316, 318, 334, 338, 338, 367, 378, 380),
    date = structure(c(15330, 15476, 17641, 17664, 17664, 17670, 17673, 18696, 18194, 16036, 16428, 16210, 16211, 17667, 16329, 17961, 18535, 16834, 18088, 18571, 16449, 18213, 18003, 17976, 16862, 17842, 18019, 17339, 18513, 18629, 18699, 18700, 18700, 18423, 17184, 17487, 16736, 18780, 16876, 16895, 17163, 17443, 18291, 18493, 18213, 17947, 18452, 17919, 18129, 18152, 18794, 18507, 18640, 18654),
                     class = "Date"),
    name = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
                     .Label = "gluc",
                     class = "factor"),
    value = c(5.6, 5.5, 6.5, 7.6, 7.7, 7.8, 7.4, 4.3, 4.7, 5.1, 4.3, 5.2, 5.1, 5.8, 10, 5.2, 8.7, 4.5, 6.1, 4.6, 6, 5.8, 5.9, 5.5, 5.3, 5.9, 10.1, 6.4, 21.2, 5.1, 5.9, 7.4, NA, 8, 9.5, 4.6, 7, 8.1, 5.5, 7, 5, 6.2, 4.9, 4.8, 8.3, 6, 5.5, 6.8, 6.1, 4.8, 6.3, 5.7, 6.2, 13.7)),
    row.names = c(NA, -54L),
    class = c("tbl_df", "tbl", "data.frame"))
head(mydata)
# A tibble: 6 x 4
ID date name value
<dbl> <date> <fct> <dbl>
1 4 2011-12-22 gluc 5.6
2 12 2012-05-16 gluc 5.5
3 24 2018-04-20 gluc 6.5
4 24 2018-05-13 gluc 7.6
5 24 2018-05-13 gluc 7.7
6 24 2018-05-19 gluc 7.8
I am trying to convert this to wide format. I tried:
# First try
lab_gluc_wide <-
  pivot_wider(
    data = mydata,
    names_from = name,
    values_from = value,
    id_cols = c(ID, date))

# Second try
lab_gluc_wide <-
  pivot_wider(
    data = mydata,
    names_from = name,
    values_from = c(value, date),
    id_cols = ID)
but both produce these warning messages:
1: Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates
2: Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates
What I'm looking for is one row per patient, with separate columns for each glucose measurement/date.
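The warning's own hint (`values_fn = length`) points at the cause: some `ID`/`date` pairs occur more than once, so they cannot be placed in a single cell. A minimal sketch of an equivalent check with dplyr's `count()`, run on a hypothetical `sample_data` that copies the six rows printed by `head(mydata)` so the snippet is self-contained (assumes dplyr is installed):

```r
library(dplyr)

# Hypothetical excerpt: the six rows printed by head(mydata) above,
# rebuilt here so the snippet runs on its own
sample_data <- tibble(
  ID    = c(4, 12, 24, 24, 24, 24),
  date  = as.Date(c("2011-12-22", "2012-05-16", "2018-04-20",
                    "2018-05-13", "2018-05-13", "2018-05-19")),
  name  = factor("gluc"),
  value = c(5.6, 5.5, 6.5, 7.6, 7.7, 7.8)
)

# Count rows per (ID, date, name) key; any key with n > 1 is one that
# pivot_wider(id_cols = c(ID, date), names_from = name) cannot fit in one cell
dupes <- sample_data %>%
  count(ID, date, name) %>%
  filter(n > 1)

print(dupes)
```

On the full `mydata` the same pipeline should return two keys: ID 24 on 2018-05-13 and ID 130 on 2021-03-14 (stored as day 18700).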
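The problem is that `ID` and `date` together do not uniquely identify a row: some patients have more than one measurement on the same day (for example, ID 24 has two on 2018-05-13). To get one row per patient, number the measurements within each patient and use that number as the column key; the date column then has to be reshaped as well or dropped. In my example I drop the date column.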
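library(tidyverse)

mydata %>%
  group_by(ID) %>%
  # number each patient's measurements 1, 2, 3, ...
  mutate(ID_ID = row_number()) %>%
  ungroup() %>%
  # one row per ID; date is not in id_cols, so it is dropped
  pivot_wider(names_from = c(name, ID_ID),
              values_from = value,
              id_cols = ID)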
This gives:
# A tibble: 43 x 7
ID gluc_1 gluc_2 gluc_3 gluc_4 gluc_5 gluc_6
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 4 5.6 NA NA NA NA NA
2 12 5.5 NA NA NA NA NA
3 24 6.5 7.6 7.7 7.8 7.4 NA
4 43 4.3 NA NA NA NA NA
5 50 4.7 NA NA NA NA NA
6 51 5.1 NA NA NA NA NA
7 52 4.3 NA NA NA NA NA
8 61 5.2 NA NA NA NA NA
9 67 5.1 NA NA NA NA NA
10 81 5.8 NA NA NA NA NA
# ... with 33 more rows
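If you want to keep the dates rather than drop them, one option is to pass both columns to `values_from`, so each measurement number gets its own `value_*` and `date_*` column. A minimal sketch, again on a hypothetical six-row `sample_data` excerpt copied from `head(mydata)` (assumes dplyr and tidyr are installed); with the real data you would start the pipe from `mydata` instead:

```r
library(dplyr)
library(tidyr)

# Hypothetical excerpt: the six rows printed by head(mydata)
sample_data <- tibble(
  ID    = c(4, 12, 24, 24, 24, 24),
  date  = as.Date(c("2011-12-22", "2012-05-16", "2018-04-20",
                    "2018-05-13", "2018-05-13", "2018-05-19")),
  name  = factor("gluc"),
  value = c(5.6, 5.5, 6.5, 7.6, 7.7, 7.8)
)

wide <- sample_data %>%
  group_by(ID) %>%
  mutate(ID_ID = row_number()) %>%   # 1, 2, 3, ... within each patient
  ungroup() %>%
  pivot_wider(
    id_cols     = ID,
    names_from  = ID_ID,
    values_from = c(value, date)     # each value column gets numbered copies
  )

print(wide)
```

By default tidyr orders the new columns as `value_1, value_2, ..., date_1, date_2, ...`; recent tidyr versions also accept `names_vary = "slowest"` to interleave them as `value_1, date_1, value_2, ...`.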