How to use dplyr to create a count variable based on the values of a binary variable in R

I have repeated data like the table below. Code to create df:

df <- structure(
  list(
    patid = c("1", "1", "1", "1", "2", "2", "3", "3", "3", "4", "4", "4", "4"),
    observation_date = c(
      "07/07/2016", "07/08/2016", "07/11/2016", "07/07/2019", "07/05/2015",
      "02/12/2016", "07/05/2015", "07/06/2015", "16/06/2015", "07/05/2015",
      "02/12/2016", "18/12/2016", "15/01/2017"
    ),
    registration = c("0", "0", "1", "1", "0", "1", "0", "0", "0", "0", "1", "1", "1")
  ),
  class = "data.frame",
  row.names = c(NA, -13L)
)

patid	observation_date	registration
1	07/07/2016	0
1	07/08/2016	0
1	07/11/2016	1
1	07/07/2019	1
2	07/05/2015	0
2	02/12/2016	1
3	07/05/2015	0
3	07/06/2015	0
3	16/06/2015	0
4	07/05/2015	0
4	02/12/2016	1
4	18/12/2016	1
4	15/01/2017	1

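Use count. To make every possible value appear in the final table, including combinations that occur zero times, convert your registration column to a factor and pass .drop = FALSE: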

library(dplyr)

df %>%
  count(patid, registration = factor(registration), .drop = FALSE)

Output

  patid registration n
1     1            0 2
2     1            1 2
3     2            0 1
4     2            1 1
5     3            0 3
6     3            1 0
7     4            0 1
8     4            1 3
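
To see why the factor conversion matters, here is a small sketch comparing a plain count with the version above. The data frame df_small is a trimmed copy of df made up for this illustration (the date column is dropped), and the names plain and full are likewise hypothetical.

```r
library(dplyr)

# Trimmed copy of df for illustration (assumption: dates are not needed here)
df_small <- data.frame(
  patid = c("1", "1", "1", "1", "2", "2", "3", "3", "3", "4", "4", "4", "4"),
  registration = c("0", "0", "1", "1", "0", "1", "0", "0", "0", "0", "1", "1", "1")
)

# Plain count: only combinations that actually occur are returned,
# so patid 3 with registration 1 is missing
plain <- df_small %>%
  count(patid, registration)

# Factor + .drop = FALSE: empty factor levels are kept with n = 0
full <- df_small %>%
  count(patid, registration = factor(registration), .drop = FALSE)

nrow(plain)  # 7 rows
nrow(full)   # 8 rows, including the zero count for patid 3
```

The extra row in full is the patid 3 / registration 1 combination with n equal to 0.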

Using base R

as.data.frame(table(df[c('patid', 'registration')]))
  patid registration Freq
1     1            0    2
2     2            0    1
3     3            0    3
4     4            0    1
5     1            1    2
6     2            1    1
7     3            1    0
8     4            1    3
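
If the goal is instead to keep one row per observation and attach the count as a new column, dplyr's add_count may be closer to what the title asks for. This is only a sketch under that assumption about the intended result; df_small and the column name n_obs are made up for the example.

```r
library(dplyr)

# Trimmed copy of df for illustration (assumption: dates are not needed here)
df_small <- data.frame(
  patid = c("1", "1", "1", "1", "2", "2", "3", "3", "3", "4", "4", "4", "4"),
  registration = c("0", "0", "1", "1", "0", "1", "0", "0", "0", "0", "1", "1", "1")
)

# add_count keeps all 13 rows and adds the size of each
# patid/registration group as a new column
out <- df_small %>%
  add_count(patid, registration, name = "n_obs")

out
```

Unlike count, this cannot show zero counts, because every row in the result corresponds to an observation that exists.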
