R - Counting observations while taking a condition into account



I have a data frame like this:

id <- c(rep(1, 3), rep(2, 3), rep(3, 3))
condition <- c(0, 0, 1, 0, 0, 1, 1, 1, 0)
# the length-3 patterns are repeated once per id (data.frame() would recycle them anyway)
time_point1 <- rep(c(1, 1, NA), 3)
time_point2 <- rep(c(NA, 1, NA), 3)
time_point3 <- rep(NA, 9)
time_point4 <- c(1, NA, NA, 1, NA, NA, NA, NA, 1)
data <- data.frame(id, condition, time_point1, time_point2, time_point3, time_point4)
data
  id condition time_point1 time_point2 time_point3 time_point4
1  1         0           1          NA          NA           1
2  1         0           1           1          NA          NA
3  1         1          NA          NA          NA          NA
4  2         0           1          NA          NA           1
5  2         0           1           1          NA          NA
6  2         1          NA          NA          NA          NA
7  3         1           1          NA          NA          NA
8  3         1           1           1          NA          NA
9  3         0          NA          NA          NA           1

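I want to create a table showing, for each time point, how many observations have condition == 1 (n_x) and how many observations there are in total (n_t). If there are none, I want a 0 rather than a missing row. I tried: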

library(dplyr)
library(tidyr)

data %>% 
  pivot_longer(cols = contains("time_point")) %>% 
  filter(!is.na(value)) %>% 
  group_by(name) %>% 
  mutate(n_t = n_distinct(id)) %>% 
  ungroup() %>% 
  filter(condition == 1) %>%
  group_by(name) %>%
  summarise(n_x = n_distinct(id), n_t = first(n_t))

This gives:

  name          n_x   n_t
  <chr>       <int> <int>
1 time_point1     1     3
2 time_point2     1     3
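The attempt returns only two rows because the filter() calls run before the final group_by(): filter(!is.na(value)) removes every time_point3 row, and filter(condition == 1) removes every time_point4 row, so those groups no longer exist when summarise() runs. It also counts distinct people (n_distinct(id)) rather than observations, which is why time_point1 shows 1 and 3 instead of 2 and 6. A minimal sketch, assuming dplyr and tidyr are installed and using the same column names as the question, that keeps the distinct-people logic but moves both conditions inside summarise() so that empty time points survive as groups and get 0 (an illustration, not part of the original thread):

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
data <- data.frame(
  id = rep(1:3, each = 3),
  condition = c(0, 0, 1, 0, 0, 1, 1, 1, 0),
  time_point1 = rep(c(1, 1, NA), 3),
  time_point2 = rep(c(NA, 1, NA), 3),
  time_point3 = NA_real_,
  time_point4 = c(1, NA, NA, 1, NA, NA, NA, NA, 1)
)

people <- data %>%
  pivot_longer(cols = starts_with("time_point")) %>%
  group_by(name) %>%
  summarise(
    # distinct ids with a value at this time point and condition == 1
    n_x = n_distinct(id[!is.na(value) & condition == 1]),
    # distinct ids with a value at this time point
    n_t = n_distinct(id[!is.na(value)])
  )
people
```

Here time_point3 and time_point4 stay in the result because no rows are dropped before grouping; n_distinct() of an empty vector is 0.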

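Expected result: a table like the following, counting both the observations that meet the condition and all observations, with or without it: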

  name        n_x n_t
1 time_point1   2   6
2 time_point2   1   3
3 time_point3   0   0
4 time_point4   0   3

Thanks!

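You can use pivot_longer() so that you can group_by() the time point, and then summarise by simply adding up the values. For the condition, sum the condition column only over the rows where values is not NA. This works because condition is coded 0/1 and every recorded time-point value is 1, so each sum is a count of rows.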

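library(dplyr)
library(tidyr)

data %>% 
  pivot_longer(cols = 3:6, names_to = 'point', values_to = 'values') %>%  # one row per id/time point
  group_by(point) %>% 
  summarise(n_x = sum(condition[!is.na(values)]),  # rows with a value and condition == 1
            n_t = sum(values, na.rm = TRUE))       # rows with a value (each value is 1)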

Output:

# A tibble: 4 x 3
  point         n_x   n_t
  <chr>       <dbl> <dbl>
1 time_point1     2     6
2 time_point2     1     3
3 time_point3     0     0
4 time_point4     0     3
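Because sum(values, na.rm = TRUE) only counts rows when every recorded value is 1, a slightly more defensive variant is to count non-missing entries directly. A minimal sketch, assuming dplyr and tidyr are installed and the same column names as above; this is a variation on the answer, not the answerer's code:

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
data <- data.frame(
  id = rep(1:3, each = 3),
  condition = c(0, 0, 1, 0, 0, 1, 1, 1, 0),
  time_point1 = rep(c(1, 1, NA), 3),
  time_point2 = rep(c(NA, 1, NA), 3),
  time_point3 = NA_real_,
  time_point4 = c(1, NA, NA, 1, NA, NA, NA, NA, 1)
)

obs <- data %>%
  pivot_longer(cols = starts_with("time_point"),
               names_to = "point", values_to = "values") %>%
  group_by(point) %>%
  summarise(
    # rows that have a value at this time point and condition == 1
    n_x = sum(!is.na(values) & condition == 1),
    # rows that have any value at this time point
    n_t = sum(!is.na(values))
  )
obs
```

Summing logical vectors returns integer counts, so n_x and n_t come out as <int> instead of <dbl>, and the result no longer depends on the time-point columns containing 1s.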
