R - `dplyr`'s `summarize_all()` by some condition

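I have a data frame with an ID column and several columns that I want to summarize. In each of these columns (which are mutually exclusive), I want to count the rows that match "a", "b", or either of the two.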

> df
# A tibble: 5 x 4
     id col1  col2  col3
  <dbl> <chr> <chr> <chr>
1     1 NA    b     NA
2     2 NA    b     NA
3     3 NA    NA    a
4     4 b     NA    NA
5     5 a     NA    NA

This is as far as I've gotten:

> df %>%
    group_by(id) %>%
    summarize_all(a = nrow(. %>% filter(. == "a"),
                  b = nrow(. %>% filter(. == "b"),
                  x = nrow(!is.na(.))
Error: Can't create call to non-callable object
Call `rlang::last_error()` to see a backtrace

Am I taking the right approach? I am trying to get something that looks like this:

var   a  b  x
-------------
col1  1  1  2
col2  0  2  2
col3  1  0  1

You can try:

library(tidyverse)
df %>%
  gather(key, value, -id) %>%
  group_by(key, value) %>%
  count %>%
  filter(!is.na(value))
# A tibble: 4 x 3
# Groups:   key, value [4]
  key   value     n
  <chr> <chr> <int>
1 col1  a         1
2 col1  b         1
3 col2  b         2
4 col3  a         1

If you want the tabular result that you edited into your question, you can do the following:

df %>%
  gather(key, value, -id) %>%
  group_by(key, value) %>%
  count %>%
  filter(!is.na(value)) %>%
  group_by(key) %>%
  mutate(x = sum(n)) %>%
  spread(value, n, fill = 0)
# A tibble: 3 x 4
# Groups:   key [3]
  key       x     a     b
  <chr> <int> <dbl> <dbl>
1 col1      2     1     1
2 col2      2     0     2
3 col3      1     1     0
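
Since `gather()` and `spread()` are superseded in recent versions of tidyr, the same pipeline can probably be written with `pivot_longer()` and `pivot_wider()`. The following is only a sketch: the `df` below is rebuilt by hand from the printout in the question, and the result name `wide` is just illustrative.

```r
library(dplyr)
library(tidyr)

# Data frame rebuilt by hand from the question's printout (an assumption)
df <- tibble(
  id   = c(1, 2, 3, 4, 5),
  col1 = c(NA, NA, NA, "b", "a"),
  col2 = c("b", "b", NA, NA, NA),
  col3 = c(NA, NA, "a", NA, NA)
)

wide <- df %>%
  # Long format: one row per (id, column) pair, NA values dropped
  pivot_longer(-id, names_to = "key", values_to = "value",
               values_drop_na = TRUE) %>%
  # Count each letter within each original column
  count(key, value) %>%
  # Total non-NA rows per original column
  group_by(key) %>%
  mutate(x = sum(n)) %>%
  ungroup() %>%
  # One column per letter; letters that never occur get 0
  pivot_wider(names_from = value, values_from = n,
              values_fill = list(n = 0L))

wide
```

Unlike `spread(fill = 0)`, filling with `0L` keeps the count columns as integers.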

One tidyverse possibility could be:

 df %>%
  gather(var, letters, -id, na.rm = TRUE) %>%
  add_count(var, letters, name = "n_letters") %>%
  add_count(var, name = "n_all") %>%
  select(-id) %>%
  distinct()
  var   letters n_letters n_all
  <chr> <chr>       <int> <int>
1 col1  b               1     2
2 col1  a               1     2
3 col2  b               2     2
4 col3  a               1     1

Or:

df %>%
 gather(var, letters, -id, na.rm = TRUE) %>%
 add_count(var, letters, name = "n_letters") %>%
 add_count(var, name = "all") %>%
 select(-id) %>%
 distinct() %>%
 spread(letters, n_letters, fill = 0)
  var   all     a     b
  <chr> <int> <dbl> <dbl>
1 col1      2     1     1
2 col2      2     0     2
3 col3      1     1     0
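
Another option, closer in spirit to the attempt in the question, might be to reshape to long format first and then compute the three counts directly in `summarise()`. `summarize_all()` expects functions through its `.funs` argument rather than named expressions, so writing the conditions as plain sums over a long table tends to be simpler. This is a minimal sketch: `df` is rebuilt by hand from the question's printout, and the names `var`, `value` and `result` are illustrative.

```r
library(dplyr)
library(tidyr)

# Data frame rebuilt by hand from the question's printout (an assumption)
df <- tibble(
  id   = c(1, 2, 3, 4, 5),
  col1 = c(NA, NA, NA, "b", "a"),
  col2 = c("b", "b", NA, NA, NA),
  col3 = c(NA, NA, "a", NA, NA)
)

result <- df %>%
  # One row per (id, column) pair; NA values are kept here
  pivot_longer(-id, names_to = "var", values_to = "value") %>%
  group_by(var) %>%
  summarise(
    a = sum(value == "a", na.rm = TRUE),  # rows equal to "a"
    b = sum(value == "b", na.rm = TRUE),  # rows equal to "b"
    x = sum(!is.na(value))                # rows with any non-NA value
  )

result
```

Here `x` counts every non-NA row, so it would also include any value other than "a" or "b" if the data contained one.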
