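I have a data frame with an ID column and several columns that I want to summarize. In each of these columns (which are mutually exclusive), I want to count the rows that match "a", "b", or either of the two.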
> df
# A tibble: 5 x 4
     id col1  col2  col3
  <dbl> <chr> <chr> <chr>
1     1 NA    b     NA
2     2 NA    b     NA
3     3 NA    NA    a
4     4 b     NA    NA
5     5 a     NA    NA
Here is how far I have got:
> df %>%
group_by(id) %>%
summarize_all(a = nrow(. %>% filter(. == "a"),
b = nrow(. %>% filter(. == "b"),
x = nrow(!is.na(.))
Error: Can't create call to non-callable object
Call `rlang::last_error()` to see a backtrace
Am I taking the right approach? I'm trying to get something that looks like this:
var a b x
-------------
col1 1 1 2
col2 0 2 2
col3 1 0 1
You can try:
library(tidyverse)
df %>%
  gather(key, value, -id) %>%
  group_by(key, value) %>%
  count() %>%
  filter(!is.na(value))
# A tibble: 4 x 3
# Groups:   key, value [4]
  key   value     n
  <chr> <chr> <int>
1 col1  a         1
2 col1  b         1
3 col2  b         2
4 col3  a         1
If you want the table-style result that you edited into your question, you can do the following:
df %>%
  gather(key, value, -id) %>%
  group_by(key, value) %>%
  count() %>%
  filter(!is.na(value)) %>%
  group_by(key) %>%
  mutate(x = sum(n)) %>%
  spread(value, n, fill = 0)
# A tibble: 3 x 4
# Groups:   key [3]
  key       x     a     b
  <chr> <int> <dbl> <dbl>
1 col1      2     1     1
2 col2      2     0     2
3 col3      1     1     0
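In current tidyr releases, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The following is a minimal sketch of the same idea with those functions, assuming tidyr 1.0.0 or later. The df here is rebuilt from the data printed in the question, and the names var, value, and res are illustrative choices, not taken from the original post.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data as printed in the question
df <- tibble(
  id   = 1:5,
  col1 = c(NA, NA, NA, "b", "a"),
  col2 = c("b", "b", NA, NA, NA),
  col3 = c(NA, NA, "a", NA, NA)
)

res <- df %>%
  # Long format: one row per (id, column) pair, dropping NA cells
  pivot_longer(-id, names_to = "var", values_to = "value",
               values_drop_na = TRUE) %>%
  # Count occurrences of each letter within each original column
  count(var, value) %>%
  # One column per letter; combinations that never occur become 0
  pivot_wider(names_from = value, values_from = n,
              values_fill = list(n = 0L)) %>%
  # x is the number of non-missing rows, i.e. "a" or "b"
  mutate(x = a + b)

res
```

Because count() sorts by var and then value, the letter columns should come out in the order a, b, followed by x, which matches the layout requested in the question.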
One tidyverse possibility could be:
df %>%
  gather(var, letters, -id, na.rm = TRUE) %>%
  add_count(var, letters, name = "n_letters") %>%
  add_count(var, name = "n_all") %>%
  select(-id) %>%
  distinct()
  var   letters n_letters n_all
  <chr> <chr>       <int> <int>
1 col1  b               1     2
2 col1  a               1     2
3 col2  b               2     2
4 col3  a               1     1
Or:
df %>%
  gather(var, letters, -id, na.rm = TRUE) %>%
  add_count(var, letters, name = "n_letters") %>%
  add_count(var, name = "all") %>%
  select(-id) %>%
  distinct() %>%
  spread(letters, n_letters, fill = 0)
  var     all     a     b
  <chr> <int> <dbl> <dbl>
1 col1      2     1     1
2 col2      2     0     2
3 col3      1     1     0
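For comparison, the three counts can also be computed column by column without reshaping, which is closer in spirit to the attempt in the question. This is only a base R sketch: df is rebuilt here from the data printed in the question, and the name counts is an illustrative choice.

```r
# Rebuild the example data as printed in the question
df <- data.frame(
  id   = 1:5,
  col1 = c(NA, NA, NA, "b", "a"),
  col2 = c("b", "b", NA, NA, NA),
  col3 = c(NA, NA, "a", NA, NA),
  stringsAsFactors = FALSE
)

# For every column except id, count "a", "b", and non-missing values.
# sapply() returns one column per input column, so t() flips the result
# to one row per original column.
counts <- t(sapply(df[-1], function(v) {
  c(a = sum(v == "a", na.rm = TRUE),
    b = sum(v == "b", na.rm = TRUE),
    x = sum(!is.na(v)))
}))

counts
#      a b x
# col1 1 1 2
# col2 0 2 2
# col3 1 0 1
```

The result is an integer matrix with the column names as row names. as.data.frame(counts) turns it into a data frame if that is more convenient.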