r - Why don't I get counts on two numeric columns grouped by other categorical variables, using only tidyverse



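I am trying to get counts over two numeric variables, without success. Without the counts I cannot compute percentages, which I hope to obtain with your help. I am doing this just for fun.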

This is the code I used and the error I get:

test_sum <- test_data_3 %>%
dplyr::group_by(across(where(is.factor))) %>% 
dplyr::summarise(across(where(is.numeric())))

Error: Problem with `summarise()` input `..1`.
ℹ `..1 = across(where(is.numeric()))`.
x 0 arguments passed to 'is.numeric' which requires 1
Run `rlang::last_error()` to see where the error occurred.

I tried another piece of code:

test_sum <- test_data_3 %>%
dplyr::group_by(provider_name, type, st_nst) %>% 
dplyr::summarise(across(where(is.numeric())))
Error: Problem with `summarise()` input `..1`.
ℹ `..1 = across(where(is.numeric()))`.
x 0 arguments passed to 'is.numeric' which requires 1
ℹ The error occurred in group 1: provider_name = "BLACKB", type = "stri", st_nst = "NST".

This is the Stack Overflow question I took the earlier code from: Group by multiple columns and sum other multiple columns

This is the kind of data I have:

dput(test_data_3)
structure(list(financial_year = c(1920, 1920, 1920, 1920, 1920, 
1920, 1920, 1920, 1920, 1920, 1920, 1920, 1920, 1920, 1920, 1920, 
1920, 1920, 1920, 1920), provider_name = c("LIVEW", "MANCHE", 
"MANCHE", "MANCHE", "MANCHE", "MANCHE", "MANCHE", "MANCHE", "MANCHE", 
"SOUTH", "LANCA", "COUNTY", "BUCKINGT", "BLACKB", "BURNLEY", 
"ROYAL", "THE", "LOUTH", "IMPERIAL", "WESTERN"), type = c("non_stringent", 
"non_stringent", "non_stringent", "non_stringent", "non_stringent", 
"non_stringent", "non_stringent", "non_stringent", "non_stringent", 
"non_stringent", "stri", "stri", "stri", "stri", "stri", "stri", 
"stri", "stri", "stri", "stri"), eld = c(0, 326, 343, 43, 61, 
46, 1, 3, 3, 1, 313, 671, 329, 389, 3, 376, 306, 0, 409, 589), 
ed = c(1, 23, 23, 0, 2, 0, 1, 0, 0, 0, 7, 3, 4, 4, 0, 0, 
2, 1, 3, 1), st_nst = c("ST", "STI", "ST", "ST", "ST", "ST", 
"ST", "ST", "ST", "ST", "NST", "NST", "NSt", "NST", "NST", 
"NST", "NST", "NST", "NST", "NST")), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L), spec = structure(list(
cols = list(financial_year = structure(list(), class = c("collector_double", 
"collector")), trust_code = structure(list(), class = c("collector_character", 
"collector")), provider_name = structure(list(), class = c("collector_character", 
"collector")), prim_diag = structure(list(), class = c("collector_character", 
"collector")), type = structure(list(), class = c("collector_character", 
"collector")), elective_discharge = structure(list(), class = c("collector_double", 
"collector")), emergency_admission = structure(list(), class = c("collector_double", 
"collector")), st_nst = structure(list(), class = c("collector_character", 
"collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1L), class = "col_spec"))

Or, shown another way:

test_data_3
# A tibble: 20 x 6
   financial_year provider_name type            eld    ed st_nst
            <dbl> <chr>         <chr>         <dbl> <dbl> <chr>
 1           1920 LIVEW         non_stringent     0     1 ST
 2           1920 MANCHE        non_stringent   326    23 STI
 3           1920 MANCHE        non_stringent   343    23 ST
 4           1920 MANCHE        non_stringent    43     0 ST
 5           1920 MANCHE        non_stringent    61     2 ST
 6           1920 MANCHE        non_stringent    46     0 ST
 7           1920 MANCHE        non_stringent     1     1 ST
 8           1920 MANCHE        non_stringent     3     0 ST
 9           1920 MANCHE        non_stringent     3     0 ST
10           1920 SOUTH         non_stringent     1     0 ST
11           1920 LANCA         stri            313     7 NST
12           1920 COUNTY        stri            671     3 NST
13           1920 BUCKINGT      stri            329     4 NSt
14           1920 BLACKB        stri            389     4 NST
15           1920 BURNLEY       stri              3     0 NST
16           1920 ROYAL         stri            376     0 NST
17           1920 THE           stri            306     2 NST
18           1920 LOUTH         stri              0     1 NST
19           1920 IMPERIAL      stri            409     3 NST
20           1920 WESTERN       stri            589     1 NST

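Can anyone explain the mistake I made? Is there a way to first get the counts and then the percentages for the two numeric columns, i.e. eld & ed grouped by provider_name, type, st_nst? What I mean is adding these two columns into a new column, based on the grouping variables.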

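No function is passed to across(). Note also that where() takes the predicate function itself, so it should be where(is.numeric) rather than where(is.numeric()); the extra parentheses call is.numeric with zero arguments, which is exactly what the error message reports. If the intent is to select: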

library(dplyr)
test_data_3 %>%
dplyr::group_by(across(where(is.factor))) %>% 
dplyr::select(where(is.numeric))
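
To make the difference between where(is.numeric()) and where(is.numeric) concrete, here is a minimal sketch. The data frame df and its columns g, x and y are made up for illustration and are not the asker's data.

```r
library(dplyr)

# Hypothetical toy data: one character column, two numeric columns.
df <- data.frame(g = c("a", "a", "b"), x = c(1, 2, 3), y = c(10, 20, 30))

# is.numeric() with parentheses is called immediately with no argument,
# so it fails before where() ever receives a predicate.
bad <- tryCatch(
  df %>% summarise(across(where(is.numeric()), sum)),
  error = function(e) "error"
)

# Passing the bare function lets where() apply it to each column in turn.
good <- df %>% summarise(across(where(is.numeric), sum))
```

Here bad ends up holding the string "error", while good is a one-row data frame with the column sums of x and y.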

Assuming we want the sum of those numeric columns:

test_data_3 %>%
dplyr::group_by(across(where(is.factor))) %>% 
dplyr::summarise(across(where(is.numeric), sum))
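
The dput output in the question shows provider_name, type and st_nst as character columns rather than factors, so where(is.factor) may not pick them up on that data. The sketch below names the grouping columns explicitly, sums only eld and ed, and then adds percentages. It uses four rows copied from the question; the name test_small and the definition of "percentage" (each group's share of the column total) are assumptions, since the question does not say which percentage is wanted.

```r
library(dplyr)

# Four rows in the shape of test_data_3 (values copied from the question).
test_small <- data.frame(
  provider_name = c("MANCHE", "MANCHE", "LANCA", "COUNTY"),
  type   = c("non_stringent", "non_stringent", "stri", "stri"),
  st_nst = c("ST", "ST", "NST", "NST"),
  eld    = c(343, 43, 313, 671),
  ed     = c(23, 0, 7, 3)
)

test_sum <- test_small %>%
  group_by(provider_name, type, st_nst) %>%
  # Sum only the two columns of interest within each group.
  summarise(across(c(eld, ed), sum), .groups = "drop") %>%
  # Assumed meaning of "percentage": share of each column's overall total.
  mutate(
    eld_pct = 100 * eld / sum(eld),
    ed_pct  = 100 * ed / sum(ed)
  )
```

Naming eld and ed directly also keeps financial_year, which is numeric too, out of the summary.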

Update

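If we want the row-wise sum of the numeric columns, select the numeric columns with where(is.numeric) (using cur_data() is more correct, because it also works when the data has a group attribute or when . is used) and get the row sums with rowSums: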

test_data_3 %>% 
mutate(count = select(cur_data(), where(is.numeric)) %>% 
rowSums(na.rm = TRUE))
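
One caveat with where(is.numeric) on this data: financial_year is numeric as well, so it would be added into the row total. A possible variant, sketched below on two rows copied from the question, names eld and ed explicitly and calls rowSums on across(). The name rows is hypothetical. Separately, cur_data() was deprecated in dplyr 1.1.0 in favour of pick(), so on recent dplyr versions the original call may emit a deprecation warning.

```r
library(dplyr)

# Two rows copied from the question's data.
rows <- data.frame(
  financial_year = c(1920, 1920),
  provider_name  = c("MANCHE", "LANCA"),
  eld = c(326, 313),
  ed  = c(23, 7)
)

rows_total <- rows %>%
  # Name the columns explicitly so financial_year is not added in.
  mutate(count = rowSums(across(c(eld, ed)), na.rm = TRUE))
```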
