r - Showing both Ns and proportions in a two-way frequency table

I'm trying to create a table for publication that doesn't conform to "tidy" output:

library(dplyr)

dummy <- data.frame(categorical_1 = c("a", "b", "a", "a", "b", "b", "a", "b", "b", "a"),
                    categorical_2 = c(rep("one", 5), rep("two", 5)),
                    numeric = sample(1:10, 10))
# Count each combination, then compute proportions within each categorical_1 value
dummy %>%
  count(categorical_1, categorical_2) %>%
  group_by(categorical_1) %>%
  mutate(prop = prop.table(n))

Tidyverse output:

categorical_1 categorical_2     n  prop
<fct>         <fct>         <int> <dbl>
1 a             one               3   0.6
2 a             two               2   0.4
3 b             one               2   0.4
4 b             two               3   0.6

Desired output:

Category          One       Two
a                 3 (0.6)     2 (0.4)
b                 2 (0.4)     3 (0.6)

Is there perhaps an additional mutate step I could apply to get the table into my desired output?

library(janitor)
dummy %>%
tabyl(categorical_1, categorical_2) %>%
adorn_percentages("row") %>%
adorn_ns(position = "front")
#>  categorical_1     one     two
#>              a 3 (0.6) 2 (0.4)
#>              b 2 (0.4) 3 (0.6)

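You can use pivot_wider after combining n and prop into a single column (d2 is the result of the question's pipeline, defined below):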

library(tidyverse)
d2 %>% 
mutate(v = paste0(n, ' (', prop, ')')) %>% 
pivot_wider(id_cols = categorical_1, names_from = categorical_2, values_from = v) %>% 
rename_at(1, ~'Category')
# # A tibble: 2 x 3
# # Groups:   Category [2]
#   Category one     two    
#   <fct>    <chr>   <chr>  
# 1 a        3 (0.6) 2 (0.4)
# 2 b        2 (0.4) 3 (0.6)

Initial data, from the question:

d2 <- 
dummy %>%
count(categorical_1, categorical_2) %>%
group_by(categorical_1) %>%      
mutate(prop = prop.table(n))

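This isn't much different from the other answers. I wanted to point out a few things that may come down to preference:

  • count drops the grouping, whereas summarise peels off only the last grouping level; since you need the first group (categorical_1) again in mutate, you can call group_by, then summarise, and then compute your proportions, which gives you more control
  • I find glue-based functions easier for building this kind of string than calling paste with assorted punctuation or other separators
  • Your desired output has title-case column names without the numbers, so I clean that up in the final rename_all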
library(dplyr)
library(tidyr)
library(stringr)
dummy %>%
  group_by(categorical_1, categorical_2) %>%
  summarise(n = n()) %>%
  # still grouped by categorical_1 here, so sum(n) is the row total
  mutate(prop = n / sum(n),
         display = str_glue("{n} ({prop})")) %>%
  select(-n, -prop) %>%
  pivot_wider(names_from = categorical_2, values_from = display) %>%
  # strip the "_1" suffix and convert the names to title case
  rename_all(~str_remove(., "_\\d+") %>% str_to_title())
#> # A tibble: 2 x 3
#> # Groups:   Categorical [2]
#>   Categorical One     Two    
#>   <fct>       <chr>   <chr>  
#> 1 a           3 (0.6) 2 (0.4)
#> 2 b           2 (0.4) 3 (0.6)
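
The grouping behaviour described in the first bullet can be checked directly with group_vars. This is a minimal sketch, assuming dplyr 1.0.0 or later (for the .groups argument, which only makes the default behaviour explicit); the object names after_count and after_summarise are illustrative and not part of the answer above.

```r
library(dplyr)

dummy <- data.frame(
  categorical_1 = c("a", "b", "a", "a", "b", "b", "a", "b", "b", "a"),
  categorical_2 = c(rep("one", 5), rep("two", 5))
)

# count() on an ungrouped data frame returns an ungrouped result,
# so a later mutate() would need its own group_by(categorical_1)
after_count <- dummy %>%
  count(categorical_1, categorical_2)
group_vars(after_count)

# summarise() drops only the last grouping level (categorical_2),
# so the result is still grouped by categorical_1
after_summarise <- dummy %>%
  group_by(categorical_1, categorical_2) %>%
  summarise(n = n(), .groups = "drop_last")
group_vars(after_summarise)
```

Because the second result stays grouped by categorical_1, n / sum(n) in the following mutate is computed within each row category without another group_by.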

Picking up from your pipeline, we can unite n and prop and then spread, i.e.

dummy %>%
count(categorical_1, categorical_2) %>%
group_by(categorical_1) %>%
mutate(prop = prop.table(n))  %>%
unite(n_prop, n, prop) %>% 
spread(categorical_2, n_prop)

which gives

# A tibble: 2 x 3
# Groups:   categorical_1 [2]
categorical_1 one   two  
<fct>         <chr> <chr>
1 a             3_0.6 2_0.4
2 b             2_0.4 3_0.6

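You can play with the sep argument of unite and add a mutate to paste on the closing parenthesis, if you need the output to match the desired format exactly.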
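
One way to do that might look like the following. This is a sketch rather than the answerer's own code: it assumes sep = " (" as the separator, adds an ungroup() before reshaping as a precaution, and stores the result in a hypothetical object called result.

```r
library(dplyr)
library(tidyr)

dummy <- data.frame(
  categorical_1 = c("a", "b", "a", "a", "b", "b", "a", "b", "b", "a"),
  categorical_2 = c(rep("one", 5), rep("two", 5))
)

result <- dummy %>%
  count(categorical_1, categorical_2) %>%
  group_by(categorical_1) %>%
  mutate(prop = prop.table(n)) %>%
  ungroup() %>%
  # join n and prop with " (" so each cell starts as "n (prop"
  unite(n_prop, n, prop, sep = " (") %>%
  # append the closing parenthesis
  mutate(n_prop = paste0(n_prop, ")")) %>%
  spread(categorical_2, n_prop)

result
```

With the example data, the cells should come out in the "n (prop)" form shown in the desired output. Proportions that are not short decimals would need rounding before the unite step.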

A data.table solution:

library(data.table)
# Count each combination, cast to wide, then format as "n (row proportion)"
dcast(setDT(dummy)[, .(count = .N), .(categorical_1, categorical_2)],
      categorical_1 ~ categorical_2, value.var = "count")[,
  .(categorical_1 = categorical_1,
    one = paste0(one, " (", one / (one + two), ")"),
    two = paste0(two, " (", two / (one + two), ")"))]
#>    categorical_1     one     two
#> 1:             a 3 (0.6) 2 (0.4)
#> 2:             b 2 (0.4) 3 (0.6)

Data:

dummy <- data.frame(categorical_1 = c("a", "b", "a", "a", "b", "b", "a", "b", "b", "a"),
categorical_2 = c(rep("one", 5), rep("two", 5)),
numeric = sample(1:10, 10))
