I'm trying to create a table for publication that doesn't conform to "tidy" output:
dummy <- data.frame(categorical_1 = c("a", "b", "a", "a", "b", "b", "a", "b", "b", "a"),
categorical_2 = c(rep("one", 5), rep("two", 5)),
numeric = sample(1:10, 10))
library(dplyr)
dummy %>%
  count(categorical_1, categorical_2) %>%
  group_by(categorical_1) %>%
  mutate(prop = prop.table(n))
Tidyverse output:
categorical_1 categorical_2 n prop
<fct> <fct> <int> <dbl>
1 a one 3 0.6
2 a two 2 0.4
3 b one 2 0.4
4 b two 3 0.6
Desired output:
Category One Two
a 3 (0.6) 2 (0.4)
b 2 (0.4) 3 (0.6)
Perhaps I can apply some additional mutate steps to get the table into my desired output?
library(janitor)
dummy %>%
tabyl(categorical_1, categorical_2) %>%
adorn_percentages("row") %>%
adorn_ns(position = "front")
#> categorical_1 one two
#> a 3 (0.6) 2 (0.4)
#> b 2 (0.4) 3 (0.6)
You can use pivot_wider after combining n and prop into one column:
library(tidyverse)
d2 %>%
  mutate(v = paste0(n, ' (', prop, ')')) %>%   # combine count and proportion into one string
  pivot_wider(id_cols = categorical_1, names_from = categorical_2, values_from = v) %>%
  rename_at(1, ~'Category')                     # rename the first column
# # A tibble: 2 x 3
# # Groups: Category [2]
# Category one two
# <fct> <chr> <chr>
# 1 a 3 (0.6) 2 (0.4)
# 2 b 2 (0.4) 3 (0.6)
Initial data from the question:
d2 <-
dummy %>%
count(categorical_1, categorical_2) %>%
group_by(categorical_1) %>%
mutate(prop = prop.table(n))
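For reference, the two pieces above can be combined into one self-contained pipeline. This is a sketch assuming dplyr and tidyr are installed; the unused numeric column is left out, and the final ungroup() is an addition not in the answer above: it drops the leftover "Groups" attribute so the printed result is a plain tibble.

```r
library(dplyr)
library(tidyr)

# Question data (numeric column omitted because it is not used)
dummy <- data.frame(
  categorical_1 = c("a", "b", "a", "a", "b", "b", "a", "b", "b", "a"),
  categorical_2 = c(rep("one", 5), rep("two", 5))
)

result <- dummy %>%
  count(categorical_1, categorical_2) %>%       # one row per combination
  group_by(categorical_1) %>%                   # proportions within each category_1 value
  mutate(prop = prop.table(n),
         v = paste0(n, " (", prop, ")")) %>%    # e.g. "3 (0.6)"
  ungroup() %>%                                 # drop grouping for a plain table
  pivot_wider(id_cols = categorical_1,
              names_from = categorical_2,
              values_from = v) %>%
  rename(Category = categorical_1)

print(result)
```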
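This isn't very different from the other answers, but I wanted to point out a few things that may come down to preference:
- count leaves the result ungrouped (for ungrouped input), while summarise drops only the last grouping level. Since you need the first group (categorical_1) again in mutate, you can call group_by, then summarise, then calculate your proportions for a bit more control.
- I find it easier to build this kind of string with glue-based functions than by calling paste with various punctuation or other separators.
- Your desired output has title-case column names without numbers, so I cleaned that up in the final rename_all.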
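The grouping difference in the first point can be checked with dplyr's group_vars(). A small sketch, assuming the question's data (numeric column omitted); .groups = "drop_last" is written out explicitly only to silence summarise's message, since it matches the default behaviour here:

```r
library(dplyr)

dummy <- data.frame(
  categorical_1 = c("a", "b", "a", "a", "b", "b", "a", "b", "b", "a"),
  categorical_2 = c(rep("one", 5), rep("two", 5))
)

# count() on ungrouped data returns an ungrouped result
counted <- dummy %>% count(categorical_1, categorical_2)
group_vars(counted)

# summarise() peels off the last grouping variable (categorical_2),
# so the result stays grouped by categorical_1
summarised <- dummy %>%
  group_by(categorical_1, categorical_2) %>%
  summarise(n = n(), .groups = "drop_last")
group_vars(summarised)

# ...which means n / sum(n) is computed within each categorical_1 value
props <- summarised %>% mutate(prop = n / sum(n))
```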
library(dplyr)
library(tidyr)
library(stringr)
dummy %>%
group_by(categorical_1, categorical_2) %>%
summarise(n = n()) %>%
mutate(prop = n / sum(n),
display = str_glue("{n} ({prop})")) %>%
select(-n, -prop) %>%
pivot_wider(names_from = categorical_2, values_from = display) %>%
rename_all(~str_remove(., "_\\d+") %>% str_to_title())
#> # A tibble: 2 x 3
#> # Groups: Categorical [2]
#> Categorical One Two
#> <fct> <chr> <chr>
#> 1 a 3 (0.6) 2 (0.4)
#> 2 b 2 (0.4) 3 (0.6)
Working from your pipe, we can unite n and prop and then spread, i.e.
library(dplyr)
library(tidyr)
dummy %>%
count(categorical_1, categorical_2) %>%
group_by(categorical_1) %>%
mutate(prop = prop.table(n)) %>%
unite(n_prop, n, prop) %>%
spread(categorical_2, n_prop)
which gives
# A tibble: 2 x 3
# Groups:   categorical_1 [2]
  categorical_1 one   two
  <fct>         <chr> <chr>
1 a             3_0.6 2_0.4
2 b             2_0.4 3_0.6
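To get the parenthesised format instead of the underscore, one option is to pass sep = " (" to unite and then append the closing parenthesis with mutate. A minimal sketch, assuming the question's data; it uses pivot_wider (the newer replacement for spread) and adds an ungroup() that is not part of the original pipe:

```r
library(dplyr)
library(tidyr)

dummy <- data.frame(
  categorical_1 = c("a", "b", "a", "a", "b", "b", "a", "b", "b", "a"),
  categorical_2 = c(rep("one", 5), rep("two", 5))
)

result <- dummy %>%
  count(categorical_1, categorical_2) %>%
  group_by(categorical_1) %>%
  mutate(prop = prop.table(n)) %>%
  unite(n_prop, n, prop, sep = " (") %>%     # e.g. "3 (0.6"
  mutate(n_prop = paste0(n_prop, ")")) %>%   # close the parenthesis
  ungroup() %>%
  pivot_wider(names_from = categorical_2, values_from = n_prop)

print(result)
```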
You can play with the sep argument of unite and use mutate to paste in the closing parenthesis.
If you need it, a strictly data.table solution:
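library(data.table)
# Count each combination, then cast categorical_2 values into columns
wide <- dcast(setDT(dummy)[, .(count = .N), by = .(categorical_1, categorical_2)],
              categorical_1 ~ categorical_2, value.var = "count")
# Proportions are taken within each row (one + two), matching the desired output
wide[, .(categorical_1,
         one = paste0(one, " (", one / (one + two), ")"),
         two = paste0(two, " (", two / (one + two), ")"))]
#> categorical_1 one two
#> 1: a 3 (0.6) 2 (0.4)
#> 2: b 2 (0.4) 3 (0.6)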
Data:
dummy <- data.frame(categorical_1 = c("a", "b", "a", "a", "b", "b", "a", "b", "b", "a"),
categorical_2 = c(rep("one", 5), rep("two", 5)),
numeric = sample(1:10, 10))