Finding the relative frequency of a summarised column in R



I need to get the relative frequency of a summarised column in R. I used dplyr's summarise() to find the total for each group, like this:

data %>%
group_by(x) %>%
summarise(total = sum(dollars))
x                    total 
<chr>                 <dbl>
1 expense 1              3600 
2 expense 2              2150 
3 expense 3              2000 

But now I need to create a new column with the relative frequency of each row, to get the following result:

x                   total     p
<chr>                 <dbl>   <dbl>
1 expense 1              3600   46.45%
2 expense 2              2150   27.74%
3 expense 3              2000   25.81%

I tried this:

data %>%
group_by(x) %>%
summarise(total = sum(dollars), p = scales::percent(total/sum(total)))

and this:

data %>%
group_by(x) %>%
summarise(total = sum(dollars), p = total/sum(total)*100)

But the result is always this:

x                   total     p
<chr>                 <dbl>   <dbl>
1 expense 1              3600    100%
2 expense 2              2150    100%
3 expense 3              2000    100%

The problem seems to be that the summarised total column is affecting the result. Any ideas that could help? Thanks.

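You get 100% because of the grouping: inside summarise(), sum(total) is evaluated per group, where total is a single value, so total/sum(total) is always 1. However, after summarise(), dplyr drops one level of grouping. This means that if you call mutate() afterwards, you get the result you need: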

library(dplyr)
data <- tibble(
x = c("expense 1", "expense 2", "expense 3"),
dollars = c(3600L, 2150L, 2000L)
)

data %>%
group_by(x) %>%
summarise(total = sum(dollars)) %>% 
mutate(p = total/sum(total)*100)

# A tibble: 3 x 3
x         total     p
<chr>     <int> <dbl>
1 expense 1  3600  46.5
2 expense 2  2150  27.7
3 expense 3  2000  25.8
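
To see what "drops one level of grouping" means in practice, here is a small sketch with two grouping variables. The year column, the name data2, and the amounts are made up for illustration; they are not part of the question's data.

```r
library(dplyr)

# Hypothetical data: `year` and these amounts are invented for illustration
data2 <- tibble(
  x       = c("expense 1", "expense 1", "expense 2", "expense 2"),
  year    = c(2019L, 2020L, 2019L, 2020L),
  dollars = c(100, 300, 50, 150)
)

by_year <- data2 %>%
  group_by(x, year) %>%
  summarise(total = sum(dollars))

# summarise() peeled off the last grouping variable (year); x remains
group_vars(by_year)

# mutate() therefore computes the share within each x, not over the whole table
shares <- by_year %>%
  mutate(p = total / sum(total) * 100)
```

With a single grouping variable, as in the question, dropping that one level leaves no grouping at all, which is why the mutate() call sees all three rows at once.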

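You get 100% because the total is being computed within each specific group. You need to ungroup. Assuming you want to divide by the number of entries, just divide by nrow(data); note that this is the row count, not the grand total of dollars: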

data %>%
group_by(x) %>%
summarise(total = sum(dollars), p = total/nrow(data)*100)
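
If the goal is each group's share of the grand total rather than a per-row figure, one possible variant keeps everything in a single summarise() by dividing by sum(data$dollars). Referring to data$dollars reaches the full column, so the denominator is not affected by group_by(). This is only a sketch; it assumes the data frame is still available under the name data, and res is a name introduced here for the result.

```r
library(dplyr)

data <- tibble(
  x = c("expense 1", "expense 2", "expense 3"),
  dollars = c(3600L, 2150L, 2000L)
)

res <- data %>%
  group_by(x) %>%
  # data$dollars is the whole column, so the denominator is the grand total
  summarise(total = sum(dollars), p = total / sum(data$dollars) * 100)
```

The trade-off is that the pipeline now refers to data by name, so it has to be adjusted if the input is filtered or renamed upstream.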

After the first sum, ungroup and use mutate to create p:

iris %>%
group_by(Species) %>%
summarise(total = sum(Sepal.Length)) %>%
ungroup() %>%
mutate(p = total/sum(total)*100)
## A tibble: 3 x 3
#  Species    total     p
#  <fct>      <dbl> <dbl>
#1 setosa      250.  28.6
#2 versicolor  297.  33.9
#3 virginica   329.  37.6
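
As a possible shortcut, dplyr 1.0.0 and later accept a .groups argument in summarise(), so .groups = "drop" can stand in for the separate ungroup() call. The sketch below applies that to the question's example values and formats p with scales::percent(), where accuracy = 0.01 asks for two decimal places. It assumes the scales package is installed, and out is a name introduced here for the result.

```r
library(dplyr)

data <- tibble(
  x = c("expense 1", "expense 2", "expense 3"),
  dollars = c(3600L, 2150L, 2000L)
)

out <- data %>%
  group_by(x) %>%
  # .groups = "drop" returns an ungrouped tibble (dplyr >= 1.0.0)
  summarise(total = sum(dollars), .groups = "drop") %>%
  # p becomes a character column such as "46.45%"
  mutate(p = scales::percent(total / sum(total), accuracy = 0.01))
```

Because p is then a character column, it is probably worth keeping a numeric copy if further arithmetic is needed.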
