R代码以百分比形式统计所选列的出现次数

  • 本文关键字:代码 百分比 统计 r csv
  • 更新时间 :
  • 英文 :

我希望你今天过得愉快。我是R代码的新手我有一个CSV文件,它由10000行组成,不断变化,我将显示包括标题在内的前11行。我需要对某些列中的数据(百分比(进行计数,如位置、组和可用。
Number  Group or Place   Location    Available   Size
1         A              Los Angeles  YES       10
2         A              New York     NO        11 
3         A              NA           YES       12
4         A              Chicago      NO        13 
5         B              Los Angeles  NA        14
6         B              Houston      YES       15
7         B              Houston      NO        16
8         C              Chicago      YES       17
9         C              Chicago      YES       18
10        D              New York     NO        19

我想创建一个输出并显示在excel文件中。基本上,我想选择"组"、"位置"one_answers"可用"等列,并显示每个唯一值的百分比,包括缺失值,以及它出现的时间占列总数的百分比。NA指的是空值。以下是Excel中输出的样子

Column Name      Values        Percentage(occurrence%)
Group or Place        A             40 (40%)
B             30 (30%)
C             20 (20%)
D             10 (10%)
Location     Los Angeles   20 (20%)
Chicago       30 (30%)
NA            10 (10%)
Houston       20 (20%)
New York      20 (20%)
Available    YES           50 (50%)
NO            40 (40%)
NA            10 (10%)

这是我的R代码。我对每个值的数量进行计数,但NA不包括在内。我无法弄清楚如何获得每个唯一值(包括缺失值(占列总数的百分比,并导出到excel文件。

exceldata <- read.csv("group.csv") 
view(exceldata) # view the whole csv
str(exceldata)  #str() function we can see that we have many observations across 5 columns
table(exceldata['Group or Place'])  #show the number of each value and the number of time it occurred under group column 
table(exceldate['Location']) #show the number of each value and the number of time it occurred under Location column
table(exceldata$Group or Place)/length(exceldata$Group or Place) # It shows the value similar to percentage.

这样的东西怎么样:

dat <- tibble::tribble(
~Number,    ~Group,    ~Location,   ~Available,   ~Size,
1 ,       "A",        "Los Angeles",  "YES",      10,
2 ,       "A",        "New York",     "NO",        11, 
3 ,       "A",        NA_character_,           "YES",      12,
4 ,       "A",        "Chicago",      "NO",        13, 
5 ,       "B",        "Los Angeles",  NA_character_,      14,
6 ,       "B",        "Houston",      "YES",       15,
7 ,       "B",        "Houston",      "NO",      16,
8 ,       "C",        "Chicago",      "YES",       17,
9 ,       "C",        "Chicago",      "YES",      18,
10,       "D",        "New York",     "NO",        19)

library(tidyverse)
dat %>% 
pivot_longer(Group:Available, names_to = "var", values_to="cat") %>% 
group_by(var, cat) %>% 
summarise(Size = sum(Size)) %>% 
ungroup %>% 
group_by(var) %>% 
mutate(Percent = Size/sum(Size), 
Percent = sprintf("%.0f (%.0f%%)", Percent*100, Percent*100))
# # A tibble: 12 × 4
# # Groups:   var [3]
#   var       cat          Size Percent 
#   <chr>     <chr>       <dbl> <chr>   
# 1 Available NO             59 41 (41%)
# 2 Available YES            72 50 (50%)
# 3 Available NA             14 10 (10%)
# 4 Group     A              46 32 (32%)
# 5 Group     B              45 31 (31%)
# 6 Group     C              35 24 (24%)
# 7 Group     D              19 13 (13%)
# 8 Location  Chicago        48 33 (33%)
# 9 Location  Houston        31 21 (21%)
# 10 Location  Los Angeles    24 17 (17%)
# 11 Location  New York       30 21 (21%)
# 12 Location  NA             12 8 (8%) 

这假设您要将Size相加以确定百分比。

相关内容

最新更新