我希望你今天过得愉快。我是R代码的新手我有一个CSV文件,它由10000行组成,不断变化,我将显示包括标题在内的前11行。我需要对某些列中的数据(百分比(进行计数,如位置、组和可用。
Number Group or Place Location Available Size
1 A Los Angeles YES 10
2 A New York NO 11
3 A NA YES 12
4 A Chicago NO 13
5 B Los Angeles NA 14
6 B Houston YES 15
7 B Houston NO 16
8 C Chicago YES 17
9 C Chicago YES 18
10 D New York NO 19
我想创建一个输出并显示在excel文件中。基本上,我想选择"组"、"位置"one_answers"可用"等列,并显示每个唯一值的百分比,包括缺失值,以及它出现的时间占列总数的百分比。NA指的是空值。以下是Excel中输出的样子
Column Name Values Percentage(occurrence%)
Group or Place A 40 (40%)
B 30 (30%)
C 20 (20%)
D 10 (10%)
Location Los Angeles 20 (20%)
Chicago 30 (30%)
NA 10 (10%)
Houston 20 (20%)
New York 20 (20%)
Available YES 50 (50%)
NO 40 (40%)
NA 10 (10%)
这是我的R代码。我对每个值的数量进行计数,但NA不包括在内。我无法弄清楚如何获得每个唯一值(包括缺失值(占列总数的百分比,并导出到excel文件。
exceldata <- read.csv("group.csv")
view(exceldata) # view the whole csv
str(exceldata) #str() function we can see that we have many observations across 5 columns
table(exceldata['Group or Place']) #show the number of each value and the number of time it occurred under group column
table(exceldate['Location']) #show the number of each value and the number of time it occurred under Location column
table(exceldata$Group or Place)/length(exceldata$Group or Place) # It shows the value similar to percentage.
这样的东西怎么样:
dat <- tibble::tribble(
~Number, ~Group, ~Location, ~Available, ~Size,
1 , "A", "Los Angeles", "YES", 10,
2 , "A", "New York", "NO", 11,
3 , "A", NA_character_, "YES", 12,
4 , "A", "Chicago", "NO", 13,
5 , "B", "Los Angeles", NA_character_, 14,
6 , "B", "Houston", "YES", 15,
7 , "B", "Houston", "NO", 16,
8 , "C", "Chicago", "YES", 17,
9 , "C", "Chicago", "YES", 18,
10, "D", "New York", "NO", 19)
library(tidyverse)
dat %>%
pivot_longer(Group:Available, names_to = "var", values_to="cat") %>%
group_by(var, cat) %>%
summarise(Size = sum(Size)) %>%
ungroup %>%
group_by(var) %>%
mutate(Percent = Size/sum(Size),
Percent = sprintf("%.0f (%.0f%%)", Percent*100, Percent*100))
# # A tibble: 12 × 4
# # Groups: var [3]
# var cat Size Percent
# <chr> <chr> <dbl> <chr>
# 1 Available NO 59 41 (41%)
# 2 Available YES 72 50 (50%)
# 3 Available NA 14 10 (10%)
# 4 Group A 46 32 (32%)
# 5 Group B 45 31 (31%)
# 6 Group C 35 24 (24%)
# 7 Group D 19 13 (13%)
# 8 Location Chicago 48 33 (33%)
# 9 Location Houston 31 21 (21%)
# 10 Location Los Angeles 24 17 (17%)
# 11 Location New York 30 21 (21%)
# 12 Location NA 12 8 (8%)
这假设您要将Size
相加以确定百分比。