我有一个调查结果的R数据帧。每列都是对调查中一个问题的回答。它可以采用值 1 到 10 和 NA。我想把它变成一个频率表。
这是我拥有的数据的一个例子。我假装值从 1 到 3,而不是 1 到 10。
data.frame(
"Person" = c(1,2,3),
"Question1" = c(NA, "1", "1"),
"Question2" = c("1", "2", "3")
)
我想要什么:
data.frame(
"Question" = c("Question1", "Question2"),
"Frequency of 1" = c(2, 1),
"Frequency of 2" = c(0 , 1),
"Frequency of 3" = c(0, 1)
)
我尝试使用likert软件包中的likert((,但是我得到的分数结果不正确。这个问题有简单的解决方案吗?
这是一个使用 dplyr 和 purrr 包的解决方案
library(dplyr)
library(purrr)
data.frame(
"Person" = c(1,2,3),
"Question1" = c(NA, "1", "1"),
"Question2" = c("1", "2", "3")
)
df %>%
select(-Person) %>%
mutate_all(~ factor(.x, levels = as.character(1:10) ) %>% addNA() ) %>%
map(table) %>%
transpose() %>%
map(as.integer) %>%
set_names( ~ paste0("Frequency of ",ifelse(is.na(.), "NA", .))) %>%
as_tibble() %>%
mutate(Question = setdiff(names(df),"Person")) %>%
select(Question,everything(), "Frequency of NA" = `Frequency of ` )
data.table
解决方案:
require(data.table)
setDT(df)
# Melt data:
df <- melt(df, id.vars = "Person", value.name = "Question")
# Cast data to required structure:
df <- data.frame(dcast(df, variable ~ Question))
# Rename variables and remove NA count (as per Ops question):
names(df)[1] <- "Question"
names(df)[-1] <- gsub("X", "Frequency of ", names(df)[-1])
df$NA. <- NULL
df
# Question Frequency of 1 Frequency of 2 Frequency of 3
#1 Question1 2 0 0
#2 Question2 1 1 1
或者一行回答:
dcast(melt(setDT(df), id.vars="Person", value.name="Question")[!Question %in% NA][, Question := paste0("Frequency of ", Question)], variable ~ Question)
tidyverse
可能性可能是:
df %>%
gather(Question, val, -Person, na.rm = TRUE) %>%
group_by(Question, val) %>%
summarise(res = length(val)) %>%
ungroup() %>%
mutate(val = paste0("Frequency.of.", val)) %>%
spread(val, res, fill = NA)
Question Frequency.of.1 Frequency.of.2 Frequency.of.3
<chr> <int> <int> <int>
1 Question1 2 NA NA
2 Question2 1 1 1
在这里,它首先将数据从宽格式转换为长格式。其次,它根据问题计算频率。最后,它创建"Frequency.of."变量并将数据返回到所需的形状。
或者,如果您还想计算每个问题的 NA 值:
df %>%
gather(Question, val, -Person) %>%
group_by(Question, val) %>%
summarise(res = length(val)) %>%
ungroup() %>%
mutate(val = paste0("Frequency.of.", val)) %>%
spread(val, res, fill = NA)
Question Frequency.of.1 Frequency.of.2 Frequency.of.3 Frequency.of.NA
<chr> <int> <int> <int> <int>
1 Question1 2 NA NA 1
2 Question2 1 1 1 NA
这不是最优雅的,但可能会有所帮助:df2 是你的数据集。数据:
df2<-data.frame(
"Person" = c(1,2,3),
"Question1" = c(NA, "1", "1"),
"Question2" = c("1", "2", "3"),stringsAsFactors = F
)
目标:编辑::您可以按如下方式"自动化"
df2[is.na(df2)]<-0 #To allow numeric manipulation
values<-c("1","2","3")
Final_df<-sapply(values,function(val) apply(df2[,-1],2,function(x) sum(x==val)))
Final_df<-as.data.frame(Final_df)
names(Final_df)<-paste0("Frequency of_",1:ncol(Final_df))
这会产生:
Frequency of_1 Frequency of_2 Frequency of_3
Question1 2 0 0
Question2 1 1 1