R - Replacing values in a computer-based test results dataset using an answer key



My dataset comes from a computer-based test; a sample is given below.

x<-data.frame(rbind(c("A","C","A","B","A"),
c("M","M","M","M","M"),
c("M","M","M","M","M"),
c("C","C","A","C","A"),
c("C","C","B","C","A"),
c("A","C","A","C","B")))
colnames(x)<-c("q1","q2","q3","q4","q5")
rownames(x)<-c("key","c1","c2","c3","c4","c5")
    q1 q2 q3 q4 q5
key  A  C  A  B  A
c1   M  M  M  M  M
c2   M  M  M  M  M
c3   C  C  A  C  A
c4   C  C  B  C  A
c5   A  C  A  C  B

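Columns represent questions and rows represent candidates. The first row is the answer key. M stands for not answered. I need to replace the values so that each M becomes NA, each correct answer becomes 1, and each incorrect answer becomes 0. For example, the correct answer for q1 is "A", so the value "C" for candidate 3 has to be replaced with 0 because the answer is wrong.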

The final dataset should look like this:

      q1   q2   q3   q4   q5
key    A    C    A    B    A
c1  <NA> <NA> <NA> <NA> <NA>
c2  <NA> <NA> <NA> <NA> <NA>
c3     0    1    1    0    1
c4     0    1    0    0    1
c5     1    1    1    0    0

Replacing the Ms is fairly simple.

x[x=="M"]<-NA 

But I am finding it hard to replace the other values in one step.

x<-as.matrix(x) 

I converted the data to a matrix because the data frame raised the error "Error in Ops.factor(left, right) : level sets of factors are different".

for(i in 2:nrow(x)){
for( j in 1:ncol(x))
{
ifelse(x[i][j]==x[1][j],x[i][j]<-1,x[i][j]<-0)
}}

This for loop only replaces the values in the first column.

    q1  q2  q3  q4  q5
key "A" "C" "A" "B" "A"
c1  NA  NA  NA  NA  NA 
c2  NA  NA  NA  NA  NA 
c3  "0" "C" "A" "C" "A"
c4  "0" "C" "B" "C" "A"
c5  "1" "C" "A" "C" "B"

How can I replace the values in the whole dataset?
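A likely reason the loop touches only the first column is the indexing: on a matrix, `x[i]` is linear (column-major) indexing, so `x[i][j]` takes the i-th element of the first column and then asks for element j of that length-one result. The sketch below rebuilds the sample as a character matrix and runs the same loop with two-dimensional `x[i, j]` indexing. The name `scored` is hypothetical, introduced only for illustration.

```r
# Rebuild the sample data as a character matrix (same values as in the question)
x <- rbind(c("A","C","A","B","A"),
           c("M","M","M","M","M"),
           c("M","M","M","M","M"),
           c("C","C","A","C","A"),
           c("C","C","B","C","A"),
           c("A","C","A","C","B"))
dimnames(x) <- list(c("key","c1","c2","c3","c4","c5"),
                    c("q1","q2","q3","q4","q5"))
x[x == "M"] <- NA

# x[4] is row 4 of column 1; x[4][2] is the second element
# of a length-one vector, which does not exist
x[4]      # "C"
x[4][2]   # NA

# Same loop with two-dimensional indexing; `scored` is a hypothetical name
scored <- x
for (i in 2:nrow(x)) {
  for (j in seq_len(ncol(x))) {
    if (!is.na(x[i, j])) {   # unanswered cells stay NA
      scored[i, j] <- if (x[i, j] == x[1, j]) "1" else "0"
    }
  }
}
scored
```

Because the key row is still part of the matrix, the result stays a character matrix, so the scores are the strings "1" and "0" rather than numbers.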

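The key should not be included in the data structure as an observation (row). Conceptually, it does not belong there. You should also use a matrix instead of a data.frame.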

x <- as.matrix(x)
key <- x[1,]
x <- x[-1,]
x[x == "M"] <- NA
#matrices are filled by column, 
#thus we need to transpose
#unary plus turns the logical matrix into an integer matrix
y <- +(t(t(x) == key))
#   q1 q2 q3 q4 q5
#c1 NA NA NA NA NA
#c2 NA NA NA NA NA
#c3  0  1  1  0  1
#c4  0  1  0  0  1
#c5  1  1  1  0  0

Note that I corrected the typo in the data.
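As a possible alternative to the double transpose, base R's `sweep()` can apply the key along the columns directly. This is only a sketch on the same sample data (with uppercase answers); the name `answers` is an assumption made for the example.

```r
# Sample data as a character matrix, with the key as a named row
x <- rbind(key = c("A","C","A","B","A"),
           c1  = c("M","M","M","M","M"),
           c2  = c("M","M","M","M","M"),
           c3  = c("C","C","A","C","A"),
           c4  = c("C","C","B","C","A"),
           c5  = c("A","C","A","C","B"))
colnames(x) <- paste0("q", 1:5)

key <- x["key", ]          # answer key as a character vector
answers <- x[-1, ]         # candidate responses only (hypothetical name)
answers[answers == "M"] <- NA

# MARGIN = 2 lines the key up with the columns, so each column is
# compared with its own key entry; unary plus converts logical to integer
y <- +sweep(answers, 2, key, FUN = "==")
y
```

`sweep()` expands the key to the shape of the matrix before comparing, which avoids having to reason about column-wise recycling.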

Mutate all columns with dplyr:

library(dplyr)
# after the NA imputation step
x %>%
  mutate_all(funs(ifelse(row_number(.) == 1,
                         as.character(.), # leave first row unchanged
                         as.numeric(toupper(.) == first(.))))) # compare subsequent rows with first
    q1   q2   q3   q4   q5
1    A    C    A    B    A
2 <NA> <NA> <NA> <NA> <NA>
3 <NA> <NA> <NA> <NA> <NA>
4    0    1    1    0    1
5    0    1    0    0    1
6    1    1    1    0    0

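(Note: the sample data includes both uppercase and lowercase answers, so I assume the computer accepts both inputs. If that is not the case and all answers are uppercase, the toupper() part can be skipped.)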
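In recent dplyr releases `funs()` is deprecated and `mutate_all()` is superseded by `across()`, so the same idea might be written as below. This is a sketch assuming dplyr 1.0 or later and character (not factor) columns. It also calls `row_number()` without an argument: given a vector, `row_number(x)` ranks the values of `x`, which matches the row position only when the key happens to sort first. The name `res` is hypothetical.

```r
library(dplyr)

# Sample data as a data frame of character columns
x <- data.frame(rbind(c("A","C","A","B","A"),
                      c("M","M","M","M","M"),
                      c("M","M","M","M","M"),
                      c("C","C","A","C","A"),
                      c("c","c","B","C","A"),
                      c("A","C","A","C","B")),
                stringsAsFactors = FALSE)
colnames(x) <- paste0("q", 1:5)
x[x == "M"] <- NA

res <- x %>%
  mutate(across(everything(),
                ~ ifelse(row_number() == 1,
                         .x,  # leave the key row unchanged
                         # score the other rows against the first value
                         as.character(as.integer(toupper(.x) == first(.x))))))
res
```

Each column mixes the key letter with the scores, so the columns remain character vectors, as in the output shown above.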

With the ifelse function, you can do the following:

#When working with character data, take note of this option stringsAsFactors=FALSE
# Candidate c4 data has lower key C, corrected it below

x = data.frame(rbind(c("A","C","A","B","A"),
c("M","M","M","M","M"),
c("M","M","M","M","M"),
c("C","C","A","C","A"),
c("c","c","B","C","A"),
c("A","C","A","C","B")),stringsAsFactors=FALSE)

#all upper case                 
x = sapply(x,toupper)   
colnames(x) = c("q1","q2","q3","q4","q5")
rownames(x) = c("key","c1","c2","c3","c4","c5")
#replace M's
x[x == "M"] = NA

#Match each row with key vector x[1,], repeated 5 times to match the number of candidate rows in the original dataset

x[-1,] = ifelse(x[-1,] == matrix(rep(as.matrix(x[1,]),5),nrow=5,byrow=TRUE),1,0)
x
#    q1  q2  q3  q4  q5 
#key "A" "C" "A" "B" "A"
#c1  NA  NA  NA  NA  NA 
#c2  NA  NA  NA  NA  NA 
#c3  "0" "1" "1" "0" "1"
#c4  "0" "1" "0" "0" "1"
#c5  "1" "1" "1" "0" "0"                
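The snippet above hard-codes 5 as the number of candidate rows. A variant that derives the size from the data could look like the sketch below. The names `n_cand` and `key_mat` are assumptions introduced for the example.

```r
# Same sample data, including the lowercase answers for candidate c4
x <- rbind(key = c("A","C","A","B","A"),
           c1  = c("M","M","M","M","M"),
           c2  = c("M","M","M","M","M"),
           c3  = c("C","C","A","C","A"),
           c4  = c("c","c","B","C","A"),
           c5  = c("A","C","A","C","B"))
colnames(x) <- paste0("q", 1:5)

x[] <- toupper(x)      # uppercase everything, keeping the matrix shape
x[x == "M"] <- NA      # mark unanswered questions

n_cand <- nrow(x) - 1  # number of candidate rows (hypothetical name)
# Repeat the key row-wise so it has the same shape as the candidate block
key_mat <- matrix(x["key", ], nrow = n_cand, ncol = ncol(x), byrow = TRUE)

x[-1, ] <- ifelse(x[-1, ] == key_mat, 1, 0)
x
```

As in the original snippet, the key row keeps the matrix in character mode, so the scores are stored as "1" and "0".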
