I have a large data frame of 2050 rows by 202 columns. I read the data from SPSS with read.spss(). All of the variables are factors.
data<-read.spss("filename.sav",to.data.frame=TRUE,reencode='utf-8')
class(data)
[1] "data.frame"
dim(data)
[1] 2050 202
class(data[1,57])
[1] "factor"
class(data$aq21a) # data$aq21a is 57th column
[1] "NULL"
Now I want to stack columns 57 to 61 (data$aq21a, data$aq21b, data$aq21c, data$aq21d, data$aq21e) into a single new variable aq21, as follows:
aq21<-rbind(data$aq21a,data$aq21b,data$aq21c,data$aq21d,data$aq21e)
But this does not give the desired result. I want a 10250-by-1 vector.
class(aq21)
[1] "matrix"
dim(aq21)
[1] 5 2050
The sample data is:
head(data[,57:60])
               bq21a               bq21b               bq21c               bq21d
1 Rich / Independent           Efficient                <NA>                <NA>
2   Known / Familiar           Efficient                <NA>                <NA>
3  Relative / Friend Educated / Academic         Accountable                <NA>
4       Truthfulness           Behaviour          Good/Great Educated / Academic
5          Behaviour   Relative / Friend Educated / Academic    Known / Familiar
6          Behaviour   Relative / Friend                <NA>                <NA>
I want a result like this:
bq21a
1 Rich / Independent
2 Known / Familiar
3 Relative / Friend
4 Truthfulness
5 Behaviour
6 Behaviour
7 Efficient
8 Efficient
9 Educated / Academic
10 Behaviour
11 Relative / Friend
... and so on
Here is a sample of the result I actually get, which is not what I want:
aq21[1:5,1:10]
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]   11   24   19   35   22   22    3   22   NA     2
[2,]    6    6   18   22   19   19   31   31   NA     5
[3,]   NA   NA    9    2   18   NA   26   NA   NA    31
[4,]   NA   NA   NA   18   24   NA   NA   NA   NA    NA
[5,]   NA   NA   NA   23   NA   NA   NA   NA   NA    NA
What is going wrong? How can I get the correct result?
Assuming the relevant columns are all factors, for example:
data <- structure(list(bq21a = structure(c(4L, 2L, 3L, 5L, 1L, 1L), .Label = c("Behaviour",
"Known / Familiar", "Relative / Friend", "Rich / Independent",
"Truthfulness"), class = "factor"), bq21b = structure(c(3L, 3L,
2L, 1L, 4L, 4L), .Label = c("Behaviour", "Educated / Academic",
"Efficient", "Relative / Friend"), class = "factor"), bq21c = structure(c(NA,
NA, 1L, 3L, 2L, NA), .Label = c("Accountable", "Educated / Academic",
"Good/Great"), class = "factor"), bq21d = structure(c(NA, NA,
NA, 1L, 2L, NA), .Label = c("Educated / Academic", "Known / Familiar"
), class = "factor")), .Names = c("bq21a", "bq21b", "bq21c",
"bq21d"), class = "data.frame", row.names = c("1", "2", "3",
"4", "5", "6"))
as.numeric(data[,1])
#[1] 4 2 3 5 1 1
as.numeric(data[,2])
#[1] 3 3 2 1 4 4
as.numeric(data[,3])
#[1] NA NA 1 3 2 NA
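When you call rbind on these columns, each factor is reduced to its underlying integer codes, so the result is a numeric matrix matching the as.numeric output shown above: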
rbind(data$bq21a, data$bq21b, data$bq21c, data$bq21d)
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 4 2 3 5 1 1
#[2,] 3 3 2 1 4 4
#[3,] NA NA 1 3 2 NA
#[4,] NA NA NA 1 2 NA
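A minimal sketch of why the labels disappear, using two small hypothetical factors (f1 and f2 are made up for illustration and are not from the question's data). rbind builds a matrix, and a matrix cannot carry each factor's levels, so only the integer codes survive. Converting to character first keeps the labels:

```r
# Hypothetical toy factors, for illustration only
f1 <- factor(c("Rich", "Known"))
f2 <- factor(c("Efficient", NA))

# rbind() drops the factor class and keeps only the integer codes
codes <- rbind(f1, f2)
is.numeric(codes)

# Converting each factor to character first preserves the labels
labels <- rbind(as.character(f1), as.character(f2))
is.character(labels)
```

Note that the same code can mean different labels in different rows, because each factor has its own level set.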
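When reading data with read.csv or read.table, you can pass stringsAsFactors=FALSE so that the columns are character rather than factor. If the columns are character, then: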
dat1 <- data.frame(bq21a=c(rbind(data$bq21a, data$bq21b, data$bq21c, data$bq21d)))
head(dat1)
# bq21a
#1 Rich / Independent
#2 Efficient
#3 <NA>
#4 <NA>
#5 Known / Familiar
#6 Efficient
Update

Or try:
dat2 <- data.frame(bq21a=c(t(data))) #wouldn't matter if the columns are `factors`
#in your dataset the code would be
#dat2 <- data.frame(bq21a= c(t(data[,57:61])))
identical(dat1, dat2)
#[1] TRUE
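It may be worth noting that both dat1 and dat2 interleave the values row by row, whereas the desired output in the question lists all of the first column, then all of the second, and so on. A small sketch of the difference, assuming a hypothetical two-column data frame df (not part of the original data):

```r
# Hypothetical two-column data frame, for illustration only
df <- data.frame(a = c("a1", "a2"), b = c("b1", "b2"),
                 stringsAsFactors = TRUE)

# c(t(df)) walks the data row by row
by_row <- c(t(df))

# c(as.matrix(df)) walks the data column by column
by_col <- c(as.matrix(df))

by_row
by_col
```

Both calls go through as.matrix, which turns factor columns into their character labels, so neither returns integer codes.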
Following the suggestions from Ananda and akrun, I used the following code:
aq21<-data.frame(Col=unlist(data[,57:61]))
This solved the problem. But I still don't understand why rbind does not work. Does rbind have to be combined with unlist to work properly?
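As a hedged illustration of the unlist approach on its own (no rbind involved), the sketch below uses a hypothetical two-column factor data frame df. When every element is a factor, unlist returns a single factor whose levels are the union of the input level sets, stacked column by column:

```r
# Hypothetical two-column factor data frame, for illustration only
df <- data.frame(a = factor(c("x", "y")), b = factor(c("z", "x")))

# unlist() on factor columns returns one factor with merged levels;
# use.names = FALSE drops the generated names such as "a1", "b2"
stacked <- unlist(df, use.names = FALSE)

is.factor(stacked)
levels(stacked)
as.character(stacked)   # all of column a, then all of column b
length(stacked) == nrow(df) * ncol(df)
```

By the same arithmetic, five columns of 2050 rows would give the 10250 values asked for in the question.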