I scraped a large table from a web page with the rvest package, but it was read in as a single vector:
foo<-c("A","B","C","Dog","1","2","3","Cat","4","5","6","Goat","7","8","9")
I need it as a data frame that looks like this:
bar <- data.frame(Animal = c("Dog","Cat","Goat"), A = c(1,4,7), B = c(2,5,8), C = c(3,6,9))
This is probably a simple problem, but I would appreciate any help.
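You can build a matrix from the vector and convert it to a data frame. Prepend the missing "Animal" header first so that the vector fills a 4-column matrix exactly: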
foo<-c("A","B","C","Dog","1","2","3","Cat","4","5","6","Goat","7","8","9")
foo <- c("Animal", foo)                   # prepend the missing header for the first column
m <- matrix(foo, ncol = 4, byrow = TRUE)  # fill row by row; row 1 holds the headers
df <- as.data.frame(m[-1, ], stringsAsFactors = FALSE)
colnames(df) <- m[1, ]
# I assume you want numerics for your A,B,C columns:
df[,2:4]<-apply(df[,2:4],2,as.numeric)
lapply(df,class)
$Animal
[1] "character"
$A
[1] "numeric"
$B
[1] "numeric"
$C
[1] "numeric"
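If the table is large or its width varies, the matrix approach can be wrapped in a small function. The sketch below is not from the answer above: the name vector_to_df and its arguments n_col and first_header are hypothetical, and it assumes the scraped vector is missing exactly one header cell at the start. It uses type.convert(as.is = TRUE) instead of as.numeric, so the numeric columns come back as integer here and text columns stay character.

```r
# Hypothetical helper: reshape a flat scraped vector into a data frame.
# Assumes the first header cell (e.g. "Animal") is missing from `x`.
vector_to_df <- function(x, n_col, first_header = "Animal") {
  x <- c(first_header, x)                      # restore the missing header cell
  stopifnot(length(x) %% n_col == 0)           # every row must be complete
  m <- matrix(x, ncol = n_col, byrow = TRUE)   # one row per record, row 1 = headers
  df <- as.data.frame(m[-1, , drop = FALSE], stringsAsFactors = FALSE)
  colnames(df) <- m[1, ]
  # type.convert() turns "1", "4", "7" into integers and leaves text as character
  df[] <- lapply(df, type.convert, as.is = TRUE)
  rownames(df) <- NULL
  df
}

foo <- c("A","B","C","Dog","1","2","3","Cat","4","5","6","Goat","7","8","9")
df <- vector_to_df(foo, n_col = 4)
str(df)
```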
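Just split the vector into chunks of the row length and rbind them. I added "Animal" to the beginning of foo so that every row has the same number of elements after splitting: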
foo = c("Animal", foo)
df = data.frame(do.call(rbind, split(foo, ceiling(seq_along(foo)/4))),
stringsAsFactors = FALSE)
colnames(df) = df[1,]
df = df[-1,]
df
# Animal A B C
#2 Dog 1 2 3
#3 Cat 4 5 6
#4 Goat 7 8 9
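The split/rbind version above leaves every column as character and keeps the row names 2 to 4. A possible variant, sketched here under the same assumption that "Animal" is prepended, takes the header from the first chunk before binding and converts the column types afterwards; the variable name rows is only illustrative.

```r
foo <- c("A","B","C","Dog","1","2","3","Cat","4","5","6","Goat","7","8","9")
foo <- c("Animal", foo)

# List of consecutive length-4 chunks; chunk 1 is the header row
rows <- split(foo, ceiling(seq_along(foo) / 4))

# unname() drops the chunk names so the row names restart at 1
df <- data.frame(do.call(rbind, unname(rows[-1])), stringsAsFactors = FALSE)
colnames(df) <- rows[[1]]

# Convert "1", "4", "7" etc. to integers; as.is = TRUE keeps Animal as character
df[] <- lapply(df, type.convert, as.is = TRUE)
df
```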
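If you want proper column types, you can try the following, starting from the original foo (without "Animal" prepended): split the data cells into a list by column, name the list, convert each column's type, and then coerce the result to a data frame.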
l <- setNames(split(tail(foo, -3), rep(1:4, 3)), c("Animal", foo[1:3]))
as.data.frame(lapply(l, type.convert, as.is = TRUE), stringsAsFactors = FALSE) ## as.is = TRUE keeps Animal as character instead of a factor
# Animal A B C
# 1 Dog 1 2 3
# 2 Cat 4 5 6
# 3 Goat 7 8 9
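The split call above hard-codes both the number of header cells and the number of rows through rep(1:4, 3). A sketch that only assumes the number of header cells (the hypothetical n_header) and lets rep_len() derive the row count from the vector length:

```r
foo <- c("A","B","C","Dog","1","2","3","Cat","4","5","6","Goat","7","8","9")

n_header <- 3                     # assumption: number of header cells that were scraped
n_col <- n_header + 1             # plus the unlabeled "Animal" column
body <- tail(foo, -n_header)      # the data cells, row by row
stopifnot(length(body) %% n_col == 0)

# Column index of each cell: 1, 2, ..., n_col, 1, 2, ..., n_col, ...
col_id <- rep_len(seq_len(n_col), length(body))
l <- setNames(split(body, col_id), c("Animal", foo[seq_len(n_header)]))

# type.convert() turns numeric strings into integers; as.is = TRUE keeps text as character
df <- as.data.frame(lapply(l, type.convert, as.is = TRUE), stringsAsFactors = FALSE)
df
```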
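Here is a handy helper for working with lists; it splits a vector into consecutive chunks of length by:
seqList <-
  function(character, by = 1, res = list()) {
    ### Recursively split `character` into consecutive chunks of length `by`;
    ### the last chunk is padded with NA if the length is not a multiple of `by`.
    if (length(character) == 0) {
      res
    } else {
      seqList(character[-c(1:by)], by = by, res = c(res, list(character[1:by])))
    }
  }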
Converting the character vector into a list of rows makes it easier to manipulate, for example:
options(stringsAsFactors=FALSE)
foo <-c("A","B","C","Dog","1","2","3","Cat","4","5","6","Goat","7","8","9")
foo <- c("Animal",foo)
df <- data.frame(t(do.call("rbind",
lapply(1:4,function(x) do.call("cbind",lapply(seqList(foo,4),"[[",x))))))
colnames(df) <- df[1,]
df <- df[-1,]
## > df
## Animal A B C
## 2 Dog 1 2 3
## 3 Cat 4 5 6
## 4 Goat 7 8 9
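Note: I have not tested the efficiency of this function. Because seqList recurses once per chunk and grows its result with c() at every step, it may be slow for long vectors and could hit R's recursion limit. Using a matrix is probably the better tool for this job.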
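A non-recursive way to get the same chunks is split() with ceiling(seq_along(x) / by), as in the split answer above. The helper name chunk is hypothetical; unlike seqList it returns a shorter last chunk instead of padding it with NA, and do.call(rbind, ...) builds the row matrix directly instead of the lapply/cbind/t combination.

```r
# Hypothetical non-recursive counterpart to seqList():
# split x into consecutive chunks of length `by` (last chunk may be shorter).
chunk <- function(x, by = 1) {
  unname(split(x, ceiling(seq_along(x) / by)))
}

foo <- c("Animal","A","B","C","Dog","1","2","3","Cat","4","5","6","Goat","7","8","9")

m <- do.call(rbind, chunk(foo, 4))   # one matrix row per chunk; row 1 = headers
df <- as.data.frame(m[-1, , drop = FALSE], stringsAsFactors = FALSE)
colnames(df) <- m[1, ]
df
```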