R: a table scraped from a web page is read as a single character vector; how do I convert it to a data frame?



I used the rvest package to scrape a large table from a web page, but it was read in as a single vector:

foo<-c("A","B","C","Dog","1","2","3","Cat","4","5","6","Goat","7","8","9")

I need it as a data frame that looks like this:

bar<-as.data.frame(cbind(Animal=c("Dog","Cat","Goat"),A=c(1,4,7),B=c(2,5,8),C=c(3,6,9)))

This is probably a simple problem, but I would appreciate your help.

You can create a matrix from the vector and convert it to a data frame:

foo<-c("A","B","C","Dog","1","2","3","Cat","4","5","6","Goat","7","8","9")
foo <- c("Animal" , foo)
m <- matrix(foo , ncol = 4  , byrow = TRUE)
df <- as.data.frame(m[-1,] , stringsAsFactors = FALSE)  
colnames(df) <- m[1,]
# I assume you want numerics for your A,B,C columns:
df[,2:4]<-apply(df[,2:4],2,as.numeric)
lapply(df,class)
$Animal
[1] "character"
$A
[1] "numeric"
$B
[1] "numeric"
$C
[1] "numeric"
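
The matrix approach above could be wrapped in a small reusable function. This is only a sketch: the names `vector_to_df`, `n_col`, and `first_col` are hypothetical, and it assumes the first column is text and all remaining columns are numeric, as in the example data. It uses `lapply` over the columns rather than `apply`, which avoids a round trip through a matrix.

```r
# Hypothetical helper (not from the original answer): reshape a flat scraped
# vector into a data frame. Assumes the header row is missing its first label.
vector_to_df <- function(x, n_col, first_col = "Animal") {
  x <- c(first_col, x)                       # pad the header row to n_col entries
  stopifnot(length(x) %% n_col == 0)         # guard against ragged input
  m <- matrix(x, ncol = n_col, byrow = TRUE) # one table row per matrix row
  df <- as.data.frame(m[-1, , drop = FALSE], stringsAsFactors = FALSE)
  colnames(df) <- m[1, ]                     # first matrix row holds the headers
  df[-1] <- lapply(df[-1], as.numeric)       # assumption: all but column 1 are numeric
  df
}

foo <- c("A","B","C","Dog","1","2","3","Cat","4","5","6","Goat","7","8","9")
bar <- vector_to_df(foo, n_col = 4)
str(bar)
```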

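Just split the vector into rows of four elements and rbind them. I added "Animal" to the start of foo so that every row has the same number of elements when splitting.
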
foo = c("Animal", foo)
df = data.frame(do.call(rbind, split(foo, ceiling(seq_along(foo)/4))),
                                                      stringsAsFactors = FALSE)
colnames(df) = df[1,]
df = df[-1,]
df
#  Animal A B C
#2    Dog 1 2 3
#3    Cat 4 5 6
#4   Goat 7 8 9
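
To see why the split produces one group per table row, it may help to look at the grouping index on its own. A minimal sketch, assuming the same padded vector as above; `grp` and `rows` are names introduced here for illustration:

```r
foo <- c("Animal", "A","B","C","Dog","1","2","3","Cat","4","5","6","Goat","7","8","9")

# Every run of 4 consecutive positions gets the same group number.
grp <- ceiling(seq_along(foo) / 4)
print(grp)

# split() then returns one character vector per group, i.e. per table row.
rows <- split(foo, grp)
print(rows[["2"]])
```

Each element of `rows` is one row of the eventual table, which is why `do.call(rbind, ...)` stacks them into the right shape.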

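If you want appropriate column types, you can try this. Split into a list, name the list, then convert the column types while coercing to a data frame. Note that this starts from the original foo, without "Animal" prepended.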

l <- setNames(split(tail(foo, -3), rep(1:4, 3)), c("Animal", foo[1:3]))
as.data.frame(lapply(l, type.convert))  ## stringsAsFactors=FALSE if desired
#    Animal A B C
# 1     Dog 1 2 3
# 2     Cat 4 5 6
# 3    Goat 7 8 9
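
On recent R versions, calling `type.convert` on a character vector without `as.is` may emit a warning, so it can be worth passing it explicitly. The variant below is a sketch under that assumption; with `as.is = TRUE` the text column stays character rather than becoming a factor, and the digit columns come back as integer.

```r
foo <- c("A","B","C","Dog","1","2","3","Cat","4","5","6","Goat","7","8","9")

# Drop the 3 header entries, deal the rest into 4 columns, and name them.
l <- setNames(split(tail(foo, -3), rep(1:4, 3)), c("Animal", foo[1:3]))

# as.is = TRUE: keep strings as character; digit strings are converted to integer.
df <- as.data.frame(lapply(l, type.convert, as.is = TRUE),
                    stringsAsFactors = FALSE)
sapply(df, class)
```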

Here is a tool that is handy for working with lists:

seqList <- function(character, by = 1, res = list()) {
    ### recursively split `character` into consecutive chunks of length `by`
    if (length(character) == 0) {
        res
    } else {
        seqList(character[-c(1:by)], by = by, res = c(res, list(character[1:by])))
    }
}

Converting the character vector to a list, for example, can make it easier to manipulate.

options(stringsAsFactors=FALSE)
foo <-c("A","B","C","Dog","1","2","3","Cat","4","5","6","Goat","7","8","9")
foo <- c("Animal",foo)
df <- data.frame(t(do.call("rbind",
    lapply(1:4,function(x) do.call("cbind",lapply(seqList(foo,4),"[[",x))))))
colnames(df) <- df[1,]
df <- df[-1,]
## > df
##   Animal A B C
## 2    Dog 1 2 3
## 3    Cat 4 5 6
## 4   Goat 7 8 9

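Note: I have not tested the efficiency of this function. For a large number of characters it may not be very efficient. A matrix is probably a better tool for this job.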
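
As a rough illustration of the matrix alternative mentioned in the note, the chunking could be done without recursion. This is a sketch, not a benchmarked replacement: `chunk_matrix` is a hypothetical name, and it assumes the input length is an exact multiple of `by`.

```r
# Hypothetical non-recursive counterpart to seqList(): fill a matrix
# column by column, so each column holds one chunk of length `by`.
chunk_matrix <- function(x, by = 1) {
  stopifnot(length(x) > 0, length(x) %% by == 0)  # assumption: no ragged tail
  m <- matrix(x, nrow = by)
  lapply(seq_len(ncol(m)), function(j) m[, j])
}

chunks <- chunk_matrix(c("Animal", "A", "B", "C", "Dog", "1", "2", "3"), by = 4)
str(chunks)
```

Because nothing is grown element by element and no call stack builds up, this form should be less sensitive to input size than the recursive version.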
