Converting an S4 object to a data frame in R



I have an S4 object named "res" that I obtained while using an R package called RDAVIDWebService. I can't seem to find a way to convert this object to a data frame in R.

I tried the function "as.data.frame(res)", but it throws the following error:

> as.data.frame(res)
Error in as.data.frame.default(res) : 
cannot coerce class ‘structure("DAVIDFunctionalAnnotationTable", package = "RDAVIDWebService")’ to a data.frame

The structure of the object is as follows:

> str(res)
Formal class 'DAVIDFunctionalAnnotationTable' [package "RDAVIDWebService"] with 4 slots
..@ Genes     :'data.frame':    3011 obs. of  3 variables:
Formal class 'DAVIDGenes' [package "RDAVIDWebService"] with 5 slots
.. .. ..@ .Data    :List of 3
.. .. .. ..$ : chr [1:3011] "22574630" "3544383" "3544385" "3544382" ...
.. .. .. ..$ : chr [1:3011] "1,2-Dihydroxy-3-keto-5-methylthiopentene dioxygenase, 
putative(LPMP_204190)" "10 kDa heat shock protein(Tc00.1047053508209.100)" "10 kDa heat shock 
protein(Tc00.1047053508209.120)" "10 kDa heat shock protein(Tc00.1047053508209.90)" ...
.. .. .. ..$ : Factor w/ 11 levels "Leishmania braziliensis MHOM/BR/75/M2904",..: 6 10 10 10 
10 10 10 2 6 6 ...
.. .. ..@ names    : chr [1:3] "ID" "Name" "Species"
.. .. ..@ row.names: chr [1:3011] "1" "2" "3" "4" ...
.. .. ..@ .S3Class : chr "data.frame"
.. .. ..@ type     : chr "Gene List Report"
..@ Dictionary:List of 10
.. ..$ COG_ONTOLOGY    :'data.frame':   18 obs. of  2 variables:
.. .. ..$ ID  : chr [1:18] "Translation, ribosomal structure and biogenesis" "Lipid 
metabolism" "Cell division and chromosome partitioning" "General function prediction only" ...
.. .. ..$ Term: chr [1:18] "" "" "" "" ...
.. ..$ GOTERM_BP_DIRECT:'data.frame':   215 obs. of  2 variables:
.. .. ..$ ID  : chr [1:215] "GO:0006457" "GO:0051603" "GO:0008152" "GO:0006412" ...
.. .. ..$ Term: chr [1:215] "protein folding" "proteolysis involved in cellular protein 
catabolic process" "metabolic process" "translation" ...
.. ..$ GOTERM_CC_DIRECT:'data.frame':   84 obs. of  2 variables:
.. .. ..$ ID  : chr [1:84] "GO:0005737" "GO:0016021" "GO:0005634" "GO:0005839" ...
.. .. ..$ Term: chr [1:84] "cytoplasm" "integral component of membrane" "nucleus" "proteasome 
core complex" ...
.. ..$ GOTERM_MF_DIRECT:'data.frame':   222 obs. of  2 variables:
.. .. ..$ ID  : chr [1:222] "GO:0010309" "GO:0018580" "GO:0051213" "GO:0004298" ...
.. .. ..$ Term: chr [1:222] "acireductone dioxygenase [iron(II)-requiring] activity" 
"nitronate monooxygenase activity" "dioxygenase activity" "threonine-type endopeptidase 
activity" ...
.. ..$ INTERPRO        :'data.frame':   695 obs. of  2 variables:
.. .. ..$ ID  : chr [1:695] "IPR004313" "IPR011051" "IPR014710" "IPR011032" ...
.. .. ..$ Term: chr [1:695] "Acireductone dioxygenase ARD family" "RmlC-like cupin domain" 
"RmlC-like jelly roll fold" "GroES-like" ...
.. ..$ KEGG_PATHWAY    :'data.frame':   363 obs. of  2 variables:
.. .. ..$ ID  : chr [1:363] "ldo00071" "ldo00280" "ldo01100" "lmi00280" ...
.. .. ..$ Term: chr [1:363] "Fatty acid degradation" "Valine, leucine and isoleucine 
degradation" "Metabolic pathways" "Valine, leucine and isoleucine degradation" ...
.. ..$ PIR_SUPERFAMILY :'data.frame':   44 obs. of  2 variables:
.. .. ..$ ID  : chr [1:44] "PIRSF000868" "PIRSF002144" "PIRSF002134" "PIRSF002122" ...
.. .. ..$ Term: chr [1:44] "14-3-3 protein" "ribosomal protein, S19p/S19a/S15e/organellar S19 
types" "ribosomal protein, S13p/S13a/S18e/organellar S13 types" "ribosomal protein, 
S7p/S7a/S5e/organellar S7 types" ...
.. ..$ SMART           :'data.frame':   90 obs. of  2 variables:
.. .. ..$ ID  : chr [1:90] "SM00883" "SM00101" "SM01386" "SM01387" ...
.. .. ..$ Term: chr [1:90] "SM00883" "14_3_3" "SM01386" "SM01387" ...
.. ..$ UP_KEYWORDS     :'data.frame':   116 obs. of  2 variables:
.. .. ..$ ID  : chr [1:116] "Coiled coil" "Complete proteome" "Dioxygenase" "Oxidoreductase" 
...
.. .. ..$ Term: chr [1:116] "" "" "" "" ...
.. ..$ UP_SEQ_FEATURE  :'data.frame':   13 obs. of  2 variables:
.. .. ..$ ID  : chr [1:13] "chain:60S ribosomal protein L18" "chain:Probable eukaryotic 
initiation factor 4A" "domain:Helicase ATP-binding" "domain:Helicase C-terminal" ...
.. .. ..$ Term: chr [1:13] "" "" "" "" ...
..@ Membership:List of 10
.. ..$ COG_ONTOLOGY    : logi [1:3011, 1:18] FALSE FALSE FALSE FALSE FALSE FALSE ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : NULL
.. .. .. ..$ : chr [1:18] "Translation, ribosomal structure and biogenesis" "Lipid metabolism" 
"Cell division and chromosome partitioning" "General function prediction only" ...
.. ..$ GOTERM_BP_DIRECT: logi [1:3011, 1:215] FALSE TRUE TRUE TRUE TRUE TRUE ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : NULL
.. .. .. ..$ : chr [1:215] "GO:0006457" "GO:0051603" "GO:0008152" "GO:0006412" ...
.. ..$ GOTERM_CC_DIRECT: logi [1:3011, 1:84] FALSE TRUE TRUE TRUE TRUE TRUE ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : NULL
.. .. .. ..$ : chr [1:84] "GO:0005737" "GO:0016021" "GO:0005634" "GO:0005839" ...
.. ..$ GOTERM_MF_DIRECT: logi [1:3011, 1:222] TRUE FALSE FALSE FALSE FALSE FALSE ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : NULL
.. .. .. ..$ : chr [1:222] "GO:0010309" "GO:0018580" "GO:0051213" "GO:0004298" ...
.. ..$ INTERPRO        : logi [1:3011, 1:695] TRUE FALSE FALSE FALSE FALSE FALSE ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : NULL
.. .. .. ..$ : chr [1:695] "IPR004313" "IPR011051" "IPR014710" "IPR011032" ...
.. ..$ KEGG_PATHWAY    : logi [1:3011, 1:363] FALSE FALSE FALSE FALSE FALSE FALSE ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : NULL
.. .. .. ..$ : chr [1:363] "ldo00071" "ldo00280" "ldo01100" "lmi00280" ...
.. ..$ PIR_SUPERFAMILY : logi [1:3011, 1:44] FALSE FALSE FALSE FALSE FALSE FALSE ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : NULL
.. .. .. ..$ : chr [1:44] "PIRSF000868" "PIRSF002144" "PIRSF002134" "PIRSF002122" ...
.. ..$ SMART           : logi [1:3011, 1:90] FALSE TRUE TRUE TRUE TRUE TRUE ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : NULL
.. .. .. ..$ : chr [1:90] "SM00883" "SM00101" "SM01386" "SM01387" ...
.. ..$ UP_KEYWORDS     : logi [1:3011, 1:116] TRUE FALSE FALSE FALSE FALSE FALSE ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : NULL
.. .. .. ..$ : chr [1:116] "Coiled coil" "Complete proteome" "Dioxygenase" "Oxidoreductase" 
...
.. ..$ UP_SEQ_FEATURE  : logi [1:3011, 1:13] FALSE FALSE FALSE FALSE FALSE FALSE ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : NULL
.. .. .. ..$ : chr [1:13] "chain:60S ribosomal protein L18" "chain:Probable eukaryotic 
initiation factor 4A" "domain:Helicase ATP-binding" "domain:Helicase C-terminal" ...
..@ type      : chr "Functional Annotation Table"

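Also, is there a general way to convert any S4 object to a data frame, regardless of the data inside it? This matters because the S4 objects I get from this package may hold a different number of lists/variables/characters in each of the four slots (i.e. @Genes, @Dictionary, @Membership and @type).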

Perhaps the following functions will help.

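as.data.frame.DAVIDFunctionalAnnotationTable <- function(x, row.names = NULL, optional = FALSE, ...){
  # The Genes slot extends data.frame: its columns live in @.Data, their names in @names
  Genes <- x@Genes
  y <- Genes@.Data
  names(y) <- Genes@names
  # Turn the named list of columns into a plain data.frame
  as.data.frame(y, stringsAsFactors = FALSE)
}
# Simple accessors for the remaining slots
extractS4_Dictionary <- function(x) x@Dictionary
extractS4_Membership <- function(x) x@Membership
extractS4_type <- function(x) x@type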

Calling

as.data.frame(res)

will coerce res to a data.frame built from its Genes slot.
The other functions extract the remaining slots of the S4 object.
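
On the more general question, a data frame is not always a sensible target, because slots can hold values of any shape. A minimal sketch of a generic fallback is to pull every slot into a named list and then convert only the pieces that are tabular. The helper name s4_to_list and the Toy class below are hypothetical, made up for illustration; only slotNames() and slot() from the methods package are assumed.

```r
library(methods)

# Hypothetical helper (not part of RDAVIDWebService): return every slot of an
# S4 object as a named list, leaving each slot's value unchanged.
s4_to_list <- function(obj) {
  stopifnot(isS4(obj))
  nms <- slotNames(obj)
  setNames(lapply(nms, function(n) slot(obj, n)), nms)
}

# Toy class standing in for a real result object.
setClass("Toy", representation(Genes = "data.frame", type = "character"))
toy <- new("Toy",
           Genes = data.frame(ID = c("a", "b"), stringsAsFactors = FALSE),
           type = "demo")

slots <- s4_to_list(toy)
names(slots)
```

Each element of the resulting list can then be inspected with str() and converted on its own, which avoids forcing slots of different sizes into one table.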


The following function extracts the genes that belong to a given annotation category.

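membership <- function(x, which){
  y <- as.data.frame(x)
  memb <- extractS4_Membership(x)
  # memb is a list of logical matrices (genes x terms), one per category;
  # keep the genes annotated with at least one term of the chosen category
  i <- rowSums(memb[[which]]) > 0
  y[i, , drop = FALSE]
}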
# example usage
membership(res, "COG_ONTOLOGY")
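
If one row per gene/term pair is wanted instead, a membership matrix can be unrolled into a long data frame. The sketch below uses small made-up stand-ins (genes, m) for res@Genes and one element of res@Membership, and assumes, as in the str() output above, that the matrix has one row per gene and term IDs as column names.

```r
# Toy stand-ins for x@Genes and one element of x@Membership.
genes <- data.frame(ID = c("g1", "g2", "g3"), stringsAsFactors = FALSE)
m <- matrix(c(TRUE, FALSE, TRUE,
              FALSE, FALSE, TRUE),
            nrow = 3,
            dimnames = list(NULL, c("GO:1", "GO:2")))

# One row per TRUE cell: which gene belongs to which term.
idx <- which(m, arr.ind = TRUE)
long <- data.frame(ID = genes$ID[idx[, "row"]],
                   Term = colnames(m)[idx[, "col"]],
                   stringsAsFactors = FALSE)
long
```

The same pattern could be applied to each element of the Membership list in turn, for example with lapply().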
