SPIA package (R) applied to Illumina expression microarray data

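I have been trying alternatives to GSEA for annotating expression (mRNA) data. SPIA (Signaling Pathway Impact Analysis) looks interesting, but all I get from it is a single error message: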

Error in spia(de = sigGenes, all = allGenes, organism = "hsa", plots = TRUE,  :  
de must be a vector of log2 fold changes. The names of de should be
included in the reference array!

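The input requires a single vector of log2 fold changes (mine is named sigGenes) with Entrez IDs as the associated names, plus an integer vector of the Entrez IDs present on the microarray (allGenes):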

head(sigGenes)
6144 115286  23530  10776  83933   6232 
0.368  0.301  0.106  0.234 -0.214  0.591 
head(allGenes)
6144 115286  23530  10776  83933   6232
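The two requirements described above (every name of the fold-change vector is present in the reference vector, and each gene appears only once) can be checked before calling spia(). The following is a minimal sketch on toy data; check_spia_input() is a hypothetical helper, not part of SPIA, and the duplicated entry for 6144 is invented for illustration.

```r
# Toy stand-ins for the real vectors; the second "6144" entry is made up.
sigGenes <- c("6144" = 0.368, "115286" = 0.301, "6144" = 0.350, "23530" = 0.106)
allGenes <- c(6144L, 115286L, 23530L, 10776L)

# Hypothetical helper (not a SPIA function): returns one TRUE/FALSE per check.
check_spia_input <- function(de, universe) {
  ids <- names(de)
  list(
    has_names       = !is.null(ids),                          # de must be a named vector
    no_missing_ids  = !anyNA(ids),                            # no NA Entrez IDs
    all_in_universe = all(ids %in% as.character(universe)),   # names of de found in all
    no_duplicates   = anyDuplicated(ids) == 0                 # each Entrez ID only once
  )
}

checks <- check_spia_input(sigGenes, allGenes)
print(unlist(checks))
```

Any check that comes back FALSE is a candidate cause worth investigating before rerunning the analysis.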

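I have already removed the entries whose EntrezID annotation is NA. Following the example at the site listed below, I also subset the Illumina microarray data to only those genes found on the Affymetrix array. I still get the same error.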

Here is the full section of R code:

library(Biobase)
library(limma)
library(SPIA)
sigGenes <- subset(full_table, P.Value<0.01)$logFC
names(sigGenes) <- subset(full_table, P.Value<0.01)$EntrezID
sigGenes<-sigGenes[!is.na(names(sigGenes))] # remove NAs
allGenes <- unique(full_table$EntrezID[!is.na(full_table$EntrezID)])
spiaOut <- spia(de=sigGenes, all=allGenes, organism="hsa", plots=TRUE, data.dir="./")

Any ideas on what I could try? Apologies if this is off-topic (I am still new here). I am happy to move the question elsewhere if needed.

Example of SPIA applied to Affymetrix platform data: http://www.gettinggeneticsdone.com/2012/03/pathway-analysis-for-high-throughput.html

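Removing the duplicates did help. As a workaround, I took the median of each set of duplicates (only because the values were close to each other), as follows: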

dups<-unique(names(sigGenes[which(duplicated(names(sigGenes)))])) # determine which are duplicates
dupID<-names(sigGenes) %in% dups # determine the location of all duplicates
sigGenes_dup<-vector(); j=0; # determine the median value for each duplicate
for (i in dups){j=j+1; sigGenes_dup[j]<- median(sigGenes[names(sigGenes)==i])  }
names(sigGenes_dup)<-dups
sigGenes<-sigGenes[!(names(sigGenes) %in% dups)] # remove duplicates from sigGenes
sigGenes<-c(sigGenes,sigGenes_dup) # append the median values of the duplicates
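The same median-per-gene collapse can also be written without an explicit loop. This is a sketch on toy data (the values and the duplicated ID are invented), using base R's tapply() to group the fold changes by Entrez ID; sigGenes_collapsed is an illustrative name.

```r
# Toy data: Entrez ID "6144" appears twice (hypothetical values).
sigGenes <- c("6144" = 0.368, "115286" = 0.301, "6144" = 0.350, "23530" = 0.106)

# Median log2 fold change per Entrez ID; IDs that occur once keep their value.
med <- tapply(sigGenes, names(sigGenes), median)

# tapply() returns a 1-d array; convert it back to a plain named numeric vector.
sigGenes_collapsed <- setNames(as.vector(med), names(med))

print(sigGenes_collapsed)
```

Note that the result is ordered by the sorted ID strings rather than by the original order of sigGenes.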

Alternatively, simply drop every gene that has duplicated entries:

dups<-unique(names(sigGenes[which(duplicated(names(sigGenes)))]))
sigGenes<-sigGenes[!(names(sigGenes) %in% dups)] # remove duplicates from sigGenes

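Based on our discussion, I suggest removing the duplicate entries in sigGenes. Without further information, it is hard to say where the duplicates come from and which of them should be removed.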
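To help decide how to handle the duplicates, it may be useful to list them side by side first. The sketch below uses toy data (invented values) and assumes, without confirmation from the post, that the duplicates come from several probes mapping to the same Entrez ID; dup_values and spread are illustrative names.

```r
# Toy data: Entrez ID "6144" appears twice (hypothetical values).
sigGenes <- c("6144" = 0.368, "115286" = 0.301, "6144" = 0.350, "23530" = 0.106)

# Entrez IDs that occur more than once.
dup_ids <- unique(names(sigGenes)[duplicated(names(sigGenes))])

# Group the fold changes of the duplicated IDs, one list element per ID.
is_dup     <- names(sigGenes) %in% dup_ids
dup_values <- split(unname(sigGenes[is_dup]), names(sigGenes)[is_dup])

# Spread (max - min) within each group: a small spread suggests the
# duplicated entries agree, so a median or mean is a reasonable summary.
spread <- sapply(dup_values, function(v) diff(range(v)))

print(dup_values)
print(spread)
```

If the spread is large, or the signs disagree, the duplicated gene probably deserves a closer look at the probe level rather than an automatic summary.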
