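I have a txt file that looks like this. I need to use biomaRt in R to get the corresponding gene IDs for a full list of distinct RefSeq IDs and peptides. I also need to keep the peptide sequences in the final result. How can I do this? Please help.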
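# Assumption: the file has no header row and holds two columns, RefSeq ID then peptide.
# Naming the columns here lets the later code refer to myData$RefSeq.
myData <- read.delim("phosphopeptides.txt", header = FALSE,
                     col.names = c("RefSeq", "Peptide"))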
Match our IDs using the refseq_peptide filter:
library(biomaRt)
ensembl <- useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl")
refseq_peptide = unique(myData$RefSeq)
res <- getBM(attributes = c("refseq_peptide", "hgnc_symbol"),
             filters = "refseq_peptide",
             values = refseq_peptide,
             mart = ensembl)
res
# refseq_peptide hgnc_symbol
# 1 NP_000007 ACADM
# 2 NP_000009 ACADVL
# 3 NP_000012 PSEN1
# Merge the symbols back onto the original data so the peptide sequences are kept
merge(myData, res, by.x = "RefSeq", by.y = "refseq_peptide")
# RefSeq Peptide hgnc_symbol
# 1 NP_000007 R.SDPDPKAPANK.A ACADM
# 2 NP_000009 K.SDSHPSDALTR.K ACADVL
# 3 NP_000012 K.YNAESTERESQDTVAENDDGGFSEEWEAQR.D PSEN1
# 4 NP_000012 R.AAVQELSSSILAGEDPEER.G PSEN1
# 5 NP_000012 R.AAVQELSSSILAGEDPEER.G PSEN1
# 6 NP_000012 R.S*LGHPEPLSNGR.P PSEN1
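One caveat with the merge above: by default merge() keeps only rows whose RefSeq ID was returned by getBM(), so peptides without a match are dropped silently. Below is a minimal sketch of keeping every peptide. It uses small made-up data frames (toy_data and toy_res) in place of myData and res, and NP_999999 and its peptide are hypothetical values chosen to have no match.

```r
# Toy stand-ins for myData and res; NP_999999 is a made-up ID with no annotation.
toy_data <- data.frame(
  RefSeq  = c("NP_000007", "NP_000012", "NP_999999"),
  Peptide = c("R.SDPDPKAPANK.A", "R.S*LGHPEPLSNGR.P", "K.AAAAAK.R"),
  stringsAsFactors = FALSE
)
toy_res <- data.frame(
  refseq_peptide = c("NP_000007", "NP_000012"),
  hgnc_symbol    = c("ACADM", "PSEN1"),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every row of toy_data; unmatched IDs get NA as their symbol.
annotated <- merge(toy_data, toy_res,
                   by.x = "RefSeq", by.y = "refseq_peptide",
                   all.x = TRUE)

# RefSeq IDs that could not be mapped to a symbol.
unmapped <- annotated$RefSeq[is.na(annotated$hgnc_symbol)]
```

Inspecting unmapped afterwards shows which IDs need a second look, for example predicted (XP_) accessions or IDs retired from the current Ensembl release.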
Note: when we don't know the exact attribute name, searchAttributes is a useful function for finding it:
searchAttributes(mart = ensembl, pattern = "refseq")
# name description page
# 86 refseq_mrna RefSeq mRNA ID feature_page
# 87 refseq_mrna_predicted RefSeq mRNA predicted ID feature_page
# 88 refseq_ncrna RefSeq ncRNA ID feature_page
# 89 refseq_ncrna_predicted RefSeq ncRNA predicted ID feature_page
# 90 refseq_peptide RefSeq peptide ID feature_page
# 91 refseq_peptide_predicted RefSeq peptide predicted ID feature_page
searchAttributes(mart = ensembl, pattern = "hgnc")
# name description page
# 64 hgnc_id HGNC ID feature_page
# 65 hgnc_symbol HGNC symbol feature_page
# 95 hgnc_trans_name Transcript name ID feature_page
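The question asks for gene IDs, while the query above returns HGNC symbols. Here is a sketch of how the same query could return gene identifiers as well. The helper name get_gene_ids is hypothetical, and the attributes ensembl_gene_id and entrezgene_id are assumed to be available in the chosen dataset; searchAttributes(mart = ensembl, pattern = "gene_id") can confirm the names.

```r
# Hypothetical helper: map RefSeq peptide IDs to gene identifiers.
# Needs the biomaRt package and a network connection when it is called.
get_gene_ids <- function(ids, mart) {
  biomaRt::getBM(
    attributes = c("refseq_peptide", "ensembl_gene_id",
                   "entrezgene_id", "hgnc_symbol"),
    filters    = "refseq_peptide",
    values     = unique(ids),   # query each ID only once
    mart       = mart
  )
}

# Usage, assuming `ensembl` and `myData` were created as above:
# gene_ids <- get_gene_ids(myData$RefSeq, ensembl)
# merge(myData, gene_ids, by.x = "RefSeq", by.y = "refseq_peptide")
```

A single RefSeq ID can map to more than one gene identifier, so the merged table may contain more rows than the input.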