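I have several RNA-seq datasets. Using a deconvolution method, I ran a cell type enrichment analysis on each of them and then combined the results into one data frame, which gave me 1000+ samples as columns and 38 cell types as rows. The datasets come from different cancer papers. So, naturally, before visualising the data with volcano plots or t-SNE, I need to correct for batch effects from both cancer type AND dataset, so that the source does not affect the results. I used the following code: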
scores.batch = limma::removeBatchEffect(scores, metadata$Cancer_Type, metadata$dataset)
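However, for some cell types in some samples I now get negative scores, which of course makes no sense. Something went wrong.
dput of the score matrix: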
structure(c(0, 0.0853672252935787, 0.0472255148477786, 0.0505467828272972,
0, 0.0308325695761715, 0, 0.157955518619051, 0.00989687292281167,
0.03263377453636, 0.174135667551838, 0.0287360256296349, 0.0647519562755579,
0, 0, 0, 0.0131324709303641, 0, 0, 0.131356081785285, 0, 0.0389487771231123,
0.143102950679691, 0, 0, 0, 0, 0, 0.0120903254374909, 0, 0, 0.0146819273419876,
0.00547214738400891, 0.0128837171879466, 0, 0, 0.0458955588957287,
0, 0, 0.0132395370608289, 0, 0.188105588373935, 0.458389955317805,
0, 0, 0, 0, 0.202322209601319, 0.0070140951370079, 0.0674561160550705,
0.257105522741856, 0.0187125792218268, 0.132650077873857, 0.0464882832616245,
0, 0.267398408455589, 0.257988913719892, 0, 0.0327859369672344,
0.190621289930972, 0.00595393276866058, 0.0257929623669804, 0.0286417150045293,
0.00692628582485207, 0, 0, 0, 0.103067773960521, 0.0254486580186314,
0.0280937010981759, 0.0571003379986667, 0, 0.0129979208251825,
0.0665627159432736, 0, 0, 0.224047712805128, 0, 0.0136182944729644,
0.0432680414524333, 0.0399461338850251, 0.0292693281178669, 0.366507229736257,
0, 0, 0, 0, 0, 0.00669552195301393, 0.0185218739472336, 0.0255519328964942,
0, 0.0733344287554076, 0.0255903177243924, 0, 0.39146057499213,
0.0111881442508292, 0, 0, 0.0511976959994528, 0, 0.00579928556115081,
0.0732688902065305, 0, 0, 0, 0, 0, 0.0110070088566591, 0, 0,
0.0106345827415758, 0, 0.0161657089454384, 0, 0, 0.181452946136114,
0, 0, 0.0142759665417351, 0, 0.0462555010600369, 0.0827203733228943,
0.0145026248884816, 0, 0.013260218865482, 0, 0.0933479614509043,
0.00774602641682057, 0.0119668338387216, 0.129131677414995, 0.0239962329230613,
0.0204322461340539, 0.0493841568846939, 0, 0, 0.121244199785082,
0, 0.0121972859914857, 0.140024857933727, 0.174619321250637,
0.220714394591806, 0.0357916262655448, 0.0545063692410225, 0,
0, 0, 0.137161377602957, 0.00590382216872294, 0.0201750503599633,
0.142034903521219, 0.0985879590151414, 0.0335131516620065, 0.090677099547935,
0.00507741177479126, 0.037356635316015, 0.201168399838889, 0,
0.0314008923657083, 0.365359437170722, 0, 0.0335843135244289,
0.0715582133522154, 0, 0, 0, 0, 0.0474378875432263, 0.0209691496515952,
0.0172794413473455, 0.0135720611847538, 0.033428514409707, 0.0105720693466205,
0, 0, 0, 0.06773450392978, 0, 0, 0.00586743854873324, 0.0168258454272402,
0.0210951521159853, 0.243788369183411, 0.0220752365135898, 0,
0, 0, 0, 0.0306987963385464, 0, 0, 0.0214207906806456, 0, 0.00976336517826329,
0.0271958080474159, 0.00901990828923357, 0.0107550873887369,
0, 0, 0.0447193474549512, 0, 0.0893397248630643, 0.0628755200633895,
0.00689545950391347, 0, 0, 0, 0, 0.0317975383226698, 0.0326506928899438,
0.0585870944941104, 0.0325612902279177, 0.015309108818366, 0.00884806480492375,
0, 0, 0.090018886142378, 0, 0, 0.0252109197429766, 0, 0.0575320783388593,
0.0360786525651136, 0, 0, 0, 0, 0.033985164346289, 0.0224789756565266,
0, 0.0110759152952279, 0.0117488957883667, 0.0308459132319819,
0.0280619366351415, 0, 0.118913468206155, 0.104597268143716,
0, 0, 0.0725014944946794, 0.0178909285824974, 0.107561160668656,
0.103657490882649, 0.00912992981258696, 0, 0, 0, 0.0705798568396863,
0.0358671574380446, 0.0436978038106949, 0.0947966583779633, 0.00754414348305365,
0.0427209871099505, 0.0293558198269896, 0, 0, 0.186905499424745,
0, 0.00921431451770207, 0.0392728923365106, 0.457917600754677,
0.0346030375240686, 0, 0.00559973035365259, 0, 0, 0, 0.0752873255178603,
0, 0, 0.0146401887588438, 0.0149177458753822, 0.0746262560762416,
0.263898927149848, 0.00724132393694323, 0.0356656672388469, 0.408802700748097,
0, 0.105516874044416, 0.0759265575312881, 0, 0, 0, 0, 0, 0, 0,
0.112847654739959, 0.0142421329460783, 0.0261230668401576, 0,
0.00638014939397572, 0.0315337646404048, 0.0165989987896856,
0, 0.359720477023771, 0.119577703639366, 0, 0, 0.00856089558535689,
0, 0.00683428737177429, 0.0668329268575581, 0, 0, 0, 0, 0.047734960135924,
0, 0, 0, 0, 0, 0, 0, 0, 0.107598342312041, 0, 0, 0.0121935326110086,
0, 0, 0.135919574921458, 0, 0, 0, 0, 0, 0.0128474874078213, 0,
0, 0, 0, 0.0153109234303942, 0, 0, 0.158969240948426, 0, 0, 0,
0, 0.0232230943661847, 0.140426779187137, 0, 0, 0, 0, 0.0336919132251274,
0.0340940005017551, 0.00546712364093891, 0.013544098663573, 0.00839243775091744,
0, 0.00548575092813788, 0, 0, 0.0553411208343392, 0, 0, 0.0125206755368664,
0.0182216981318242, 0.121162914437175, 0.114036773041914, 0.0357279266394587,
0, 0, 0, 0.211648365695196, 0, 0.0354172784066678, 0.169262066444321,
0.0630110794062426, 0.0606400985951092, 0.101323746391281, 0,
0.021421960793034, 0.288751473459872, 0, 0.024183015113392, 0.352175638353027,
0, 0.0258095846797072, 0.0228475888849942, 0, 0, 0, 0, 0.0802318746420269,
0, 0, 0.0209026176594105, 0.0167803844651298, 0.0668381647275034,
0.0264858410661633, 0, 0.00902616117758849, 0.10613905468228,
0, 0, 0.0868373941926339), .Dim = c(20L, 20L), .Dimnames = list(
c("Adipocytes", "B-cells", "Basophils", "CD4+ memory T-cells",
"CD4+ naive T-cells", "CD4+ T-cells", "CD4+ Tcm", "CD4+ Tem",
"CD8+ naive T-cells", "CD8+ T-cells", "CD8+ Tcm", "Class-switched memory B-cells",
"DC", "Endothelial cells", "Eosinophils", "Epithelial cells",
"Fibroblasts", "Hepatocytes", "ly Endothelial cells", "Macrophages"
), c("Pt1", "Pt10", "Pt103", "Pt106", "Pt11", "Pt17", "Pt2",
"Pt24", "Pt26", "Pt27", "Pt28", "Pt29", "Pt31", "Pt36", "Pt37",
"Pt38", "Pt39", "Pt4", "Pt46", "Pt47")))
I also tried correcting for Cancer_Type only, but the results were not good (t-SNE did not cluster the data the way I needed).
What could be the problem here?
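limma assumes that the two kinds of batch effect are additive and therefore independent. Writing limma::removeBatchEffect(x = scores, batch = cancer_type, batch2 = study) implies that there is no relationship between cancer type and study dataset. However, it is quite likely that one study covers one cancer type and another study covers a different one. In that case the two factors are confounded, and the assumption behind the limma model is probably violated.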
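One way to check whether cancer type and dataset are confounded is to cross-tabulate them and look at the rank of an additive design matrix. The sketch below uses base R only; the `metadata` values are made up for illustration (each hypothetical study covers a single cancer type), so substitute your own columns.

```r
# Hypothetical metadata: every dataset contains exactly one cancer type.
metadata <- data.frame(
  Cancer_Type = c("BRCA", "BRCA", "BRCA", "LUAD", "LUAD", "SKCM", "SKCM"),
  dataset     = c("study1", "study1", "study2", "study3", "study3", "study4", "study4")
)

# Cross-tabulate the two factors; a table that is mostly zeros
# means the factors are nested or confounded.
crosstab <- table(metadata$Cancer_Type, metadata$dataset)
print(crosstab)

# Additive design containing both factors.
design <- model.matrix(~ Cancer_Type + dataset, data = metadata)

# A rank below the number of columns means the two effects
# cannot be estimated separately from these samples.
design_rank <- qr(design)$rank
cat("columns:", ncol(design), " rank:", design_rank, "\n")
```

If the rank is lower than the number of columns, an additive two-factor correction cannot tell the cancer-type effect apart from the dataset effect.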
You can, however, create a single combined batch variable and pass it as the only batch argument:
# A separator keeps the combined labels unambiguous
metadata$merged_batch <- paste(metadata$Cancer_Type, metadata$dataset, sep = "_")
scores.batch = limma::removeBatchEffect(scores, batch = metadata$merged_batch)
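As for the negative scores: an additive correction shifts every sample in a batch by the same amount, so a score of 0 in a batch whose mean is above the overall level will end up below 0. The base-R sketch below illustrates the idea with made-up numbers for one cell type. It is a simplified per-batch mean shift, conceptually similar to an additive batch correction, and is not limma's actual implementation.

```r
# Toy scores for one cell type in six samples from two hypothetical batches.
scores <- c(0, 0.02, 0.04, 0, 0.30, 0.60)
batch  <- factor(c("A", "A", "A", "B", "B", "B"))

# Mean score per batch, and the mean of those batch means.
batch_means <- tapply(scores, batch, mean)
grand_mean  <- mean(batch_means)

# Additive correction: shift each batch so its mean equals grand_mean.
shift     <- batch_means[as.character(batch)] - grand_mean
corrected <- as.numeric(scores - shift)
print(round(corrected, 3))

# Any sample whose score is smaller than its batch's shift becomes negative.
print(corrected < 0)
```

Here the zero score in batch B is pushed below zero because batch B as a whole is shifted downwards. The negative values are therefore a side effect of the additive model rather than a sign that the call failed.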