r语言 - 以向量作为参数的 RSQLite 参数化查询



我对SQL及其语法不熟悉,我不明白如何使用RSQLite将多个值(例如向量或列表(传递给R参数化查询中的单个参数。

我有一个双表数据库(myTCGA(,数据来自RNASeq数据。第一个(tcga_P(包含某些基因样本的表达值(FPKM(,而(tcgaMeta(包含这些样本的元数据信息。

#tcga_P
FPKM           Sample                 Tissue GeneName
5550 0.0633 TCGA-AB-2803-03A Acute_Myeloid_Leukemia  PLEKHN1
5551 0.2390 TCGA-AB-2805-03A Acute_Myeloid_Leukemia  PLEKHN1
5552 0.0253 TCGA-AB-2806-03A Acute_Myeloid_Leukemia  PLEKHN1
5553 0.0385 TCGA-AB-2807-03A Acute_Myeloid_Leukemia  PLEKHN1
5554 0.0326 TCGA-AB-2808-03A Acute_Myeloid_Leukemia  PLEKHN1
5555 0.2836 TCGA-AB-2810-03A Acute_Myeloid_Leukemia  PLEKHN1
# tcgaMeta (only few columns)
SampleIndex         SampleID        SubjectID Tumor.Type         Sample.Type
1           0 TCGA-01-0628-11A TCGA-01-0628         OV Solid Tissue Normal
2           1 TCGA-01-0630-11A TCGA-01-0630         OV Solid Tissue Normal
3           2 TCGA-01-0631-11A TCGA-01-0631         OV Solid Tissue Normal

我想从属于特定组的样本(例如所有肺部样本(tcga_P中提取表达值。为此,我编写了一个如下所示的查询

library(DBI)
library(RSQLite)
library(data.table)
myGene <- "PLEKHN1"
myTissue <- "lung"
myCancer <- "Lung Adenocarcinoma"
selectedSamples <- dbGetQuery(myTCGA, 
"SELECT A.*
FROM tcga_P A 
WHERE A.GeneName = $gene AND
A.Sample in (SELECT B.SampleID FROM tcgaMeta B 
WHERE B.Tissue = $tissue AND 
B.`Disease.TCGA.` = $cancer )
",param = list(gene=myGene,tissue=myTissue,cancer=myCancer))
# from long to wide
selectedSamplesWide <- dcast(selectedSamples,GeneName~Sample, value.var = "FPKM",fun.aggregate = sum)

仅当我同时提供一个基因时,此查询才会正确返回值,但如果我尝试从多个基因中提取值,则此查询会失败(myGene <- c("PLEKHN1","PSMD12"))

Error in result_bind(res@ptr, params) : 
Parameter 2 does not have length 2.

我可以循环(sapply(载体中的基因,使sql调用一次一个基因并将结果绑定在一起,但我想在sql调用中做所有事情。

到目前为止,我尝试了

WHERE A.GeneName IN ($gene)
WHERE A.GeneName IN (SELECT C.GeneName FROM $gene C)

我还尝试将myGene转换为data.frame,并将基因视为一列。说什么都没有奏效是多余的。

我缺少什么? 如何将参数传递给param = list()

在 SQL 中,相等的WHERE子句=需要一个值,您尝试将表达式计算为两个值。但是,带INWHERE允许多个值:

WHERE A.GeneName IN ('PLEKHN1', 'PSMD12', ...)

对于开放式数量的值,请考虑使用paste+collapse动态创建预准备语句,并使用setNamesas.list绑定参数值:

myGene <- c("PLEKHN1", "PSMD12")
myTissue <- "lung"
myCancer <- "Lung Adenocarcinoma"
myPlaceHolders <- paste0("$gene", seq_along(myGene))
sql <- paste0("SELECT A.*
FROM tcga_P A 
WHERE A.GeneName IN (", paste(myPlaceHolders, collapse=", "), ") 
AND A.Sample in (SELECT B.SampleID 
FROM tcgaMeta B 
WHERE B.Tissue = $tissue 
AND B.`Disease.TCGA.` = $cancer)
")
myGeneParams <- as.list(setNames(myGene, gsub("\$", "", myPlaceHolders)))
paramList <- c(myGeneParams, tissue=myTissue, cancer=myCancer)
selectedSamples <- dbGetQuery(myTCGA, sql, param = myParamList)

Rextester 演示(用于预准备语句和参数输出,而不是查询运行(

最新更新