我已经使用 bigrquery 和 dbGetQuery 函数执行了一个 JOIN。现在我想查询我创建的临时表,但无法连接



我担心,如果一群人开始运行我的实际代码,我将被收取查询费用,所以我的示例代码是针对虚假数据库的。

我已经成功地建立了与BigQuery:的连接

con <- dbConnect(
bigrquery::bigquery(),
project = 'myproject',
dataset = 'dataset',
billing = 'myproject'
)

然后使用聚结函数执行LEFT JOIN:

dbGetQuery(con,
"SELECT 
`myproject.dataset.table_1x`.Pokemon,
coalesce(`myproject.dataset.table_1`.Type_1,`myproject.dataset.table_2`.Type_1) AS Type_1,
coalesce(`myproject.dataset.table_1`.Type_2,`myproject.dataset.table_2`.Type_2) AS Type_2,
`myproject.dataset.table_1`.Total,
`myproject.dataset.table_1`.HP,
`myproject.dataset.table_1`.Attack,
`myproject.dataset.table_1`.Special_Attack,
`myproject.dataset.table_1`.Defense,
`myproject.dataset.table_1`.Special_Defense,
`myproject.dataset.table_1`.Speed,
FROM `myproject.dataset.table_1`
LEFT JOIN `myproject.dataset.table_2`
ON `myproject.dataset.table_1`.Pokemon = `myproject.dataset.table_2`.Pokemon
ORDER BY `myproject.dataset.table_1`.ID;")

JOIN生成了我想要的表,现在我想查询该表,但喜欢。。。它在哪里?如何连接?我可以将其保存在本地,以便开始在R中进行分析吗?即使我转到BigQuery,选择Project History选项卡,选择我刚刚在RStudio中运行的查询,并复制临时表的Job ID,我仍然会收到以下错误:

Error: Job 'poke-340100.job_y0IBocmd6Cpy-irYtNdLJ-mWS7I0.US' failed
x Syntax error: Unexpected string literal 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae' at [2:6] [invalidQuery]
Run `rlang::last_error()` to see where the error occurred.

如果我跟进:

Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/rlang_error>
Job 'poke-340100.job_y0IBocmd6Cpy-irYtNdLJ-mWS7I0.US' failed
x Syntax error: Unexpected string literal 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae' at [2:6] [invalidQuery]
Backtrace:
1. DBI::dbGetQuery(con, "SELECT *nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'nWHERE Type_1 IS NULL;")
2. DBI::dbGetQuery(con, "SELECT *nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'nWHERE Type_1 IS NULL;")
3. DBI:::.local(conn, statement, ...)
5. bigrquery::dbSendQuery(conn, statement, ...)
6. bigrquery:::BigQueryResult(conn, statement, ...)
7. bigrquery::bq_job_wait(job, quiet = conn@quiet)
Run `rlang::last_trace()` to see the full context.
> rlang::last_trace()
<error/rlang_error>
Job 'poke-340100.job_y0IBocmd6Cpy-irYtNdLJ-mWS7I0.US' failed
x Syntax error: Unexpected string literal 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae' at [2:6] [invalidQuery]
Backtrace:
x
1. +-DBI::dbGetQuery(con, "SELECT *nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'nWHERE Type_1 IS NULL;")
2. -DBI::dbGetQuery(con, "SELECT *nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'nWHERE Type_1 IS NULL;")
3.   -DBI:::.local(conn, statement, ...)
4.     +-DBI::dbSendQuery(conn, statement, ...)
5.     -bigrquery::dbSendQuery(conn, statement, ...)
6.       -bigrquery:::BigQueryResult(conn, statement, ...)
7.         -bigrquery::bq_job_wait(job, quiet = conn@quiet)

有人能解释一下吗?只是我不能用bigrquery包查询临时表吗?

从这里和这里的文档来看,问题可能只是您没有将结果分配到任何地方。

local_df = dbGetQuery(...

应该将数据库查询的结果复制到本地R内存中。请注意,因为没有检查结果的大小,所以在执行此操作时很容易耗尽内存。


您已经用dbplyr标记了这个问题,但看起来您只是在使用DBI包。如果您想编写R并将其翻译成SQL,那么您可以使用dbplyr来完成这项工作。它看起来像这样:

con <- dbConnect(...) # your connection details here
remote_tbl1 = tbl(con, from = "table_1")
remote_tbl2 = tbl(con, from = "table_2")
new_remote_tbl = remote_tbl1 %>%
left_join(remote_tbl2, by = "Pokemon", suffix = c("",".y")) %>%
mutate(Type_1 = coalesce(Type_1, Type_1.y),
Type_2 = coalesce(Type_2, Type_2.y)) %>%
select(ID, Pokemon, Type_1, Type_2, ...) %>% # list your return columns
arrange(ID)

当您使用这种方法时,new_remote_tbl可以被认为是数据库中的一个新表,您可以进一步查询和操作它。(它实际上不是一个表——没有数据保存到光盘上——但你可以查询它并与它交互,就像它是一样,数据库会根据需要为你生成它(。

使用远程表有一些限制(最大的限制是您只能使用dbply可以转换为SQL的命令(。如果要将当前远程表复制到本地R内存中,请使用collect:

local_df = remote_df %>%
collect()

最新更新