使用Bq命令行命令加载BigQuery外部表时出错



我试图通过Bq命令行使用Bq load命令加载BigQuery外部表。Bq加载命令已执行-bq加载--source_format=NEWLINE_DELITED_JSON{提供的数据集名称}。{provided bq external_table_name}gs://{provided-bucket_name}/{provided_folder_name}/}provided-folder_name}/}/2022007/*我得到的错误是:处理作业"*:bqjob_r6bde3e8976b407bd_ 0000017e4295db78_1"时出错:bq_project_name:bq_dataset_name.bq_external_table_name不允许因为它当前是一个EXTERNAL。任何人都遇到了这个错误,我没有找到任何需要传递的参数来告诉Bq这是谷歌Bq加载文档中的外部表。对此有什么见解吗?我尝试使用external_table=True的GoogleCloudStorageToBigQueryOperator加载外部表,但这也会产生一个错误,表明"'BigQuery作业失败。错误为:{}'.format(错误内容

Exception: BigQuery job failed. Error was: b'{n  "error": {n    "code": 409,n    "message": "Already Exists: Table project_name:dataset_name.Bq_Externaltable_name",n    "errors": [n      {n        "message": "Already Exists: Table project_name:dataset_name.Bq_Externaltable_name",n        "domain": "global",n        "reason": "duplicate"n      }n    ],n    "status": "ALREADY_EXISTS"n  }n}n
[2022-01-09 17:10:20,995] {base_task_runner.py:113} INFO - Job 230862: Subtask {subtask_name} [2022-01-09 17:10:20,993] {taskinstance.py:1147} ERROR - BigQuery job failed. Error was: b'{n  "error": {n    "code": 409,n    "message": "Already Exists: Table project_name:dataset_name.Bq_Externaltable_name",n    "errors": [n      {n        "message": "Already Exists: Table project_name:dataset_name.Bq_Externaltable_name",n        "domain": "global",n        "reason": "duplicate"n      }n    ],n    "status": "ALREADY_EXISTS"n  }n}n'
"
this error also threw me off because I created the external table using terraform using below code block
resource google_bigquery_table external_table_name {
project = local.project
dataset_id = google_bigquery_dataset.{provided_dataset_name}.dataset_id
table_id = local.{variable defined for Bq external table}
schema = file("${path.module}/../../../schema/{folder which holds schema json}/schemajsonforexternaltable.json")
depends_on = [google_bigquery_dataset.{provided_dataset_name}]
deletion_protection = false
external_data_configuration {
autodetect = false
source_format = "NEWLINE_DELIMITED_JSON"
source_uris = [
"gs://{bucket_name}-${var.environment}/{folder_name}/{folder_name}/{folder_name}/*"
]
}
}
so why am I doing all this and whats my end goal is I want to retrieve the file name like mentioned in the query below which Google provides an option to the external table as a pseudo column (_FILE_NAME)
SELECT
p_num,
_FILE_NAME AS file_loc /* use this column to know the file name used to build the row in the Bq External table*/
FROM
`gcp_project_name.{dataset_name}.{Bq_External_Table_name}`;
If there is any any alternative other than using Bq external table to get the file name being used to build the row thats also fine I can switch to that approach as well.

@MikeKarp-我在上面的帖子中有两个问题,一个是使用Bq加载命令加载Bq外部表,但失败了,通过这次尝试,我的问题是是否可以使用Bq负载加载Bq内部表?。第二个是我试图使用external_table=True的GoogleCloudStorageToBigQueryOperator加载通过terraform创建的外部表(提供外部表所需的源uri路径(,但失败了;代码":409;消息":"表已存在。从第二个开始,我不确定为什么GoogleCloudStorageToBigQueryOperator试图再次创建表,而外部表已经通过Terraform 在我的GCP项目中创建了

Ahhhh我相信我现在明白了,所以你有一个可以查询的现有外部表,你想把它加载到一个新的静态表中,对吗?最简单的方法是直接使用SQL。

您可以通过SQL中的create语句从一个单独的现有外部表创建一个新表:

CREATE TABLE `gcp_project_name.{dataset_name}.{new_standard_table_name}` as
SELECT *,
_FILE_NAME AS file_loc 
FROM
`gcp_project_name.{dataset_name}.{Bq_External_Table_name}`;

更新上面的表名后,可以直接在bigquerySQL editior中运行此操作。

根据您上面的问题,正如您所指出的,我还将附加列留在了中,这很好——您可以在SELECT组件中添加所需的任意多个新派生等,这些派生将使其进入新表中。

相关内容

最新更新