BigQuery: create an external table from Parquet files with schema auto-detection using Python



I can't find any example of creating an external table from Parquet files with an auto-detected schema. This is my current code:

import logging

from google.cloud import bigquery

logger = logging.getLogger(__name__)

bq_client = bigquery.Client.from_service_account_json(key_path)
table_name = "my_table"
table_id = f"{PROJECT_ID}.{DATASET}.{table_name}"
# TableReference expects the bare table name, not the full "project.dataset.table" ID
table_ref = bigquery.TableReference(bigquery.DatasetReference(PROJECT_ID, DATASET), table_name)
table_schema = [bigquery.SchemaField("example", "STRING")]  # I don't want this
table = bigquery.Table(table_ref, schema=table_schema)  # I don't want this

external_config = bigquery.ExternalConfig("PARQUET")
external_config.source_uris = ["gs://path/to/file_name.snappy.parquet"]
external_config.autodetect = True
table.external_data_configuration = external_config  # Not sure how to do this

bq_client.create_table(table)  # and this without table schema
logger.debug("Created table '%s'.", table_id)

Currently I have to specify a table schema, but I want the schema to be auto-detected. Any help is much appreciated.

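See the documentation: https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-parquet#loading_parquet_data_into_a_new_table

Parquet files are self-describing, so BigQuery reads the schema from the files and you don't have to supply one. The example there loads Parquet data into a new table: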

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
table_id = "your-project.your_dataset.your_table_name"

# No schema is set: BigQuery takes it from the Parquet file.
job_config = bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.PARQUET)
uri = "gs://cloud-samples-data/bigquery/us-states/us-states.parquet"

load_job = client.load_table_from_uri(
    uri, table_id, job_config=job_config
)  # Make an API request.
load_job.result()  # Waits for the job to complete.

destination_table = client.get_table(table_id)
print("Loaded {} rows.".format(destination_table.num_rows))
