我找不到任何使用自动检测模式从Paquet文件创建外部表的示例。这是我当前的代码:
bq_client = bigquery.Client.from_service_account_json(key_path)
table_name = "my_table"
table_id = f"{PROJECT_ID}.{DATASET}.{table_name}"
dataset_ref = bq_client.dataset(DATASET)
table_ref = bigquery.TableReference(dataset_ref, table_id)
table_schema = [bigquery.schema.SchemaField("example","STRING")] # I don't want this
table = bigquery.Table(table_ref, table_schema) # I don't want this
external_config = bigquery.ExternalConfig(source_format='PARQUET')
source_uris = [f"gs://path/to/file_name.snappy.parquet"]
external_config.source_uris = source_uris
external_config.autodetect = True
table.external_data_configuration = external_config # Not sure how to do this
bq_client.create_table(table) # and this without table schema
logger.debug("Created table '{}'.".format(table_id))
目前我必须指定表模式。我想自动检测模式。请帮忙。非常感谢。
查看文档https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-parquet#loading_parquet_data_into_a_new_table
from google.cloud import bigquery
# Construct a BigQuery client object.
client = bigquery.Client()
# TODO(developer): Set table_id to the ID of the table to create.
# table_id = "your-project.your_dataset.your_table_name"
job_config = bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.PARQUET,)
uri = "gs://cloud-samples-data/bigquery/us-states/us-states.parquet"
load_job = client.load_table_from_uri(
uri, table_id, job_config=job_config
) # Make an API request.
load_job.result() # Waits for the job to complete.
destination_table = client.get_table(table_id)
print("Loaded {} rows.".format(destination_table.num_rows))