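We are currently using an Airflow Python operator to load Parquet files from GCS into BigQuery. I would like to declare every numeric column in the source as BIGNUMERIC. Is that possible?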
bq_load = GCSToBigQueryOperator(
    task_id="gcs_to_bigquery_modified_airflow",
    bucket="{{ dag_run.conf['bucket'] }}",
    source_objects=["{{ dag_run.conf['name'] }}"],
    source_format="parquet",
    destination_project_dataset_table="{{ task_instance.xcom_pull(task_ids='get_destination') }}",
    create_disposition="CREATE_IF_NEEDED",
    write_disposition="WRITE_APPEND",
    autodetect=True,
)
You can define the schema manually with the schema_fields parameter of GCSToBigQueryOperator instead of relying on autodetect.
See the updated code below:
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

bq_load = GCSToBigQueryOperator(
    task_id="gcs_to_bigquery_modified_airflow",
    bucket="{{ dag_run.conf['bucket'] }}",
    source_objects=["{{ dag_run.conf['name'] }}"],
    source_format="parquet",
    destination_project_dataset_table="{{ task_instance.xcom_pull(task_ids='get_destination') }}",
    create_disposition="CREATE_IF_NEEDED",
    write_disposition="WRITE_APPEND",
    # Explicit schema replaces autodetect; every listed column is loaded as BIGNUMERIC.
    schema_fields=[
        {"name": "sample_col_1", "type": "BIGNUMERIC", "mode": "NULLABLE"},
        {"name": "sample_col_2", "type": "BIGNUMERIC", "mode": "NULLABLE"},
        {"name": "sample_col_3", "type": "BIGNUMERIC", "mode": "NULLABLE"},
    ],
)
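If the table has many columns, writing each schema entry by hand gets tedious. A minimal sketch of generating the list instead is below. The helper name to_bignumeric_schema, the NUMERIC_TYPES set, and the example column listing are all hypothetical and not part of Airflow or BigQuery. The helper only builds the list of dicts; it does not check that the load job will accept the conversion for your Parquet types.

```python
from typing import Dict, List

# Type names treated as "numeric" in this sketch (an assumption;
# adjust the set to match your own source columns).
NUMERIC_TYPES = {"INTEGER", "INT64", "FLOAT", "FLOAT64", "NUMERIC"}


def to_bignumeric_schema(columns: Dict[str, str]) -> List[Dict[str, str]]:
    """Build a schema_fields-style list, mapping numeric columns to BIGNUMERIC."""
    schema = []
    for name, col_type in columns.items():
        normalized = col_type.upper()
        # Numeric columns become BIGNUMERIC; everything else keeps its type.
        target = "BIGNUMERIC" if normalized in NUMERIC_TYPES else normalized
        schema.append({"name": name, "type": target, "mode": "NULLABLE"})
    return schema


# Hypothetical column listing, for illustration only.
schema_fields = to_bignumeric_schema(
    {"sample_col_1": "INTEGER", "sample_col_2": "FLOAT", "label": "STRING"}
)
```

The resulting list can then be passed as schema_fields=schema_fields in the operator call above.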
You can refer to the GCSToBigQueryOperator documentation for more details.