BigQuery SQL作业依赖于Dataflow管道

我在python中有一个apache beam管道，无论出于什么原因，它都有如下的流。

client = bigquery.Client()
query_job1 = client.query('create table sample_table_1 as select * from table_1')  
result1 = query_job.result()
with beam.Pipeline(options=options) as p:
records = (
p
| 'Data pull' >> beam.io.Read(beam.io.BigQuerySource(...))
| 'Transform' >> ....
| 'Write to BQ' >> beam.io.WriteToBigQuery(...)
)
query_job2 = client.query('create table sample_table_2 as select * from table_2')  
result2 = query_job2.result()

SQL作业-->数据管道-->SQL作业

当我在本地运行此程序时，此序列运行良好。然而，当我试图将其作为数据流管道运行时，它并没有按照这个顺序运行。

在数据流上运行时，是否有强制依赖关系的方法？

正如@PeterKim所提到的，您在注释部分描述的处理流程不可能仅使用Dataflow来实现。目前，数据流编程模型不支持它。

在这里，您可以使用Composer来编排相互依赖的顺序作业执行。

相关内容

最新更新

热门标签：