我在气流中使用pubsubpullooperator从gcp订阅中提取消息。
pull_messages_task = PubSubPullOperator(
task_id="pull_messages",
ack_messages=True,
project_id=GCP_PROJECT_ID,
subscription="k8s-sub",
gcp_conn_id=GCP_CONN_ID,
max_messages=50
)
从订阅中提取消息并保留在Xcom中工作得很好。我的问题是为什么pubsubpullooperator不能每次拉消息计数等于max_messages ?
例如,我向GCP主题发布250条消息。我的Dag每分钟运行一次,每次提取50条消息。
下面是气流的过程日志:
[2022-05-17 14:53:04,630] {pubsub.py:536} INFO - Pulling max 50 messages from subscription (path) projects/production-1/subscriptions/k8s-sub
[2022-05-17 14:53:06,661] {pubsub.py:550} INFO - Pulled 50 messages from subscription (path) projects/production-1/subscriptions/k8s-sub
[2022-05-17 14:54:04,312] {pubsub.py:536} INFO - Pulling max 50 messages from subscription (path) projects/production-1/subscriptions/k8s-sub
[2022-05-17 14:54:06,239] {pubsub.py:550} INFO - Pulled 16 messages from subscription (path) projects/production-1/subscriptions/k8s-sub
[2022-05-17 14:55:04,055] {pubsub.py:536} INFO - Pulling max 50 messages from subscription (path) projects/production-1/subscriptions/k8s-sub
[2022-05-17 14:55:05,259] {pubsub.py:550} INFO - Pulled 4 messages from subscription (path) projects/production-1/subscriptions/k8s-sub
[2022-05-17 14:56:04,590] {pubsub.py:536} INFO - Pulling max 50 messages from subscription (path) projects/production-1/subscriptions/k8s-sub
[2022-05-17 14:56:06,527] {pubsub.py:550} INFO - Pulled 20 messages from subscription (path) projects/production-1/subscriptions/k8s-sub
[2022-05-17 14:57:04,083] {pubsub.py:536} INFO - Pulling max 50 messages from subscription (path) projects/production-1/subscriptions/k8s-sub
[2022-05-17 14:57:07,428] {pubsub.py:550} INFO - Pulled 38 messages from subscription (path) projects/production-1/subscriptions/k8s-sub
[2022-05-17 14:58:05,561] {pubsub.py:536} INFO - Pulling max 50 messages from subscription (path) projects/production-1/subscriptions/k8s-sub
[2022-05-17 14:58:07,431] {pubsub.py:550} INFO - Pulled 50 messages from subscription (path) projects/production-1/subscriptions/k8s-sub
[2022-05-17 14:59:04,348] {pubsub.py:536} INFO - Pulling max 50 messages from subscription (path) projects/production-1/subscriptions/k8s-sub
[2022-05-17 14:59:05,462] {pubsub.py:550} INFO - Pulled 50 messages from subscription (path) projects/production-1/subscriptions/k8s-sub
[2022-05-17 15:00:06,882] {pubsub.py:536} INFO - Pulling max 50 messages from subscription (path) projects/production-1/subscriptions/k8s-sub
[2022-05-17 15:00:08,710] {pubsub.py:550} INFO - Pulled 2 messages from subscription (path) projects/production-1/subscriptions/k8s-sub
[2022-05-17 15:01:03,519] {pubsub.py:536} INFO - Pulling max 50 messages from subscription (path) projects/production-1/subscriptions/k8s-sub
[2022-05-17 15:01:03,688] {pubsub.py:550} INFO - Pulled 20 messages from subscription (path) projects/production-1/subscriptions/k8s-sub
我很确定每天的运行时间在1分钟以下。并且50条消息的大小不超过Xcom限制(48KB)。
有谁知道这个情况吗?或者有人知道接线员是怎么决定要拉多少条消息的吗?
许多谢谢。
这个功能似乎来源于google的pubsub客户端库,而不是气流操作符本身的特性/问题。
来自google文档
The maximum number of messages to return for this request. Must be a positive integer. The Pub/Sub system may return fewer than the number specified.
操作符的依赖pubsubpullooperator使用PubSubHook使用SubscriberClient