Limit on the number of results of a Dataflow input filepattern glob
Update:

We have been seeing this class of 400 errors:

com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request { "code" : 400, "errors" : [ { "domain" : "global", "message" : "Request payload exceeds the allowable limit: 50000.", "reason" : "badRequest" } ], "message" : "Request payload exceeds the allowable limit: 50000.", "status" : "INVALID_ARGUMENT" } at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:145) at

on a glob that resolves to:

TOTAL: 60 objects, 8405391 bytes (8.02 MiB)

The number of files matched by the input glob has been growing over the past few days, until it hit this limit.

--

Recently, we have observed job failures when a filepattern specification that expands to a large number of files is passed as input to a Dataflow job. An example of the message produced in these scenarios is:

Apr 29, 2015, 9:22:51 AM
(5dd3e79031bdcc45): com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request { "code" : 400, "errors" : [ { "domain" : "global", "message" : "Request payload exceeds the allowable limit: 50000.", "reason" : "badRequest" } ], "message" : "Request payload exceeds the allowable limit: 50000.", "status" : "INVALID_ARGUMENT" } at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:145) at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113) at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40) at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321) at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1049) at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419) at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352) at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469) at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$DataflowWorkUnitClient.reportWorkItemStatus(DataflowWorkerHarness.java:273) at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.reportStatus(DataflowWorker.java:209) at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:157) at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:95) at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:139) at 
com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:124) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)
9:22:51 AM
Failed task is going to be retried.

We have had some success mitigating this by adjusting how the job is parallelized, but we would like to know whether there is a hard limit or quota here. Retried tasks inevitably fail again, and once the maximum number of retries is reached the whole job fails.
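For reference, one way to work around a limit like this is to expand a single broad pattern into several narrower sub-patterns, read each one as a separate input, and flatten the results together (e.g. with `Flatten.pCollections()` in the Dataflow SDK), so that no single read matches more files than the service accepts. A minimal sketch of the pattern-splitting step, assuming a hypothetical date-partitioned layout such as `gs://my-bucket/logs/2015-04-29/...` (the bucket name and layout are illustrative, not from the job above):

```java
import java.util.ArrayList;
import java.util.List;

public class GlobSharder {
    /**
     * Expands a broad day-wildcard prefix like "gs://my-bucket/logs/2015-04-"
     * into one narrower pattern per day: "...2015-04-01*", "...2015-04-02*", etc.
     * Each sub-pattern can then be passed to its own read transform, and the
     * resulting collections flattened back into a single input.
     */
    static List<String> shardByDay(String prefix, int days) {
        List<String> patterns = new ArrayList<>();
        for (int day = 1; day <= days; day++) {
            // %02d keeps the zero-padded day format used in the object names.
            patterns.add(String.format("%s%02d*", prefix, day));
        }
        return patterns;
    }

    public static void main(String[] args) {
        for (String pattern : shardByDay("gs://my-bucket/logs/2015-04-", 30)) {
            System.out.println(pattern);
        }
    }
}
```

Splitting this way bounds the file count per read step, at the cost of one extra transform per shard; how finely to shard depends on how many objects each prefix matches.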

Thanks!

Sal

The Dataflow service has been updated to handle these larger requests and should no longer exhibit this issue.