Supporting a very high request load on a gRPC server in Java

I have a gRPC (1.13.x) server in Java that performs no compute- or I/O-intensive work. The goal is to measure how many requests per second this server can sustain on an 80-core machine.

Server:

ExecutorService executor = new ThreadPoolExecutor(160, Integer.MAX_VALUE,
        60L, TimeUnit.SECONDS,
        new SynchronousQueue<Runnable>(),
        new ThreadFactoryBuilder()
                .setDaemon(true)
                .setNameFormat("Glowroot-IT-Harness-GRPC-Executor-%d")
                .build());
Server server = NettyServerBuilder.forPort(50051)
        .addService(new MyService())
        .executor(executor)
        .build()
        .start();

Service:

@Override
public void verify(Request request, StreamObserver<Result> responseObserver) {
    Result result = Result.newBuilder()
            .setMessage("hello")
            .build();
    responseObserver.onNext(result);
    responseObserver.onCompleted();
}

I ran load tests with the ghz client. The server can handle 40k requests per second, but the RPS count will not go above 40k even when the number of concurrent clients is increased and the incoming request rate is 100k. The gRPC server processes only 40k requests per second and queues all the rest. The CPU is under-utilized (7%). About 90% of the gRPC threads (names prefixed with grpc-default-executor) are in a waiting state, even though they perform no I/O. More than 25k threads are in the waiting state.
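The executor above hands tasks off through a SynchronousQueue, which holds no tasks: whenever every existing worker is busy, execute() must spawn a brand-new thread, which explains the tens of thousands of parked workers. A minimal, self-contained sketch reproducing that growth behavior (illustrative, not part of the original server; pool sizes are made up):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class SynchronousQueuePoolDemo {
    public static void main(String[] args) throws Exception {
        // Same shape as the server's executor: small core size, unbounded max,
        // SynchronousQueue hand-off. The queue accepts a task only if a worker
        // is already waiting to take it; otherwise a new thread is created.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>());

        CountDownLatch release = new CountDownLatch(1);
        int burst = 50;
        for (int i = 0; i < burst; i++) {
            // Each task blocks, so no worker ever becomes free to take
            // the next submission: every submission creates a thread.
            pool.execute(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        // One thread per in-flight task.
        System.out.println("threads created for burst: " + pool.getLargestPoolSize());
        release.countDown();
        pool.shutdown();
    }
}
```

A fixed-size pool backed by a bounded queue (with a saturation policy for back-pressure) is the usual way to keep the thread count from exploding like this.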

Stack trace of one of the waiting threads:

grpc-default-executor-4605
PRIORITY :5
THREAD ID :0X00007F15A4440D80
NATIVE ID :
stackTrace:
java.lang.Thread.State: TIMED_WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@15.0.1/Native Method)
- parking to wait for <0x00007f1df161ae20> (a java.util.concurrent.SynchronousQueue$TransferStack)
at java.util.concurrent.locks.LockSupport.parkNanos(java.base@15.0.1/LockSupport.java:252)
at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(java.base@15.0.1/SynchronousQueue.java:462)
at java.util.concurrent.SynchronousQueue$TransferStack.transfer(java.base@15.0.1/SynchronousQueue.java:361)
at java.util.concurrent.SynchronousQueue.poll(java.base@15.0.1/SynchronousQueue.java:937)
at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@15.0.1/ThreadPoolExecutor.java:1055)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@15.0.1/ThreadPoolExecutor.java:1116)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@15.0.1/ThreadPoolExecutor.java:630)
at java.lang.Thread.run(java.base@15.0.1/Thread.java:832)
Locked ownable synchronizers:
- None

How do I configure the server so it can support 100K+ requests per second?

There doesn't seem to be anything in the gRPC stack that would cause this limit. What is the average response time on the server side? It looks like you are being throttled by ephemeral ports or TCP connection limits; you may want to tune the kernel as described here https://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-1 or here https://blog.box.com/ephemeral-port-exhaustion-and-web-services-at-scale.
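The kind of kernel tuning those articles describe generally looks like the fragment below. These values are illustrative, not taken from the linked posts; adjust them for your environment and apply with root privileges:

```shell
# Widen the ephemeral port range available for outgoing connections
sysctl -w net.ipv4.ip_local_port_range="10000 65535"
# Allow sockets in TIME_WAIT to be reused for new outgoing connections
sysctl -w net.ipv4.tcp_tw_reuse=1
# Raise the accept backlog so bursts of new connections are not dropped
sysctl -w net.core.somaxconn=65535
# Each socket consumes a file descriptor; raise the per-process limit
ulimit -n 1048576
```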