I'm trying to tune my WS (web service) to support ~20k concurrent users.
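No matter which configuration I change, I still get the same 6-second average response time per endpoint, along with various 502/504 errors, once my test reaches 2k users.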
Web service:
CloudFlare <--> Nginx <--> Gunicorn <--> Django/DRF <--> Memcache <--> Postgres
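Here's what I've tried:
- Increased Gunicorn workers from 4 to 10
- Increased service (Pod) instances from 3 to 10
- Increased the Gunicorn worker timeout to 120 s
- Increased the Nginx proxy_pass timeout to 120 s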
Most endpoints hit the database once every 100 seconds; all other requests get their data from memcache.
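The caching scheme described here (serve from memcache, go to Postgres at most about once per 100 seconds per key) is the cache-aside pattern. Below is a minimal, self-contained sketch: `TTLCache` is an in-process stand-in for memcache and `fetch_from_db` is a hypothetical callable, so none of this is the poster's actual code. In a Django project the same idea is usually written with `django.core.cache.cache.get_or_set(key, default, timeout)`.

```python
import time
from typing import Any, Callable, Dict, Tuple

CACHE_TTL_SECONDS = 100  # "hit the database once every 100 seconds"
_MISSING = object()      # sentinel so that falsy values can still be cached


class TTLCache:
    """In-process stand-in for memcache: entries expire after `timeout` seconds."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._store: Dict[str, Tuple[float, Any]] = {}
        self._clock = clock

    def get(self, key: str, default: Any = None) -> Any:
        entry = self._store.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: behave like a cache miss
            return default
        return value

    def set(self, key: str, value: Any, timeout: float) -> None:
        self._store[key] = (self._clock() + timeout, value)


def cached_query(cache: TTLCache, key: str, fetch_from_db: Callable[[], Any],
                 ttl: float = CACHE_TTL_SECONDS) -> Any:
    """Cache-aside: return the cached value, or query the DB and cache the result."""
    value = cache.get(key, _MISSING)
    if value is _MISSING:
        value = fetch_from_db()  # only reached on a miss or after expiry
        cache.set(key, value, ttl)
    return value
```

One property of this pattern worth keeping in mind: every request for a key that misses before the first query has repopulated the cache also goes to the database, so when a hot key expires under heavy load, several workers may query Postgres at once.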
Can anyone help by pointing out which configuration I should change?
Where should I look for the latency/bottleneck?
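One place to look is the arithmetic of the worker pool itself. This is a rough sketch that assumes Gunicorn's default synchronous workers (one request per worker at a time) and no other bottleneck: 10 pods × 10 workers allow 100 requests in flight, and the standard asymptotic bound for a closed load test with N users says the mean response time cannot drop below N × S / slots − Z, where S is the per-request service time and Z the think time. The 50 ms service time below is a made-up example value, not a measurement.

```python
def response_time_lower_bound(users: int, pods: int, workers_per_pod: int,
                              service_time_s: float, think_time_s: float = 0.0) -> float:
    """Optimistic mean response time for a closed load test against sync workers.

    slots: requests that can be processed at once (one per sync worker).
    Once all slots are busy, throughput tops out at slots / service_time_s,
    and the interactive response time law R = N / X - Z gives the rest.
    """
    slots = pods * workers_per_pod
    max_throughput = slots / service_time_s  # requests per second, all workers busy
    return max(service_time_s, users / max_throughput - think_time_s)


# Hypothetical numbers: 2k users, 10 pods x 10 workers, 50 ms per request.
print(round(response_time_lower_bound(2000, 10, 10, 0.05), 3))  # 1.0
```

Below saturation the bound is just the service time; above it, response time grows linearly with the number of users, so adding workers only helps if the per-request service time does not grow at the same time.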
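The Gunicorn workers are evidently timing out, which I don't understand, because there is no logic in the WS views. Each view should just fetch the query result from memcache and return it.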
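To check whether the time is really spent inside Django, one option is to time the WSGI application directly. This is a minimal sketch built around a hypothetical `TimingMiddleware` wrapper with an arbitrary 1-second default threshold; it relies only on the standard WSGI calling convention and is not part of Gunicorn or Django.

```python
import logging
import time
from typing import Any, Callable, Dict, Iterable

logger = logging.getLogger("slow_requests")


def _log_slow(path: str, elapsed_s: float) -> None:
    logger.warning("slow request %s took %.3f s", path, elapsed_s)


class TimingMiddleware:
    """Hypothetical WSGI wrapper that reports requests slower than threshold_s."""

    def __init__(self, app: Callable[..., Iterable[bytes]], threshold_s: float = 1.0,
                 report: Callable[[str, float], None] = _log_slow,
                 clock: Callable[[], float] = time.perf_counter) -> None:
        self.app = app
        self.threshold_s = threshold_s
        self.report = report
        self.clock = clock

    def __call__(self, environ: Dict[str, Any],
                 start_response: Callable[..., Any]) -> Iterable[bytes]:
        start = self.clock()
        try:
            # Time only the application call, not the streaming of the body.
            return self.app(environ, start_response)
        finally:
            elapsed_s = self.clock() - start
            if elapsed_s >= self.threshold_s:
                self.report(environ.get("PATH_INFO", ""), elapsed_s)
```

In a Django project it could wrap the app in `wsgi.py`, e.g. `application = TimingMiddleware(get_wsgi_application())`. The timer stops when the view returns its response iterable, so if these timings stay small while Nginx reports 40-second upstream times, the delay is likely happening before a worker picks up the request rather than inside the view.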
Nginx logs:
latforms/android HTTP/1.1", upstream: "http://10.0.1.17:9090/endpoints/platforms/android", host: "myhost.co"
2018/08/13 23:43:25 [error] 8893#8893: *2809163 upstream timed out (110: Connection timed out) while connecting to upstream, client: 200.211.198.133, server: myhost.co, request: "GET /endpoints/store/products/729 HTTP/1.1", upstream: "http://10.0.1.18:9090/endpoints/store/products/729", host: "myhost.co"
200.211.198.133 - [200.211.198.133] - - [13/Aug/2018:23:43:25 +0000] "GET /endpoints/store/categories/?cat_pk=13081 HTTP/1.1" 200 1718 "-" "python-requests/2.18.4" 627 80.840 [production-service-api-80] 10.0.0.112:9090, 10.0.1.13:9090, 10.0.0.113:9090 0, 0, 11150 40.000, 40.000, 0.840 504, 504, 200
200.211.198.133 - [200.211.198.133] - - [13/Aug/2018:23:43:25 +0000] "GET /endpoints/store/categories/?cat_pk=13081 HTTP/1.1" 200 1718 "-" "python-requests/2.18.4" 689 80.857 [production-service-api-80] 10.0.0.112:9090, 10.0.1.12:9090, 10.0.0.113:9090 0, 0, 11150 40.000, 40.000, 0.857 504, 504, 200
200.211.198.133 - [200.211.198.133] - - [13/Aug/2018:23:43:25 +0000] "GET /endpoints/store/home/ HTTP/1.1" 200 10072 "-" "python-requests/2.18.4" 670 80.580 [production-service-api-80] 10.0.1.13:9090, 10.0.1.11:9090, 10.0.0.112:9090 0, 0, 66511 40.001, 40.002, 0.577 504, 504, 200
200.211.198.133 - [200.211.198.133] - - [13/Aug/2018:23:43:25 +0000] "GET /endpoints/store/products/691/ HTTP/1.1" 200 703 "-" "python-requests/2.18.4" 646 80.486 [production-service-api-80] 10.0.1.8:9090, 10.0.1.13:9090, 10.0.1.12:9090 0, 0, 1968 40.000, 40.000, 0.486 504, 504, 200
200.211.198.133 - [200.211.198.133] - - [13/Aug/2018:23:43:25 +0000] "GET /endpoints/store/products/5458 HTTP/1.1" 301 0 "-" "python-requests/2.18.4" 678 80.444 [production-service-api-80] 10.0.1.13:9090, 10.0.1.12:9090, 10.0.1.17:9090 0, 0, 0 40.000, 40.002, 0.442 504, 504, 301
....
90, 10.0.1.11:9090, 10.0.1.8:9090 0, 0, 1968 40.000, 40.000, 0.584 504, 504, 200
200.211.198.133 - [200.211.198.133] - - [13/Aug/2018:23:43:25 +0000] "GET /endpoints/store/products/5458/ HTTP/1.1" 200 241 "-" "python-requests/2.18.4" 647 80.709 [production-service-api-80] 10.0.0.113:9090, 10.0.1.8:9090, 10.0.0.112:9090 0, 0, 327 40.001, 40.000, 0.708 504, 504, 200
--
2018/08/13 23:43:25 [error] 8766#8766: *2809243 upstream timed out (110: Connection timed out) while connecting to upstream, client: 200.211.198.133, server: myhost.co, request: "GET /endpoints/store/categories/?cat_pk=13081 HTTP/1.1", upstream: "http://10.0.1.13:9090/endpoints/store/categories/?cat_pk=13081", host: "myhost.co"
200.211.198.133 - [200.211.198.133] - - [13/Aug/2018:23:43:25 +0000] "GET /endpoints/store/products/692 HTTP/1.1" 301 0 "-" "python-requests/2.18.4" 677 80.672 [production-service-api-80] 10.0.1.17:9090, 10.0.1.10:9090, 10.0.0.113:9090 0, 0, 0 40.001, 40.001, 0.670 504, 504, 301
200.211.198.133 - [200.211.198.133] - - [13/Aug/2018:23:43:25 +0000] "GET /endpoints/store/products/4608/ HTTP/1.1" 200 553 "-" "python-requests/2.18.4" 647 80.591 [production-service-api-80] 10.0.1.11:9090, 10.0.1.17:9090, 10.0.1.8:9090 0, 0, 1090 40.000, 40.003, 0.588 504, 504, 200
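The tails of these access lines carry the upstream details. Assuming they follow the ingress-nginx `upstreaminfo` layout (`... $request_length $request_time [$proxy_upstream_name] $upstream_addr $upstream_response_length $upstream_response_time $upstream_status`), a small parser like the hypothetical one below splits out each upstream attempt.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed ingress-nginx "upstreaminfo" layout; only the fields used here are captured.
LINE_RE = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \d+ "[^"]*" "[^"]*" '
    r'\d+ (?P<request_time>[\d.]+) \[(?P<upstream_name>[^\]]*)\] (?P<tail>.+)$'
)


@dataclass
class UpstreamAttempt:
    addr: str
    response_time_s: float
    status: str


def _split_lists(tail: str) -> List[List[str]]:
    """Group 'a, b, c d, e, f' into [['a', 'b', 'c'], ['d', 'e', 'f']]."""
    groups: List[List[str]] = []
    current: List[str] = []
    for token in tail.split():
        if token.endswith(","):
            current.append(token[:-1])  # list continues after this item
        else:
            current.append(token)       # last item of this list
            groups.append(current)
            current = []
    return groups


def parse_upstreams(line: str) -> Optional[Tuple[str, float, List[UpstreamAttempt]]]:
    """Return (request, total request time, per-upstream attempts) or None."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    groups = _split_lists(match.group("tail"))
    if len(groups) < 4:
        return None
    addrs, _lengths, times, statuses = groups[:4]
    attempts = [UpstreamAttempt(a, float(t), s) for a, t, s in zip(addrs, times, statuses)]
    return match.group("request"), float(match.group("request_time")), attempts
```

For the first `categories` line above, this yields attempts of 40.000 s (504), 40.000 s (504) and 0.840 s (200), which add up to the logged request time of 80.840 s. That is consistent with the "upstream timed out ... while connecting to upstream" errors: most of each request is spent waiting on two upstreams before a third one answers in under a second.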
Gunicorn logs:
{"asctime": "2018-08-13 23:42:55,145", "name": "gunicorn.access", "levelname": "INFO", "message": "10.0.0.13 - - [13/Aug/2018:23:42:55 +0000] "GET /endpoints/store/products/691/ HTTP/1.1" 200 1968 "-" "python-requests/2.18.4""}
{"asctime": "2018-08-13 23:42:55,167", "name": "gunicorn.access", "levelname": "INFO", "message": "10.0.0.13 - - [13/Aug/2018:23:42:55 +0000] "GET /endpoints/store/products/729 HTTP/1.1" 301 - "-" "python-requests/2.18.4""}
[2018-08-13 23:42:55 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:36)
[2018-08-13 23:42:55 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:37)
[2018-08-13 23:42:55 +0000] [382] [INFO] Booting worker with pid: 382
[2018-08-13 23:42:55 +0000] [383] [INFO] Booting worker with pid: 383
{"asctime": "2018-08-13 23:42:55,403", "name": "gunicorn.access", "levelname": "INFO", "message": "10.0.0.13 - - [13/Aug/2018:23:42:55 +0000] "GET /endpoints/store/products/691/ HTTP/1.1" 200 1968 "-" "python-requests/2.18.4""}
....
{"asctime": "2018-08-13 23:42:55,184", "name": "gunicorn.access", "levelname": "INFO", "message": "10.0.0.13 - - [13/Aug/2018:23:42:55 +0000] "GET /endpoints/store/categories/?cat_pk=13081 HTTP/1.1" 200 11150 "-" "python-requests/2.18.4""}
{"asctime": "2018-08-13 23:42:55,262", "name": "gunicorn.access", "levelname": "INFO", "message": "10.0.0.13 - - [13/Aug/2018:23:42:55 +0000] "GET /endpoints/platforms/android HTTP/1.1" 200 48 "-" "python-requests/2.18.4""}
{"asctime": "2018-08-13 23:42:55,439", "name": "gunicorn.access", "levelname": "INFO", "message": "10.0.0.13 - - [13/Aug/2018:23:42:55 +0000] "GET /endpoints/platforms/android HTTP/1.1" 200 48 "-" "python-requests/2.18.4""}
--
[2018-08-13 23:42:56 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:31)
{"asctime": "2018-08-13 23:42:56,689", "name": "gunicorn.access", "levelname": "INFO", "message": "10.0.0.13 - - [13/Aug/2018:23:42:56 +0000] "GET /endpoints/store/products/729/ HTTP/1.1" 200 2163 "-" "python-requests/2.18.4""}
{"asctime": "2018-08-13 23:42:56,799", "name": "gunicorn.access", "levelname": "INFO", "message": "10.0.0.13 - - [13/Aug/2018:23:42:56 +0000] "GET /endpoints/store/products/5458/ HTTP/1.1" 200 327 "-" "python-requests/2.18.4""}
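Why not use uWSGI?
To make it work better, do the following:
- Reduce the number of database hits in your code
- Increase the number of Gunicorn workers
- Disable info logging for Gunicorn and NGINX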
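On the Gunicorn side, the logging advice could look like the hypothetical config fragment below. `loglevel` controls Gunicorn's error log (INFO messages such as "Booting worker" disappear at `warning`); access lines are written only when an access log or a custom logging config is set. The JSON-formatted access lines in the question suggest a custom logging config is in use, which would need its own change. Nginx has its own `access_log off;` directive for the same purpose.

```python
# gunicorn.conf.py -- hypothetical fragment for "disable info logging".
loglevel = "warning"  # error log only records warnings and above
accesslog = None      # Gunicorn's default: no access log file
errorlog = "-"        # keep warnings and errors on stderr
```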
If these changes don't help, you'll have to change your setup or increase your servers' resources.