Why does my containerized Selenium application fail only in AWS Lambda?



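I'm trying to get a function running in AWS Lambda that uses Selenium with Firefox/geckodriver. Instead of using a preconfigured runtime, I decided to build a container image, upload it, and run that. I was able to write a Dockerfile that installs Firefox and Python, downloads geckodriver, and installs my test code: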

FROM alpine:latest
RUN apk add firefox python3 py3-pip
RUN pip install requests selenium
RUN mkdir /app
WORKDIR /app
RUN wget -qO gecko.tar.gz https://github.com/mozilla/geckodriver/releases/download/v0.28.0/geckodriver-v0.28.0-linux64.tar.gz
RUN tar xf gecko.tar.gz
RUN mv geckodriver /usr/bin
COPY *.py ./
ENTRYPOINT ["/usr/bin/python3","/app/lambda_function.py"]

The Selenium test code:

#!/usr/bin/env python3
import util
import os
import sys
import requests
def lambda_wrapper():
    api_base = f'http://{os.environ["AWS_LAMBDA_RUNTIME_API"]}/2018-06-01'
    response = requests.get(api_base + '/runtime/invocation/next')
    request_id = response.headers['Lambda-Runtime-Aws-Request-Id']
    try:
        result = selenium_test()

        # Send result back
        requests.post(api_base + f'/runtime/invocation/{request_id}/response', json={'url': result})
    except Exception as e:
        # Error reporting
        import traceback
        requests.post(api_base + f'/runtime/invocation/{request_id}/error', json={'errorMessage': str(e), 'traceback': traceback.format_exc(), 'logs': open('/tmp/gecko.log', 'r').read()})
        raise
def selenium_test():
    from selenium.webdriver import Firefox
    from selenium.webdriver.firefox.options import Options
    options = Options()
    options.add_argument('-headless')
    options.add_argument('--window-size 1920,1080')

    ffx = Firefox(options=options, log_path='/tmp/gecko.log')
    ffx.get("https://google.com")
    url = ffx.current_url
    ffx.close()
    print(url)
    return url

def main():
    # For testing purposes, currently not using the Lambda API even in AWS so that
    # the same container can run on my local machine.
    # Call lambda_wrapper() instead to get geckodriver logs as well (not informative).
    selenium_test()

if __name__ == '__main__':
    main()

I can build this container successfully on my local machine with docker build -t lambda-test . and then run it with docker run -m 512M lambda-test.

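However, when I upload the exact same container to Lambda and run it there, it crashes with an error. I set the memory limit to 1024M and the timeout to 30 seconds. The traceback shows that Firefox was unexpectedly killed by a signal: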

START RequestId: 52adeab9-8ee7-4a10-a728-82087ec9de30 Version: $LATEST
/app/lambda_function.py:29: DeprecationWarning: use service_log_path instead of log_path
  ffx = Firefox(options=options, log_path='/tmp/gecko.log')
Traceback (most recent call last):
  File "/app/lambda_function.py", line 45, in <module>
    main()
  File "/app/lambda_function.py", line 41, in main
    lambda_wrapper()
  File "/app/lambda_function.py", line 12, in lambda_wrapper
    result = selenium_test()
  File "/app/lambda_function.py", line 29, in selenium_test
    ffx = Firefox(options=options, log_path='/tmp/gecko.log')
  File "/usr/lib/python3.8/site-packages/selenium/webdriver/firefox/webdriver.py", line 170, in __init__
    RemoteWebDriver.__init__(
  File "/usr/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
    self.start_session(capabilities, browser_profile)
  File "/usr/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/usr/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/usr/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: Process unexpectedly closed with status signal
END RequestId: 52adeab9-8ee7-4a10-a728-82087ec9de30
REPORT RequestId: 52adeab9-8ee7-4a10-a728-82087ec9de30  Duration: 20507.74 ms   Billed Duration: 21350 ms   Memory Size: 1024 MB    Max Memory Used: 131 MB Init Duration: 842.11 ms    
Unknown application error occurred

I also had it upload the geckodriver log, but it doesn't contain much useful information:

1608506540595   geckodriver INFO    Listening on 127.0.0.1:41597
1608506541569   mozrunner::runner   INFO    Running command: "/usr/bin/firefox" "--marionette" "-headless" "--window-size 1920,1080" "-foreground" "-no-remote" "-profile" "/tmp/rust_mozprofileQCapHy"
*** You are running in headless mode.

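How can I even start debugging this? The fact that the exact same container behaves differently depending on where it runs seems suspicious to me, but I don't know enough about Selenium, Docker, or Lambda to pinpoint the problem.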

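Is my docker run command failing to reproduce the Lambda environment accurately? If so, what command should I run to simulate the Lambda environment more closely? I'm really not sure where to go from here, since I can't actually reproduce the error locally to test against it.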

If anyone wants to look at the full code and try building it themselves, the repository is here; the Lambda code is in lambda_function.py.

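As for prior research: this question (a) is about ChromeDriver and (b) is over a year old with no answers. This link only covers how to run a container in Lambda, which I'm already doing. This answer is almost my problem, but I know there is no version mismatch, because the container runs fine on my laptop.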

I have the exact same problem and a possible explanation. I think what you want is not possible for the time being.

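According to the AWS DevOps blog, Firefox relies on the fallocate system call and on /dev/shm. However, AWS Lambda does not mount /dev/shm, so Firefox crashes when it tries to allocate memory. Unfortunately, this behavior cannot be disabled in Firefox.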
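To check this explanation in a given environment, a small probe like the one below could be run inside the container, both locally and in Lambda, and the two results compared. It is only a sketch: shm_status is a hypothetical helper name, not part of Selenium or any AWS tooling, and it checks nothing beyond whether the path exists and accepts a new file.

```python
import os
import tempfile


def shm_status(path="/dev/shm"):
    """Report whether `path` exists, is a directory, and accepts a new file."""
    status = {
        "path": path,
        "exists": os.path.exists(path),
        "is_dir": os.path.isdir(path),
        "writable": False,
    }
    if status["is_dir"]:
        try:
            # Actually creating a file is a more direct probe than os.access().
            with tempfile.TemporaryFile(dir=path):
                status["writable"] = True
        except OSError:
            pass
    return status


if __name__ == "__main__":
    print(shm_status())
```

If the probe reports that /dev/shm is missing or not writable in Lambda but present locally, that would be consistent with the explanation above.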

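However, if you can use Chromium, there is a --disable-dev-shm-usage option that can be passed through chromedriver; it disables the use of /dev/shm and writes shared memory files to /tmp instead. chromedriver works fine for me on AWS Lambda, if that is an option for you. According to the AWS DevOps blog, you can also use AWS Fargate to run Firefox/geckodriver.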
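A minimal sketch of what the Chromium variant of the test might look like, assuming Chromium and a matching chromedriver are installed in the image and on the PATH. The helper names chrome_arguments and make_chrome_driver are made up for illustration, and the headless and window-size flags are assumptions carried over from the Firefox test in the question; only --disable-dev-shm-usage comes from the workaround described above.

```python
def chrome_arguments():
    """Command-line flags to pass to Chromium."""
    return [
        "--headless",               # assumption: mirrors the headless Firefox test
        "--disable-dev-shm-usage",  # write shared memory files to /tmp, not /dev/shm
        "--window-size=1920,1080",  # assumption: same size as the Firefox test
    ]


def make_chrome_driver():
    """Start Chrome/Chromium through chromedriver with the flags above."""
    # Imported lazily so chrome_arguments() can be used without Selenium installed.
    from selenium.webdriver import Chrome
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for argument in chrome_arguments():
        options.add_argument(argument)
    return Chrome(options=options)
```

With this in place, selenium_test() would call make_chrome_driver() in place of constructing Firefox; the rest of the test (get, current_url, close) should not need to change.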

In 2015 there was an entry on the AWS forums asking for /dev/shm to be mounted in Lambda, but nothing has happened since.