Goal
I want to build a local docker-compose deployment so that I have five services:
- Redis
- Postgres
- RabbitMQ
- Django API
- Django Worker
In this deployment, a user uploads a file via an API endpoint. The endpoint stores the file in a FileField on a model.
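For concreteness, here is a minimal sketch of what such a model might look like; the DataSet name and the file field are taken from the worker code further down in this question, and the upload_to pattern is only illustrative:

# Hypothetical sketch of the model behind the upload endpoint;
# `DataSet` and its `file` FileField are referenced by the worker
# code later in this question. The upload_to value is illustrative.
from django.db import models


class DataSet(models.Model):
    file = models.FileField(upload_to="organizations/")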
In a separate transaction, the user triggers an asynchronous task via a separate endpoint. This task is responsible for:
- downloading the file
- extracting the file
- kicking off subtasks to perform the intermediate processing steps
- uploading the results of the processing to the database

The intermediate processing steps should not upload any files to the database. The intermediate processing steps will download and upload files using Django's internal file storage solution; this is implemented with a file-system hierarchy that is irrelevant to this question.
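For reference, a minimal sketch of that interface, django.core.files.storage.default_storage, as used in both directions (the path and payload here are illustrative):

# Illustration of Django's default_storage interface: `save` writes
# bytes under a storage-relative name and returns the name actually
# used, and `open` retrieves a file by that name.
from django.core.files.base import ContentFile
from django.core.files.storage import default_storage

saved_name = default_storage.save("example/payload.txt", ContentFile(b"hello"))

with default_storage.open(saved_name) as handle:
    contents = handle.read()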
Problem
I have managed to get this configuration working on my local file system. If I run the redis, postgres, and rabbitmq backends and then run the API and the worker directly on my local machine, everything is fine. When I create a docker-compose configuration and decouple everything, the operation appears to break. In my docker-compose logs I see:
worker_1 | [2019-10-23 22:27:34,626: WARNING/ForkPoolWorker-2] //--------------------------------------------------------------------------------
worker_1 | [2019-10-23 22:27:34,627: WARNING/ForkPoolWorker-2] // BEGINNING TASK
worker_1 | [2019-10-23 22:27:34,627: WARNING/ForkPoolWorker-2] //--------------------------------------------------------------------------------
worker_1 | [2019-10-23 22:27:34,628: WARNING/ForkPoolWorker-2] // Root Job - 183916ca-f6e6-4e7c-a997-e8f516ccf8be
worker_1 | [2019-10-23 22:27:34,628: WARNING/ForkPoolWorker-2] // Parent Job - None
worker_1 | [2019-10-23 22:27:34,628: WARNING/ForkPoolWorker-2] // Current Job - 183916ca-f6e6-4e7c-a997-e8f516ccf8be
worker_1 | [2019-10-23 22:27:34,628: WARNING/ForkPoolWorker-2] //--------------------------------------------------------------------------------
worker_1 | [2019-10-23 22:27:34,629: WARNING/ForkPoolWorker-2] // PERFORMING DATA SET PRE PROCESSING
worker_1 | [2019-10-23 22:27:34,629: WARNING/ForkPoolWorker-2] //--------------------------------------------------------------------------------
worker_1 | [2019-10-23 22:27:34,629: WARNING/ForkPoolWorker-2] {'data_set_id': 1, 'starting_node': 'Live', 'organization_id': 1}
worker_1 | [2019-10-23 22:27:34,630: WARNING/ForkPoolWorker-2] Downloading the files required to run!
worker_1 | [2019-10-23 22:27:34,645: WARNING/ForkPoolWorker-2] Downloading remote file `organizations/1/data_sets/flow_cytometry/triple_hello_world_payload.tgz`
worker_1 | [2019-10-23 22:27:34,646: WARNING/ForkPoolWorker-2] Exists: `False`
worker_1 | [2019-10-23 22:27:34,646: WARNING/ForkPoolWorker-2] ERROR occured: [Errno 2] No such file or directory: '/opt/api_webserver/media/organizations/1/data_sets/flow_cytometry/triple_hello_world_payload.tgz'.
worker_1 | [2019-10-23 22:27:34,653: INFO/ForkPoolWorker-2] Task api.versions.v1.tasks.main_task.main_task[183916ca-f6e6-4e7c-a997-e8f516ccf8be] succeeded in 0.02647909999359399s: {'iteration': 0, 'completion': 0, 'status': 'ERROR', 'message': 'Excecuting `main_task` failed!', 'error': 'Error in `main_task`: [Errno 2] No such file or directory: '/opt/api_webserver/media/organizations/1/data_sets/flow_cytometry/triple_hello_world_payload.tgz'.'}
If I go into the worker docker container and inspect the file system, the path to the media directory and the file does not exist.
If I go into the api docker container and inspect the file system, the path to the media directory and the file does exist.
Relevant Specifics
I will not provide the view code or the API code, because the API works fine. Uploads and file retrieval in the worker process are handled through Django's default_storage interface (Django's default storage interface API).
The problem concerns the worker, so here is the relevant code.
worker.py
# Python Standard Libraries
import os
import tempfile

# Third-Party Libraries
# N/A

# Custom
from models.data_set_model import DataSet
from tasks.helpers import download_remote_file


def download_data_set(data_set_id):
    print("Downloading the files required to run!")

    # Look up the record and the storage-relative name of its uploaded file
    data_set = DataSet.objects.get(id=data_set_id)
    remote_file_path = data_set.file.name
    remote_file_name = os.path.basename(remote_file_path)

    # Stage the download in a fresh temporary directory
    temporary_directory_path = tempfile.mkdtemp()
    temporary_compressed_file_path = os.path.join(temporary_directory_path, remote_file_name)
    download_remote_file(remote_file_path, temporary_compressed_file_path)

    return temporary_compressed_file_path
helpers.py
# Python Standard Libraries
# N/A

# Third-Party Libraries
from django.core.files.storage import default_storage

# Custom Libraries
# N/A


def download_remote_file(remote_file_path, local_file_path):
    print(f"Downloading remote file `{remote_file_path}`")
    print(f"Exists: `{default_storage.exists(remote_file_path)}`")

    # Read the file's contents out of the configured storage backend
    remote_file_contents = None
    with default_storage.open(remote_file_path) as remote_file_handle:
        print("Reading file contents")
        remote_file_contents = remote_file_handle.read()

    # Write the contents to the requested local path
    print(f"Placing remote file contents into `{local_file_path}`")
    with open(local_file_path, "wb") as local_file_handle:
        local_file_handle.write(remote_file_contents)
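As a side note, the helper above holds the whole archive in memory; a possible streaming variant, a sketch using the same default_storage interface plus shutil, could look like this:

# Streaming variant (sketch): copy the storage file to disk in chunks
# instead of holding the full contents in memory.
import shutil

from django.core.files.storage import default_storage


def download_remote_file_streaming(remote_file_path, local_file_path):
    with default_storage.open(remote_file_path) as remote_file_handle:
        with open(local_file_path, "wb") as local_file_handle:
            shutil.copyfileobj(remote_file_handle, local_file_handle)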
Open Questions
- What am I doing wrong?
- What is the idiomatic way to download files on a worker from the API's FileStorage system?
- Shouldn't the default_storage.open() call point at the API's file system and be able to download the file? If not, is there something I can configure on the worker to support this?
- Does this only work locally because the file system is shared, and does it break because docker-compose separates these systems into isolated environments?
If you are on docker-compose, just create a shared docker volume between your api and worker services and mount it at a known point in both containers, such as /mnt/share. Make sure the API saves its files there; the worker can then access them through the same models, as long as settings such as MEDIA_ROOT point both the API and the worker at /mnt/share. A sketch of such a configuration follows.
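A minimal docker-compose sketch of that idea, assuming the api and worker service names from the logs above (redis, postgres, and rabbitmq omitted for brevity); the volume name media_share and the build details are illustrative:

# Hypothetical docker-compose excerpt: a named volume shared by the
# API and the worker, mounted at the same path in both containers.
version: "3.7"

services:
  api:
    build: .
    volumes:
      - media_share:/mnt/share   # MEDIA_ROOT must resolve to this path

  worker:
    build: .
    volumes:
      - media_share:/mnt/share   # same mount point as the api service

volumes:
  media_share:

With this in place, set MEDIA_ROOT = "/mnt/share" (or a subdirectory of it) in the Django settings used by both services, and default_storage.open() on the worker will see exactly the files the API saved.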