How to use Django file storage to share files between an API and a Worker



Goal

I want to build a local docker-compose deployment with five services:

  • Redis
  • Postgres
  • RabbitMQ
  • Django API
  • Django Worker

In this deployment, a user uploads a file through an API endpoint. The endpoint stores the file in a FileField on a model.
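
For reference, a minimal sketch of the model involved (only the DataSet model and its file field are confirmed by the worker code further down; the upload_to value is a placeholder assumption):

# models/data_set_model.py -- minimal sketch; only `DataSet` and its
# `file` field come from the worker code below, the rest is assumed.
from django.db import models

class DataSet(models.Model):
    # The upload endpoint stores the user's archive in this FileField;
    # `file.name` later gives the worker the storage-relative path.
    file = models.FileField(upload_to="data_sets/")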

In a separate transaction, the user triggers an asynchronous task through a separate endpoint. This task is responsible for:

  • Downloading the file
  • Extracting the file
  • Kicking off subtasks to perform the intermediate processing steps
  • Uploading the processing results to the database

The intermediate processing steps should not upload any files to the database.

The intermediate processing steps will use Django's internal file storage solution to download and upload files. This is implemented with a file system hierarchy that is irrelevant to this question.
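
Judging by the log format further down, the worker is a Celery worker. A rough sketch of the intended task layout (only main_task, which appears in the logs, and download_data_set, shown below, come from the question; every other name is an assumption):

# tasks/main_task.py -- rough sketch of the flow described above; the
# helper names `extract_archive` and `plan_work` are assumed.
from celery import shared_task

@shared_task
def main_task(data_set_id):
    # Download and extract the uploaded archive.
    local_archive_path = download_data_set(data_set_id)
    extracted_directory = extract_archive(local_archive_path)  # assumed helper
    # Kick off subtasks for the intermediate processing steps; these go
    # through Django's file storage only and upload nothing to the database.
    for work_item in plan_work(extracted_directory):  # assumed helper
        intermediate_step.delay(work_item)

@shared_task
def intermediate_step(work_item):
    ...  # download inputs / upload outputs via default_storage only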

Problem

I have managed to get this working with my local file system: if I run the Redis, Postgres, and RabbitMQ backends and then run the API and the Worker on my local machine, everything works fine.

When I create a docker-compose configuration and decouple everything, the operation appears to break. Here is what I see in my docker-compose logs:

worker_1  | [2019-10-23 22:27:34,626: WARNING/ForkPoolWorker-2] //--------------------------------------------------------------------------------
worker_1  | [2019-10-23 22:27:34,627: WARNING/ForkPoolWorker-2] // BEGINNING TASK
worker_1  | [2019-10-23 22:27:34,627: WARNING/ForkPoolWorker-2] //--------------------------------------------------------------------------------
worker_1  | [2019-10-23 22:27:34,628: WARNING/ForkPoolWorker-2] // Root Job - 183916ca-f6e6-4e7c-a997-e8f516ccf8be
worker_1  | [2019-10-23 22:27:34,628: WARNING/ForkPoolWorker-2] // Parent Job - None
worker_1  | [2019-10-23 22:27:34,628: WARNING/ForkPoolWorker-2] // Current Job - 183916ca-f6e6-4e7c-a997-e8f516ccf8be
worker_1  | [2019-10-23 22:27:34,628: WARNING/ForkPoolWorker-2] //--------------------------------------------------------------------------------
worker_1  | [2019-10-23 22:27:34,629: WARNING/ForkPoolWorker-2] // PERFORMING DATA SET PRE PROCESSING
worker_1  | [2019-10-23 22:27:34,629: WARNING/ForkPoolWorker-2] //--------------------------------------------------------------------------------
worker_1  | [2019-10-23 22:27:34,629: WARNING/ForkPoolWorker-2] {'data_set_id': 1, 'starting_node': 'Live', 'organization_id': 1}
worker_1  | [2019-10-23 22:27:34,630: WARNING/ForkPoolWorker-2] Downloading the files required to run!
worker_1  | [2019-10-23 22:27:34,645: WARNING/ForkPoolWorker-2] Downloading remote file `organizations/1/data_sets/flow_cytometry/triple_hello_world_payload.tgz`
worker_1  | [2019-10-23 22:27:34,646: WARNING/ForkPoolWorker-2] Exists: `False`
worker_1  | [2019-10-23 22:27:34,646: WARNING/ForkPoolWorker-2] ERROR occured: [Errno 2] No such file or directory: '/opt/api_webserver/media/organizations/1/data_sets/flow_cytometry/triple_hello_world_payload.tgz'.
worker_1  | [2019-10-23 22:27:34,653: INFO/ForkPoolWorker-2] Task api.versions.v1.tasks.main_task.main_task[183916ca-f6e6-4e7c-a997-e8f516ccf8be] succeeded in 0.02647909999359399s: {'iteration': 0, 'completion': 0, 'status': 'ERROR', 'message': 'Excecuting `main_task` failed!', 'error': 'Error in `main_task`: [Errno 2] No such file or directory: '/opt/api_webserver/media/organizations/1/data_sets/flow_cytometry/triple_hello_world_payload.tgz'.'}

If I go into the worker Docker container and inspect the file system, the path to the media directory and the file does not exist.
If I go into the api Docker container and inspect the file system, the path to the media directory and the file does exist.
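
A quick way to confirm what each container resolves the file name to (a diagnostic sketch, assuming the default FileSystemStorage backend):

# Run in a Django shell (`python manage.py shell`) inside each container.
# With FileSystemStorage, default_storage resolves names against the
# container-local MEDIA_ROOT, so the two containers disagree.
from django.core.files.storage import default_storage

name = "organizations/1/data_sets/flow_cytometry/triple_hello_world_payload.tgz"
print(default_storage.path(name))    # absolute path under this container's MEDIA_ROOT
print(default_storage.exists(name))  # True in the api container, False in the worker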

Relevant Code

I won't provide the view code or the API code, because the API works fine.

Uploading and retrieving files in the worker process is handled through Django's default_storage interface.

Django's default storage API

The problem is with the worker, so here is some relevant code.

worker.py

# Python Standard Libraries
import os
import tempfile
# Third-Party Libraries
# N/A
# Custom
from models.data_set_model import DataSet
from tasks.helpers import download_remote_file

def download_data_set(data_set_id):
    print("Downloading the files required to run!")
    data_set = DataSet.objects.get(id=data_set_id)
    remote_file_path = data_set.file.name
    remote_file_name = os.path.basename(remote_file_path)
    temporary_directory_path = tempfile.mkdtemp()
    temporary_compressed_file_path = os.path.join(temporary_directory_path, remote_file_name)
    download_remote_file(remote_file_path, temporary_compressed_file_path)
    return temporary_compressed_file_path

helpers.py

# Python Standard Libraries
# N/A
# Third-Party Libraries
from django.core.files.storage import default_storage
# Custom Libraries
# N/A

def download_remote_file(remote_file_path, local_file_path):
    print(f"Downloading remote file `{remote_file_path}`")
    print(f"Exists: `{default_storage.exists(remote_file_path)}`")
    remote_file_contents = None
    with default_storage.open(remote_file_path) as remote_file_handle:
        print("Reading file contents")
        remote_file_contents = remote_file_handle.read()
    print(f"Placing remote file contents into `{local_file_path}`")
    with open(local_file_path, "wb") as local_file_handle:
        local_file_handle.write(remote_file_contents)

Open Questions

  • What am I doing wrong?
  • What is the idiomatic way for the worker to download files from the API's file storage system?
  • Shouldn't the default_storage.open() call point at the API's file system and be able to download the file?
    • If not, is there something I can configure on the worker to make that work?
  • Did this only work locally because the file system was shared, and is it breaking now because docker-compose splits these services into separate environments?

If you are on docker-compose, just create a shared Docker volume between your api and worker services and mount it at a known point in both containers, such as /mnt/share. Make sure the API saves the files there; the worker can then access them through the same models, because settings such as MEDIA_ROOT point both the API and the worker at /mnt/share.
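
A minimal sketch of that setup (service, image, and volume names are assumptions; only the api/worker split and the /mnt/share mount point come from the answer above):

# docker-compose.yml -- sketch only; names are assumptions apart from
# the /mnt/share mount point suggested above.
version: "3"
services:
  api:
    build: .
    volumes:
      - media_share:/mnt/share   # the API writes uploaded files here
  worker:
    build: .
    volumes:
      - media_share:/mnt/share   # the worker reads the same files here
volumes:
  media_share:

In the Django settings used by both services, point MEDIA_ROOT at the mount, e.g. MEDIA_ROOT = "/mnt/share", so that default_storage resolves file names inside the shared volume in both containers.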
