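My application (clads) runs on Django and uses Celery for scheduled and asynchronous tasks. Unfortunately, I can't figure out some permission issues that prevent the Celery processes from writing to the Django application logs or manipulating files created by the Django application. The Django application runs in the wsgi process, and I have some configuration files that set up the application log directory so that the wsgi process can write to it (see below).
However, the Celery processes seem to run as a different user that has no permission to write to these files (which they automatically try to do when they see the log file configuration, also below; note that I tried changing this to run as wsgi, but it did not work). The same permission issue seems to prevent the Celery processes from manipulating temporary files created by the Django application, which is a requirement of the project.
Admittedly, I am very rusty on Unix-type operating systems, so I am sure I am missing something simple. I have been searching this site and others on and off for several days, and while I have found many posts that got me close, I still can't seem to solve it. I suspect I need some extra commands in my configuration to set permissions or to run Celery under a different user. Any help would be greatly appreciated. The project configuration and relevant code files are excerpted below. Most of the configuration files were cobbled together from information found on this site and others; apologies for not citing them, but I did not keep close enough records to know exactly where they came from.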
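For anyone debugging a similar setup, a small diagnostic can help confirm which account a process actually runs as and whether that account may write to a given path. This is only a sketch: `describe_access` is a hypothetical helper name, not part of Django or Celery, and it assumes a Unix-like host where the `pwd` module is available.

```python
import os
import pwd


def describe_access(path):
    """Report who the current process runs as and whether it may write to `path`."""
    uid = os.geteuid()
    try:
        user = pwd.getpwuid(uid).pw_name
    except KeyError:
        # The uid has no passwd entry (can happen in containers).
        user = str(uid)
    st = os.stat(path)
    return {
        "process_user": user,
        "process_groups": sorted({os.getegid(), *os.getgroups()}),
        "owner_uid": st.st_uid,
        "group_gid": st.st_gid,
        "mode": oct(st.st_mode & 0o777),
        # os.access checks against the real uid/gid of the process.
        "writable": os.access(path, os.W_OK),
    }
```

Calling it once from a Django view and once from a Celery task, for the log directory and for the temp folder, should show whether the two processes differ in user or group membership.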
Logging and Celery sections of settings.py
# log settings
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'verbose': {
            # LogRecord attributes are case-sensitive (filename, not fileName),
            # and processName/threadName are strings, so they need %s, not %d.
            'format': '%(asctime)s - %(levelname)s - %(module)s.%(filename)s.%(funcName)s %(processName)s %(threadName)s: %(message)s',
        },
        'simple': {
            'format': '%(asctime)s - %(levelname)s: %(message)s'
        },
    },
    'handlers': {
        'django_log_file': {
            'level': os.getenv('DJANGO_LOG_LEVEL', 'INFO'),
            'class': 'logging.FileHandler',
            'filename': os.environ.get('DJANGO_LOG_FILE'),
            'formatter': 'verbose',
        },
        'app_log_file': {
            'level': os.getenv('CLADS_LOG_LEVEL', 'INFO'),
            'class': 'logging.FileHandler',
            'filename': os.environ.get('CLADS_LOG_FILE'),
            'formatter': 'verbose',
        },
    },
    'loggers': {
        'django': {
            'handlers': ['django_log_file'],
            'level': os.getenv('DJANGO_LOG_LEVEL', 'INFO'),
            'propagate': True,
        },
        'clads': {
            'handlers': ['app_log_file'],
            'level': os.getenv('CLADS_LOG_LEVEL', 'INFO'),
            'propagate': True,
        },
    },
}
WSGI_APPLICATION = 'clads.wsgi.application'
# celery settings
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_RESULT_BACKEND = 'djcelery.backends.database:DatabaseBackend'
CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'
CELERY_SEND_EVENTS = False
CELERY_BROKER_URL = os.environ.get('BROKER_URL')
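One detail that may explain why Celery touches the log files at all: `logging.FileHandler` opens its file as soon as the handler is constructed, which happens when the `LOGGING` dict is applied, unless `delay=True` is passed. Any process that loads these settings, including the Celery worker and beat, therefore tries to open the files at startup. The sketch below demonstrates the difference in a throwaway directory; `handler_opens_file_immediately` is a hypothetical helper written only for this illustration.

```python
import logging
import os
import tempfile


def handler_opens_file_immediately(delay):
    """Return True if constructing a FileHandler already created the log file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "app.log")
        handler = logging.FileHandler(path, delay=delay)
        try:
            # With delay=False the file is opened (and created) in __init__;
            # with delay=True it is only opened on the first emitted record.
            return os.path.exists(path)
        finally:
            handler.close()
```

In a dict-style configuration the same option can be given as `'delay': True` in the handler entry. Note that this only postpones the open until the first log record; it does not grant a process permissions it lacks.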
Excerpt from tasks.py

# imports the excerpt relies on
import logging
import os

import boto3
from celery import shared_task
from django.conf import settings

LOGGER = logging.getLogger('clads.pit')

@shared_task(name="archive_pit_file")
def archive_pit_file(tfile_name):
    LOGGER.debug('archive_date_file called for ' + tfile_name)
    LOGGER.debug('connecting to S3 ...')
    s3 = boto3.client('s3')
    file_fname = os.path.join(settings.TEMP_FOLDER, tfile_name)
    LOGGER.debug('reading temp file from ' + file_fname)
    s3.upload_file(file_fname, settings.S3_ARCHIVE, tfile_name)
    LOGGER.debug('cleaning up temp files ...')
    # THIS LINE CAUSES PROBLEMS BECAUSE THE CELERY PROCESS DOESN'T HAVE
    # PERMISSION TO REMOVE THE WSGI-OWNED FILE
    os.remove(file_fname)
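A note on the `os.remove` call: on POSIX systems, deleting a file requires write and execute permission on the directory that contains it, not write permission on the file itself. If the directory has the sticky bit set (as /tmp usually does), only the file's owner, the directory's owner, or root may delete it. The following is a minimal sketch of a removal helper that logs the relevant directory details instead of letting the task fail; `remove_temp_file` is a hypothetical name, and the logger name is reused from the excerpt above.

```python
import logging
import os
import stat

LOGGER = logging.getLogger('clads.pit')


def remove_temp_file(path):
    """Try to delete `path`; return True on success, False if it could not be removed.

    Hypothetical helper for illustration only.
    """
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        LOGGER.warning('temp file %s is already gone', path)
        return False
    except PermissionError:
        # Deletion is governed by the parent directory, so report its mode.
        parent = os.path.dirname(os.path.abspath(path))
        mode = os.stat(parent).st_mode
        LOGGER.error(
            'cannot remove %s: directory %s has mode %s (sticky bit: %s), process uid is %s',
            path, parent, oct(stat.S_IMODE(mode)), bool(mode & stat.S_ISVTX), os.geteuid(),
        )
        return False
```

If the log shows a sticky-bit directory, pointing `settings.TEMP_FOLDER` at a dedicated directory that both users can write to may be simpler than changing the ownership of individual files.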
logging.config
commands:
  01_change_permissions:
    command: chmod g+s /opt/python/log
  02_change_owner:
    command: chown root:wsgi /opt/python/log
99_celery.config
container_commands:
  04_celery_tasks:
    command: "cat .ebextensions/files/celery_configuration.txt > /opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh && chmod 744 /opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh"
    leader_only: true
  05_celery_tasks_run:
    command: "/opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh"
    leader_only: true
celery_configuration.txt
#!/usr/bin/env bash
# Get django environment variables
celeryenv=`cat /opt/python/current/env | tr '\n' ',' | sed 's/%/%%/g' | sed 's/export //g' | sed 's/$PATH/%(ENV_PATH)s/g' | sed 's/$PYTHONPATH//g' | sed 's/$LD_LIBRARY_PATH//g'`
celeryenv=${celeryenv%?}
# Create celery configuration script
celeryconf="[program:celeryd-worker]
; Set full path to celery program if using virtualenv
command=/opt/python/run/venv/bin/celery worker -A clads -b <broker_url> --loglevel=INFO --without-gossip --without-mingle --without-heartbeat
directory=/opt/python/current/app
user=nobody
numprocs=1
stdout_logfile=/var/log/celery-worker.log
stderr_logfile=/var/log/celery-worker.log
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998
environment=$celeryenv
[program:celeryd-beat]
; Set full path to celery program if using virtualenv
command=/opt/python/run/venv/bin/celery beat -A clads -b <broker_url> --loglevel=INFO --workdir=/tmp
directory=/opt/python/current/app
user=nobody
numprocs=1
stdout_logfile=/var/log/celery-beat.log
stderr_logfile=/var/log/celery-beat.log
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998
environment=$celeryenv"
# Create the celery supervisord conf script
echo "$celeryconf" | tee /opt/python/etc/celery.conf
# Add configuration script to supervisord conf (if not there already)
if ! grep -Fxq "[include]" /opt/python/etc/supervisord.conf
then
    echo "[include]" | tee -a /opt/python/etc/supervisord.conf
    echo "files: celery.conf" | tee -a /opt/python/etc/supervisord.conf
fi
# Reread the supervisord config
supervisorctl -c /opt/python/etc/supervisord.conf reread
# Update supervisord in cache without restarting all services
supervisorctl -c /opt/python/etc/supervisord.conf update
# Start/Restart celeryd through supervisord
supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd-worker
supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd-beat
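In the generated supervisord configuration, both programs carry `user=nobody`. When supervisord runs as root, that option selects the account each program is started under, so it looks like the likely reason the Celery processes run as a different user than wsgi. A rough way to check what a generated celery.conf actually says is to parse it. This is a sketch, assuming the file is plain INI as in the script above; `program_users` is a hypothetical helper.

```python
import configparser


def program_users(conf_text):
    """Map each [program:x] section of a supervisord config to its `user` setting."""
    # interpolation=None keeps values such as %(ENV_PATH)s from being expanded.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return {
        section.split(":", 1)[1]: parser.get(
            section, "user", fallback="(supervisord's own user)"
        )
        for section in parser.sections()
        if section.startswith("program:")
    }
```

For example, reading /opt/python/etc/celery.conf and passing its text to `program_users` should list `nobody` for both `celeryd-worker` and `celeryd-beat` with the configuration shown here.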
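I was not able to pin down the permission problem exactly, but I found a workaround that may help others. I removed the file handler configuration from the logging settings and replaced it with a stream handler. This gets around the permission issue because the Celery processes no longer try to access log files owned by the wsgi user.
Log messages from the web application end up in the httpd error log. That is not ideal, but at least I can find them, and they are also accessible through the Elastic Beanstalk console. The Celery logs are written to celery-worker.log and celery-beat.log in /var/log. I can't access those through the console, but I can get to them by logging in to the instance directly. This is not ideal either, because these logs are not rotated and will be lost if the instance is retired, but at least it gets me going for now.
Here are the modified logging settings that make it work this way: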
# log settings
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'verbose': {
            'format': '%(asctime)s - %(levelname)s - %(module)s.%(filename)s.%(funcName)s %(processName)s %(threadName)s: %(message)s',
        },
        'simple': {
            'format': '%(asctime)s - %(levelname)s: %(message)s'
        },
    },
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'formatter': 'verbose',
        }
    },
    'loggers': {
        'django': {
            'handlers': ['console'],
            'level': os.getenv('DJANGO_LOG_LEVEL', 'INFO'),
            'propagate': True,
        },
        'clads': {
            'handlers': ['console'],
            'level': os.getenv('CLADS_LOG_LEVEL', 'INFO'),
            'propagate': True,
        },
    },
}
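To check outside Django that a stream-only configuration never opens a log file, a cut-down version of these settings can be applied with `logging.config.dictConfig`. This is a minimal sketch: `build_logging` is a hypothetical helper, and only the `clads` logger and the `simple` formatter are carried over.

```python
import logging
import logging.config
import os


def build_logging(level=None):
    """Return a LOGGING-style dict that only uses a StreamHandler."""
    level = level or os.getenv('CLADS_LOG_LEVEL', 'INFO')
    return {
        'version': 1,
        'disable_existing_loggers': False,
        'formatters': {
            'simple': {'format': '%(asctime)s - %(levelname)s: %(message)s'},
        },
        'handlers': {
            # StreamHandler writes to sys.stderr unless a stream is given.
            'console': {'class': 'logging.StreamHandler', 'formatter': 'simple'},
        },
        'loggers': {
            'clads': {'handlers': ['console'], 'level': level, 'propagate': True},
        },
    }


logging.config.dictConfig(build_logging('INFO'))
logger = logging.getLogger('clads')
logger.info('logging to stderr, no log file is opened')
```

Because the handler writes to stderr, where the messages end up depends on whatever captures that stream for each process, which matches the behaviour described above: the httpd error log for the web application and the supervisord `stderr_logfile` for Celery.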