I am building a Docker container with the following Dockerfile:
FROM ubuntu:14.04
RUN apt-get update
RUN apt-get install -y python python-dev python-pip
ADD . /app
RUN apt-get install -y python-scipy
RUN pip install -r /arrc/requirements.txt
EXPOSE 5000
WORKDIR /app
CMD python app.py
Everything goes fine until I run the image and get the following error:
**********************************************************************
Resource u'tokenizers/punkt/english.pickle' not found. Please
use the NLTK Downloader to obtain the resource: >>>
nltk.download()
Searched in:
- '/root/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- u''
**********************************************************************
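For reference, the directories in that list are NLTK's standard search path, checked in order starting with the current user's home directory (inside the container the build runs as root, hence /root/nltk_data). A stdlib sketch that hard-codes the list from the error rather than importing nltk:

```python
import os

# The search order reported in the error above, reconstructed by hand;
# inside the container the build runs as root, so ~ expands to /root.
search_paths = [
    os.path.expanduser("~/nltk_data"),
    "/usr/share/nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/lib/nltk_data",
    "/usr/local/lib/nltk_data",
]

# A resource name like 'tokenizers/punkt/english.pickle' is resolved
# against each root in order; putting punkt under any one of these
# directories makes the lookup succeed.
resource = "tokenizers/punkt/english.pickle"
candidates = [os.path.join(p, resource) for p in search_paths]
```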
I have run into this problem before, and it is discussed here, but I am not sure how to solve it within Docker. I have tried:
CMD python
CMD import nltk
CMD nltk.download()
as well as:
CMD python -m nltk.downloader -d /usr/share/nltk_data popular
but I still get the error.
In your Dockerfile, try adding:
RUN python -m nltk.downloader punkt
This runs the command at build time and installs the requested files into //nltk_data/.
The problem is most likely the difference between CMD and RUN in a Dockerfile. From the CMD documentation:
The main purpose of a CMD is to provide defaults for an executing container.
It is used when you docker run <image>, not during the build. So your extra CMD lines were simply overridden by the last one, CMD python app.py.
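Given that distinction, the attempts above can be folded into the build with RUN, keeping a single CMD for runtime. A minimal sketch, reusing the package names from the question's Dockerfile:

```dockerfile
FROM ubuntu:14.04
RUN apt-get update && apt-get install -y python python-pip
RUN pip install nltk
# RUN executes at build time, so the data is baked into the image layer:
RUN python -m nltk.downloader punkt
# Only the last CMD takes effect; it supplies the default for `docker run`:
CMD python app.py
```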
I tried all the suggested approaches, but none worked, so I looked closer and realized the nltk module searches /root/nltk_data first.
Step 1: I downloaded punkt on my machine by running:
python3
>>> import nltk
>>> nltk.download('punkt')
punkt ended up in /root/nltk_data/tokenizers.
Step 2: I copied the tokenizers folder into my project directory, which now looks like this:
.
|-app/
|-tokenizers/
|--punkt/
|---all those pkl files
|--punkt.zip
Step 3: I then modified the Dockerfile so it copies the folder into my Docker image:
COPY ./tokenizers /root/nltk_data/tokenizers
Step 4: The new image has punkt.
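A quick sanity check of what Step 3's COPY must produce: NLTK resolves a resource name like tokenizers/punkt/english.pickle relative to an nltk_data root, so the copied tree has to preserve that nesting. A stdlib sketch using a temporary root and an empty stand-in file:

```python
import os
import tempfile

# Illustrative stand-in for /root/nltk_data inside the image.
root = tempfile.mkdtemp()

# The loader looks up 'tokenizers/punkt/english.pickle' under the root,
# so the copied tree must look like <root>/tokenizers/punkt/english.pickle.
pickle_path = os.path.join(root, "tokenizers", "punkt", "english.pickle")
os.makedirs(os.path.dirname(pickle_path))
open(pickle_path, "wb").close()  # stand-in for the real pickle file

# This mirrors the lookup the error message reported as failing.
assert os.path.exists(os.path.join(root, "tokenizers/punkt/english.pickle"))
```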
I hit the same problem when building a Docker image for a Django app from an Ubuntu image with Python 3.
I solved it as shown below.
# start from an official image
FROM ubuntu:16.04
RUN apt-get update \
    && apt-get install -y python3-pip python3-dev \
    && apt-get install -y libmysqlclient-dev python3-virtualenv
# arbitrary location choice: you can change the directory
RUN mkdir -p /opt/services/djangoapp/src
WORKDIR /opt/services/djangoapp/src
# copy our project code
COPY . /opt/services/djangoapp/src
# install dependency for running service
RUN pip3 install -r requirements.txt
RUN python3 -m nltk.downloader punkt
RUN python3 -m nltk.downloader wordnet
# Setup supervisord
RUN mkdir -p /var/log/supervisor
COPY supervisord.conf /etc/supervisor/conf.d/supervisord.conf
# Start processes
CMD ["/usr/bin/supervisord"]
I got this working for Google Cloud Build by specifying the download destination inside the container:
RUN [ "python3", "-c", "import nltk; nltk.download('punkt', download_dir='/usr/local/nltk_data')" ]
The full Dockerfile:
FROM python:3.8.3
WORKDIR /app
ADD . /app
# install requirements
RUN pip3 install --upgrade pip
RUN pip3 install --no-cache-dir --compile -r requirements.txt
RUN [ "python3", "-c", "import nltk; nltk.download('punkt', download_dir='/usr/local/nltk_data')" ]
CMD exec uvicorn --host 0.0.0.0 --port $PORT main:app
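An alternative to repeating download_dir in every download call is to point both the downloader and the running app at one directory via the NLTK_DATA environment variable, which NLTK consults when building its search path. A sketch assuming the same python:3.8.3 base:

```dockerfile
FROM python:3.8.3
# One variable serves both build-time download and runtime lookup.
ENV NLTK_DATA=/usr/local/nltk_data
RUN pip3 install nltk
# -d is kept explicit here so the download target is unambiguous:
RUN python3 -m nltk.downloader -d /usr/local/nltk_data punkt
```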
For now I have had to do the following (note the RUN cp -r /root/nltk_data /usr/local/share/nltk_data line):
FROM ubuntu:latest
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get clean && apt-get update && apt-get install -y locales
RUN locale-gen en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8
RUN apt-get -y update && apt-get install -y --no-install-recommends \
    sudo \
    python3 \
    build-essential \
    python3-pip \
    python3-setuptools \
    python3-dev \
    && rm -rf /var/lib/apt/lists/*
RUN pip3 install --upgrade pip
ENV PYTHONPATH "${PYTHONPATH}:/app"
ADD requirements.txt .
# in requirements.txt: pandas, numpy, wordcloud, matplotlib, nltk, sklearn
RUN pip3 install -r requirements.txt
RUN [ "python3", "-c", "import nltk; nltk.download('stopwords')" ]
RUN [ "python3", "-c", "import nltk; nltk.download('punkt')" ]
RUN cp -r /root/nltk_data /usr/local/share/nltk_data
RUN addgroup --system app \
    && adduser --system --ingroup app app
WORKDIR /home/app
ADD inputfile .
ADD script.py .
# the script uses the python modules: pandas, numpy, wordcloud, matplotlib, nltk, sklearn
RUN chown app:app -R /home/app
USER app
RUN python3 script.py inputfile outputfile
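The RUN cp -r /root/nltk_data /usr/local/share/nltk_data step is needed because the downloader runs as root and writes under /root, which the unprivileged app user cannot read, while /usr/local/share/nltk_data is on NLTK's system-wide search path. The relocation amounts to a recursive copy, sketched here with stdlib stand-in paths rather than the real ones:

```python
import os
import shutil
import tempfile

# Illustrative stand-ins for /root/nltk_data and /usr/local/share/nltk_data.
src = tempfile.mkdtemp()            # where the downloader wrote as root
dst_parent = tempfile.mkdtemp()
dst = os.path.join(dst_parent, "nltk_data")

# Simulate the downloaded corpora.
os.makedirs(os.path.join(src, "corpora", "stopwords"))
os.makedirs(os.path.join(src, "tokenizers", "punkt"))

# Equivalent of: RUN cp -r /root/nltk_data /usr/local/share/nltk_data
shutil.copytree(src, dst)
```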