When I start a Jupyter Notebook on Google Dataproc, importing modules fails. I tried installing the modules with several different commands. Some examples:
import os
os.system("sudo apt-get install python-numpy")
os.system("sudo pip install numpy") #after having installed pip
os.system("sudo pip install python-numpy") #after having installed pip
import numpy
None of the examples above work; the final import numpy still fails with an import error.
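When I use the command line, I can install the modules, but the import error persists. I suspect I am installing the modules in the wrong location.
Any ideas?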
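One way to check the "wrong location" suspicion from inside the notebook is to look at which interpreter the kernel is running and where it searches for modules. This is only a minimal sketch; the helper names describe_interpreter and can_import are made up for illustration and are not part of Dataproc or Jupyter.

```python
import sys


def describe_interpreter():
    """Return the interpreter path, version, and module search path of this kernel."""
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        "paths": list(sys.path),
    }


def can_import(name):
    """Return True if the module can be imported by this interpreter."""
    try:
        __import__(name)
        return True
    except ImportError:
        return False


info = describe_interpreter()
print("Kernel interpreter:", info["executable"])
print("Python version:", info["version"])
for path in info["paths"]:
    print("  search path:", path)
print("numpy importable:", can_import("numpy"))
```

If the interpreter printed here is not the one that pip or apt-get installs into on the command line, the packages are ending up in a directory the kernel never searches.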
I found a solution.
import os
import sys
sys.path.append('/usr/lib/python2.7/dist-packages')
os.system("sudo apt-get install python-pandas -y")
os.system("sudo apt-get install python-numpy -y")
os.system("sudo apt-get install python-scipy -y")
os.system("sudo apt-get install python-sklearn -y")
import pandas
import numpy
import scipy
import sklearn
If anyone has a more elegant solution, please let me know.
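A possibly tidier variant of the workaround above, offered only as a sketch: rather than appending the system dist-packages directory to sys.path, ask the interpreter that runs the notebook kernel to run pip itself, so the package lands where that interpreter looks. This assumes pip is installed for that interpreter and that the kernel's user may write to its site-packages; the function names pip_install_command and install_into_kernel are hypothetical.

```python
import subprocess
import sys


def pip_install_command(package):
    """Build a pip command bound to the interpreter running this code."""
    # sys.executable is the Python binary of the current kernel, so
    # "-m pip" installs into that interpreter's own site-packages.
    return [sys.executable, "-m", "pip", "install", package]


def install_into_kernel(package):
    """Run pip for the current interpreter; raises CalledProcessError on failure."""
    subprocess.check_call(pip_install_command(package))


# Example call (commented out because it needs network access):
# install_into_kernel("numpy")
```

After a successful install, a plain import numpy in the same kernel should find the package without touching sys.path, provided the assumptions above hold.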
Try conda install numpy, since Google's Jupyter init script uses conda. Personally, I prefer to have my own init script, which gives me more control.
#!/usr/bin/env bash
set -e
ROLE=$(curl -f -s -H Metadata-Flavor:Google http://metadata/computeMetadata/v1/instance/attributes/dataproc-role)
INIT_ACTIONS_REPO=$(curl -f -s -H Metadata-Flavor:Google http://metadata/computeMetadata/v1/instance/attributes/INIT_ACTIONS_REPO || true)
INIT_ACTIONS_REPO="${INIT_ACTIONS_REPO:-https://github.com/GoogleCloudPlatform/dataproc-initialization-actions.git}"
INIT_ACTIONS_BRANCH=$(curl -f -s -H Metadata-Flavor:Google http://metadata/computeMetadata/v1/instance/attributes/INIT_ACTIONS_BRANCH || true)
INIT_ACTIONS_BRANCH="${INIT_ACTIONS_BRANCH:-master}"
DATAPROC_BUCKET=$(curl -f -s -H Metadata-Flavor:Google http://metadata/computeMetadata/v1/instance/attributes/dataproc-bucket)
echo "Cloning fresh dataproc-initialization-actions from repo $INIT_ACTIONS_REPO and branch $INIT_ACTIONS_BRANCH..."
git clone -b "$INIT_ACTIONS_BRANCH" --single-branch "$INIT_ACTIONS_REPO"
# Ensure we have conda installed.
./dataproc-initialization-actions/conda/bootstrap-conda.sh
#./dataproc-initialization-actions/conda/install-conda-env.sh
source /etc/profile.d/conda_config.sh
if [[ "${ROLE}" == 'Master' ]]; then
  conda install jupyter
  if gsutil -q stat "gs://$DATAPROC_BUCKET/notebooks/**"; then
    echo "Pulling notebooks directory to cluster master node..."
    gsutil -m cp -r gs://$DATAPROC_BUCKET/notebooks /root/
  fi
  ./dataproc-initialization-actions/jupyter/internal/setup-jupyter-kernel.sh
  ./dataproc-initialization-actions/jupyter/internal/launch-jupyter-kernel.sh
fi
if gsutil -q stat "gs://$DATAPROC_BUCKET/scripts/**"; then
  echo "Pulling scripts directory to cluster master and worker nodes..."
  gsutil -m cp -r gs://$DATAPROC_BUCKET/scripts/* /usr/local/bin/miniconda/lib/python2.7
fi
if gsutil -q stat "gs://$DATAPROC_BUCKET/modules/**"; then
  echo "Pulling modules directory to cluster master and worker nodes..."
  gsutil -m cp -r gs://$DATAPROC_BUCKET/modules/* /usr/local/bin/miniconda/lib/python2.7
fi
echo "Completed installing Jupyter!"
# Install Jupyter extensions (if desired)
# TODO: document this in readme
# -v expects the variable's name, not its value, so there is no leading $.
if [[ ! -v INSTALL_JUPYTER_EXT ]]
then
  INSTALL_JUPYTER_EXT=false
fi
if [[ "$INSTALL_JUPYTER_EXT" = true ]]
then
  echo "Installing Jupyter Notebook extensions..."
  ./dataproc-initialization-actions/jupyter/internal/bootstrap-jupyter-ext.sh
  echo "Jupyter Notebook extensions installed!"
fi
Have you tried running the following command?
pip install ipython[numpy]