Python in worker has a different version: environment variables are set correctly



I am running a Python script in a Jupyter notebook on Linux Mint.

The code itself shouldn't matter, but here it is (it's from a GraphFrames tutorial):

import pandas
import pyspark
from functools import reduce
from graphframes import *
from IPython.display import display, HTML
from pyspark.context import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.functions import col, lit, when
from pyspark.sql.session import SparkSession

sc = SparkContext.getOrCreate()
sqlContext = SQLContext.getOrCreate(sc)
spark = SparkSession(sc)

vertices = sqlContext.createDataFrame(
    [
        ("a", "Alice", 34),
        ("b", "Bob", 36),
        ("c", "Charlie", 30),
        ("d", "David", 29),
        ("e", "Esther", 32),
        ("f", "Fanny", 36),
        ("g", "Gabby", 60),
    ],
    ["id", "name", "age"],
)

edges = sqlContext.createDataFrame(
    [
        ("a", "b", "friend"),
        ("b", "c", "follow"),
        ("c", "b", "follow"),
        ("f", "c", "follow"),
        ("e", "f", "follow"),
        ("e", "d", "friend"),
        ("d", "a", "friend"),
        ("a", "e", "friend"),
    ],
    ["src", "dst", "relationship"],
)

g = GraphFrame(vertices, edges)
display(g.inDegrees.toPandas())

The last line is the one that causes the failure; it gives the following error:

Exception: Python in worker has different version 2.7 than that in driver 3.6, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.

Both variables are set correctly:

printenv PYSPARK_PYTHON
-> /usr/bin/python3
printenv PYSPARK_DRIVER_PYTHON
-> /usr/bin/python3
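
These variables only matter in the environment of the process that actually creates the SparkContext. If the Jupyter kernel was launched without them, they can be forced from inside the notebook before any context exists (a minimal sketch, assuming no SparkContext has been created yet in the session; the paths are the ones from above):

import os

# Assumption: no SparkContext exists yet in this kernel; these variables
# are only read when the context (and its worker processes) start up.
os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3"
os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/bin/python3"

from pyspark.context import SparkContext
sc = SparkContext.getOrCreate()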

I also added them to my spark-env.sh file, as follows:

# This file is sourced when running various Spark programs.
# Copy it as spark-env.sh and edit that to configure Spark for your site.
export PYSPARK_PYTHON=/usr/bin/python3       
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3   

But the error persists. Where else can I update these variables?
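
For reference, Spark 2.1+ also exposes these interpreters as configuration properties, spark.pyspark.python and spark.pyspark.driver.python, which take precedence over the environment variables. A sketch of that variant (the driver interpreter is already fixed once the kernel is running, so only the worker-side property can still take effect at this point):

from pyspark import SparkConf
from pyspark.context import SparkContext

conf = SparkConf()
# spark.pyspark.python selects the worker-side interpreter and takes
# precedence over PYSPARK_PYTHON.
conf = conf.set("spark.pyspark.python", "/usr/bin/python3")
sc = SparkContext.getOrCreate(conf=conf)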

EDIT

python --version
Python 3.7.4
pip3 list | grep jupyter
jupyter               1.0.0      
jupyter-client        5.3.4      
jupyter-console       6.0.0      
jupyter-core          4.6.1      
jupyterlab            1.1.4      
jupyterlab-server     1.0.6     
pip3 list | grep pyspark
pyspark               2.4.4
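
Note that python --version above reports the shell's interpreter, which is not necessarily the one the notebook kernel runs as the driver. The kernel's own interpreter can be checked from inside a notebook cell:

import sys

# This is the "driver" Python that the PySpark error message refers to.
print(sys.executable)
print(sys.version)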

The problem is more likely a Python version conflict. Set PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON to /usr/bin/python. Alternatively, you can use a venv:

cd ~
python3 -m venv spark_test
cd spark_test
source ./bin/activate
pip3 install jupyterlab pyspark graphframes
jupyter notebook

You will have to put your Jupyter notebooks inside the newly created folder.
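
To confirm that driver and workers finally agree, a trivial job can report the worker-side interpreter (a sketch, assuming a fresh SparkContext inside the new environment):

import sys
from pyspark.context import SparkContext

sc = SparkContext.getOrCreate()
# Run one task on a worker and return that worker's Python version string.
worker_version = sc.range(1).map(lambda _: sys.version).first()
print("driver:", sys.version)
print("worker:", worker_version)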

Alternatively, the same can be done with a conda environment:

conda create --name foo python=3.6
conda activate foo
python -m pip install jupyter pyspark graphframes
jupyter notebook
