Import pyspark error: PySpark with Python 3.5.1



PySpark works fine for me with Python 2.7. I installed Python 3.5.1 (built from source), and when I run pyspark in the terminal I get this error:

Python 3.5.1 (default, Apr 25 2016, 12:41:28) 
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
  File "/home/himaprasoon/apps/spark-1.6.0-bin-hadoop2.6/python/pyspark/shell.py", line 30, in <module>
    import pyspark
  File "/home/himaprasoon/apps/spark-1.6.0-bin-hadoop2.6/python/pyspark/__init__.py", line 41, in <module>
    from pyspark.context import SparkContext
  File "/home/himaprasoon/apps/spark-1.6.0-bin-hadoop2.6/python/pyspark/context.py", line 28, in <module>
    from pyspark import accumulators
  File "/home/himaprasoon/apps/spark-1.6.0-bin-hadoop2.6/python/pyspark/accumulators.py", line 98, in <module>
    from pyspark.serializers import read_int, PickleSerializer
  File "/home/himaprasoon/apps/spark-1.6.0-bin-hadoop2.6/python/pyspark/serializers.py", line 58, in <module>
    import zlib
ImportError: No module named 'zlib'

I also tried Python 3.4.3, which works fine as well.

Have you checked to make sure that zlib is actually present in your Python installation? It should be there by default, but strange things happen.
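A quick way to check is to run an import with the same interpreter that pyspark is launching (a minimal sketch, nothing Spark-specific):

    # Run this with the exact interpreter pyspark uses (here, the 3.5.1 build).
    # If the import fails, that Python was most likely compiled without zlib
    # support, e.g. because the zlib development headers were missing at build time.
    import zlib
    print(zlib.ZLIB_VERSION)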

Have you put the exact path to your system Python 3.5.1 into "PYSPARK_PYTHON" in your .bashrc file?

 Welcome to
       ____              __
      / __/__  ___ _____/ /__
     _\ \/ _ \/ _ `/ __/  '_/
    /__ / .__/\_,_/_/ /_/\_\   version 2.1.1
       /_/
Using Python version 3.6.1 (default, Jun 23 2017 16:20:09)
SparkSession available as 'spark'.

This is what my PySpark prompt shows. The Apache Spark version is 2.1.1.

PS: I use Anaconda3 (Python 3.6.1) for my day-to-day PySpark code, and my PYSPARK_DRIVER_PYTHON is set to "jupyter".

The example above is with my default system Python 3.6.

Try conda install -c conda-forge pyspark. If your problem still persists, you may need to change your ~/.bashrc.

Before launching pyspark in the shell, run a command like the following: export PYSPARK_PYTHON=python3.5

It worked for me!

After installing Python 3.5:

1. Install pip
    sudo apt-get install python<VERSION>-pip
2. Install the notebook module if it is not already installed
3. Install ipython
    sudo apt-get install ipython<VERSION> ipython<VERSION>-notebook
4. Install py4j if it's not installed
5. Set environment variables
   export PYSPARK_PYTHON=python<VERSION>
   export PYSPARK_DRIVER_PYTHON=ipython<VERSION>
   export PYSPARK_DRIVER_PYTHON_OPTS=notebook
   export SPARK_HOME=/usr/hdp/current/spark2-client
   export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.4-src.zip
6. If necessary (i.e. if your py4j zip has a different version), adjust the path:
   export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-<VERSION>-src.zip
7. Don't forget to save all these 'export' lines in your ~/.bashrc
8. source ~/.bashrc
9. pyspark
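
Before relying on the notebook, a quick sanity check (a sketch; it assumes the exports from step 5 are in effect) is to confirm that the chosen interpreter really picks up pyspark and py4j from those paths:

    import sys

    import py4j     # expected to come from $SPARK_HOME/python/lib/py4j-<VERSION>-src.zip
    import pyspark  # expected to come from $SPARK_HOME/python

    print(sys.executable)    # the interpreter actually being used
    print(pyspark.__file__)  # shows which pyspark copy is on PYTHONPATH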

Now, if everything went well, a browser will open with a notebook running a "python3" session. That session already has a "SparkSession" object and a "SparkContext" object available in the "spark" and "sc" variables, respectively.
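For example, a minimal smoke test inside that notebook could look like this (assuming Spark 2.x, where both variables are pre-created by the shell):

    # No imports needed: 'spark' and 'sc' are injected by the PySpark startup script.
    spark.range(5).show()                    # exercises the SparkSession
    print(sc.parallelize([1, 2, 3]).sum())   # exercises the SparkContext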

Also try installing pyspark with the pip3 version, and then set the path in your code: os.environ['PYSPARK_PYTHON'] = '/usr/bin/python3'
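A minimal sketch of that route, assuming pyspark was installed with pip3 and that /usr/bin/python3 is the interpreter the workers should use:

    import os

    # Set PYSPARK_PYTHON before any Spark JVM is started.
    os.environ['PYSPARK_PYTHON'] = '/usr/bin/python3'

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[*]")
             .appName("python3-check")   # the app name here is arbitrary
             .getOrCreate())

    print(spark.range(3).count())   # should print 3
    spark.stop()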
