Spark installation problem - TypeError: an integer is required (got type bytes) - spark-2.4.5-bin-hadoop2.7



I am installing Spark on a 64-bit Windows machine. I have Python 3.8.2 installed and pip version 20.0.2. I downloaded spark-2.4.5-bin-hadoop2.7, set the HADOOP_HOME and SPARK_HOME environment variables, and added pyspark to the PATH variable (a sketch of this setup follows the traceback below). When I run pyspark from cmd, I see the error given below:

C:\Users\aa>pyspark
Python 3.8.2 (tags/v3.8.2:7b3ab59, Feb 25 2020, 23:03:10) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
  File "C:\Users\aa\Downloads\spark-2.4.5-bin-hadoop2.7\spark-2.4.5-bin-hadoop2.7\python\pyspark\shell.py", line 31, in <module>
    from pyspark import SparkConf
  File "C:\Users\aa\Downloads\spark-2.4.5-bin-hadoop2.7\spark-2.4.5-bin-hadoop2.7\python\pyspark\__init__.py", line 51, in <module>
    from pyspark.context import SparkContext
  File "C:\Users\aa\Downloads\spark-2.4.5-bin-hadoop2.7\spark-2.4.5-bin-hadoop2.7\python\pyspark\context.py", line 31, in <module>
    from pyspark import accumulators
  File "C:\Users\aa\Downloads\spark-2.4.5-bin-hadoop2.7\spark-2.4.5-bin-hadoop2.7\python\pyspark\accumulators.py", line 97, in <module>
    from pyspark.serializers import read_int, PickleSerializer
  File "C:\Users\aa\Downloads\spark-2.4.5-bin-hadoop2.7\spark-2.4.5-bin-hadoop2.7\python\pyspark\serializers.py", line 72, in <module>
    from pyspark import cloudpickle
  File "C:\Users\aa\Downloads\spark-2.4.5-bin-hadoop2.7\spark-2.4.5-bin-hadoop2.7\python\pyspark\cloudpickle.py", line 145, in <module>
    _cell_set_template_code = _make_cell_set_template_code()
  File "C:\Users\aa\Downloads\spark-2.4.5-bin-hadoop2.7\spark-2.4.5-bin-hadoop2.7\python\pyspark\cloudpickle.py", line 126, in _make_cell_set_template_code
    return types.CodeType(
TypeError: an integer is required (got type bytes)
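
For reference, that environment setup amounts to something like the following sketch. The paths are assumptions taken from the traceback above, and pointing HADOOP_HOME at the Spark directory presumes winutils.exe was placed in its bin folder:

import os

# Assumed paths -- adjust to your machine.
os.environ["SPARK_HOME"] = r"C:\Users\aa\Downloads\spark-2.4.5-bin-hadoop2.7\spark-2.4.5-bin-hadoop2.7"
os.environ["HADOOP_HOME"] = os.environ["SPARK_HOME"]  # expects bin\winutils.exe on Windows
os.environ["PATH"] += os.pathsep + os.path.join(os.environ["SPARK_HOME"], "bin")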

I want to import pyspark into my Python code in PyCharm, but when I run the code file I get the same TypeError: an integer is required (got type bytes). I uninstalled Python 3.8.2 and tried Python 2.7, but in that case I got a deprecation error. I also got the error given below and updated the pip installer:

Could not find a version that satisfies the requirement pyspark (from versions: )
No matching distribution found for pyspark 

Then I ran python -m pip install --upgrade pip to update pip, but I ran into the TypeError: an integer is required (got type bytes) problem again.

C:\Users\aa>python --version
Python 3.8.2
C:\Users\aa>pip --version
pip 20.0.2 from c:\users\aa\appdata\local\programs\python\python38\lib\site-packages\pip (python 3.8)
C:\Users\aa>java --version
java 14 2020-03-17
Java(TM) SE Runtime Environment (build 14+36-1461)
Java HotSpot(TM) 64-Bit Server VM (build 14+36-1461, mixed mode, sharing)

How can I resolve and get past this problem? Currently I have spark-2.4.5-bin-hadoop2.7 and Python 3.8.2. Thanks in advance!

This is a compatibility problem between Python 3.8 and that Spark version; see https://github.com/apache/spark/pull/26194.
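
The underlying cause: Python 3.8 added a posonlyargcount argument to the types.CodeType constructor, so the positional call in the old bundled cloudpickle shifts every later argument by one and a bytes object lands where an integer is expected. A rough sketch of the version-aware construction that fixed cloudpickle versions use (the names here are illustrative, not the actual cloudpickle internals):

import sys
import types

def clone_code_with_consts(code, new_consts):
    # Illustrative helper: rebuild a code object with different constants.
    if sys.version_info >= (3, 8):
        # CodeType.replace() (new in 3.8) avoids spelling out the
        # constructor signature, which gained posonlyargcount in 3.8.
        return code.replace(co_consts=new_consts)
    return types.CodeType(
        code.co_argcount, code.co_kwonlyargcount, code.co_nlocals,
        code.co_stacksize, code.co_flags, code.co_code, new_consts,
        code.co_names, code.co_varnames, code.co_filename,
        code.co_name, code.co_firstlineno, code.co_lnotab,
        code.co_freevars, code.co_cellvars)

f = lambda x: x + 1
g = types.FunctionType(clone_code_with_consts(f.__code__, (2,)), {})
print(g(1))  # prints 3: the constant 1 was swapped for 2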

To make it work (up to a point), you need to:

  • Replace the cloudpickle.py file in the pyspark directory with the one from cloudpickle 1.1.1, found here: https://github.com/cloudpipe/cloudpickle/blob/v1.1.1/cloudpickle/cloudpickle.py.
  • Edit that cloudpickle.py file to add the print_exec helper below, which pyspark expects but the 1.1.1 file does not define:

import sys        # already present at the top of cloudpickle.py
import traceback  # add alongside the existing imports if missing

def print_exec(stream):
    ei = sys.exc_info()
    traceback.print_exception(ei[0], ei[1], ei[2], None, stream)

After that you will be able to import pyspark.
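
Assuming the swap worked, here is a quick local-mode smoke test (note the "up to a point" caveat above: some workloads may still hit other Python 3.8 incompatibilities in Spark 2.4.x):

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local[*]").setAppName("py38-smoke-test")
sc = SparkContext(conf=conf)
print(sc.parallelize(range(10)).sum())  # expect 45
sc.stop()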
