Spark worker crashes with "Fatal Python error: Unreachable C code path reached. Python runtime state: initialized"



I have a Spark program that I run on the 4 cores of my machine, splitting the data across the cores. While the program is running, I get this error:

Fatal Python error: Unreachable C code path reached
Python runtime state: initialized
Current thread 0x00007ff2c408e740 (most recent call first):
File "/usr/local/lib/python3.8/dist-packages/numpy/core/_methods.py", line 39 in _amax
File "/usr/local/lib/python3.8/dist-packages/pandas/core/nanops.py", line 975 in reduction
File "/usr/local/lib/python3.8/dist-packages/pandas/core/nanops.py", line 392 in new_func
File "/usr/local/lib/python3.8/dist-packages/pandas/core/nanops.py", line 133 in f
File "/usr/local/lib/python3.8/dist-packages/pandas/core/series.py", line 4152 in _reduce
File "/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py", line 10706 in _stat_function
File "/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py", line 10716 in max
File "/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py", line 11189 in max
File "/usr/local/lib/python3.8/dist-packages/numpy/core/fromnumeric.py", line 85 in _wrapreduction
File "/usr/local/lib/python3.8/dist-packages/numpy/core/fromnumeric.py", line 2705 in amax
File "<__array_function__ internals>", line 5 in amax
File "/usr/local/lib/python3.8/dist-packages/pandas/core/groupby/groupby.py", line 1044 in <lambda>
File "/usr/local/lib/python3.8/dist-packages/pandas/core/groupby/groupby.py", line 1160 in <lambda>
File "/usr/local/lib/python3.8/dist-packages/pandas/core/groupby/ops.py", line 732 in _aggregate_series_fast
File "/usr/local/lib/python3.8/dist-packages/pandas/core/groupby/ops.py", line 706 in agg_series
File "/usr/local/lib/python3.8/dist-packages/pandas/core/groupby/groupby.py", line 1173 in _python_agg_general
File "/usr/local/lib/python3.8/dist-packages/pandas/core/groupby/generic.py", line 259 in aggregate
File "/usr/local/lib/python3.8/dist-packages/pandas/core/groupby/groupby.py", line 1044 in _agg_general
File "/usr/local/lib/python3.8/dist-packages/pandas/core/groupby/groupby.py", line 1676 in max
File "/usr/local/lib/python3.8/dist-packages/pandas/core/groupby/generic.py", line 241 in aggregate
File "/usr/local/lib/python3.8/dist-packages/pandas/core/groupby/generic.py", line 315 in _aggregate_multiple_funcs
File "/usr/local/lib/python3.8/dist-packages/pandas/core/groupby/generic.py", line 247 in aggregate
File "/usr/local/lib/python3.8/dist-packages/pandas/core/aggregation.py", line 752 in <dictcomp>
File "/usr/local/lib/python3.8/dist-packages/pandas/core/aggregation.py", line 752 in agg_dict_like
File "/usr/local/lib/python3.8/dist-packages/pandas/core/aggregation.py", line 566 in aggregate
File "/usr/local/lib/python3.8/dist-packages/pandas/core/groupby/generic.py", line 945 in aggregate
File "/home/spark/PycharmProjects/project1/file_b.py", line 32 in create_update_df
File "/home/spark/PycharmProjects/project1/venv/lib/python3.8/site-packages/pyspark/python/lib/pyspark.zip/pyspark/util.py", line 74 in wrapper
File "/home/spark/PycharmProjects/project1/venv/lib/python3.8/site-packages/pyspark/python/lib/pyspark.zip/pyspark/shuffle.py", line 240 in mergeValues
File "/home/spark/PycharmProjects/project1/venv/lib/python3.8/site-packages/pyspark/rdd.py", line 2146 in combineLocally
File "/home/spark/PycharmProjects/project1/venv/lib/python3.8/site-packages/pyspark/rdd.py", line 417 in func
File "/home/spark/PycharmProjects/project1/venv/lib/python3.8/site-packages/pyspark/rdd.py", line 2918 in pipeline_func
File "/home/spark/PycharmProjects/project1/venv/lib/python3.8/site-packages/pyspark/rdd.py", line 2918 in pipeline_func
File "/home/spark/PycharmProjects/project1/venv/lib/python3.8/site-packages/pyspark/python/lib/pyspark.zip/pyspark/worker.py", line 609 in process
File "/home/spark/PycharmProjects/project1/venv/lib/python3.8/site-packages/pyspark/python/lib/pyspark.zip/pyspark/worker.py", line 619 in main
File "/home/spark/PycharmProjects/project1/venv/lib/python3.8/site-packages/pyspark/python/lib/pyspark.zip/pyspark/daemon.py", line 74 in worker
File "/home/spark/PycharmProjects/project1/venv/lib/python3.8/site-packages/pyspark/python/lib/pyspark.zip/pyspark/daemon.py", line 186 in manager
File "/home/spark/PycharmProjects/project1/venv/lib/python3.8/site-packages/pyspark/python/lib/pyspark.zip/pyspark/daemon.py", line 211 in <module>
File "/usr/lib/python3.8/runpy.py", line 87 in _run_code
File "/usr/lib/python3.8/runpy.py", line 194 in _run_module_as_main
22/01/17 18:53:07 ERROR Executor: Exception in task 16.0 in stage 3.0 (TID 126)
org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)

The part of my code where the error occurs is the following:

simple_f_df = input_df.groupby(['num']).agg(date=('date', 'max'),
                                            count_t=('date', 'count'),
                                            count_c=('is_c', 'sum'),
                                            sum_c=('c_am', 'sum'),
                                            sum_d=('d_am', 'sum'),
                                            sum_c_f=('c_am_f', 'sum'),
                                            count_f=('lbl', 'sum'),
                                            COUNT_CH=('lbl_ch', 'sum'),
                                            COUNT_T=('lbl_t', 'sum')).reset_index()
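To rule out the named-aggregation syntax itself, the same call can be run on a small pandas DataFrame outside Spark. Below is a minimal sketch, not taken from the original job: the column names match the snippet above, the values are invented, and only a subset of the aggregation keys is shown, since the rest follow the same pattern.

import pandas as pd

# Toy data, invented for illustration; column names match the snippet above.
input_df = pd.DataFrame({
    'num': [1, 1, 2],
    'date': ['2022-01-01', '2022-01-05', '2022-01-03'],
    'is_c': [1, 0, 1],
    'c_am': [10.0, 0.0, 5.0],
    'd_am': [0.0, 3.0, 0.0],
    'c_am_f': [10.0, 0.0, 5.0],
    'lbl': [0, 1, 0],
    'lbl_ch': [0, 0, 1],
    'lbl_t': [1, 0, 0],
})

# Same pattern of named aggregation as in the question (subset of the keys).
check_df = input_df.groupby(['num']).agg(date=('date', 'max'),
                                         count_t=('date', 'count'),
                                         sum_c_f=('c_am_f', 'sum')).reset_index()
print(check_df)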

On each core, input_df holds roughly 500,000 records.

Can you tell me what is wrong with the code above that produces this error, and how to fix it?

Thank you very much for your help.

The problem has been solved. The error turned out to be caused by PyCharm itself: I was using PyCharm 2020.3, and after installing PyCharm 2021.3.1 everything works fine.
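Since upgrading PyCharm effectively changed the environment the job ran in, one way to double-check which interpreter and library versions the Spark workers actually use is the sketch below. It is not from the original post, and it assumes sc is the active SparkContext.

def env_info(_):
    # Runs on each executor; reports the worker-side interpreter and library versions.
    import sys, numpy, pandas
    return [(sys.executable, sys.version.split()[0], numpy.__version__, pandas.__version__)]

# One partition per core; distinct() collapses identical environments.
print(sc.parallelize(range(4), 4).mapPartitions(env_info).distinct().collect())

A mismatch between the driver's virtualenv (for example one managed by an IDE) and the interpreter the workers pick up can lead to crashes inside numpy/pandas like the one in the traceback.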
