Segfault on a fresh Ubuntu 20.04 install using conda



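The Python interpreter segfaults when run inside a miniconda environment on a fresh install of Ubuntu 20.04.2. It happens intermittently, both when running "pip" during the conda setup of an environment and while executing code like the script below.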

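The segfault always occurs when running the code below, which reads text from files and tokenizes the result. The location of the segfault changes from run to run. The exact same code runs fine on another computer with the same conda environment on Ubuntu 18.04.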

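The core dumps always point to some function in CPython's unicodeobject.c, but the exact function changes from crash to crash. At least one crash shows a clearly dereferenced null pointer (0x0) where the "unicode object" should be.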

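My guess is that something causes the Python interpreter to discard the unicode object being pointed to while it is still in use, which then leads to the segfault. But any bug in the interpreter or in NLTK should have been noticed by more users, and I cannot find anyone with a similar problem.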

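Things I have tried that did not fix the problem:

  1. Reformatting and reinstalling Ubuntu
  2. Switching to Ubuntu 18.04 (on this computer; another computer running 18.04 runs the code fine)
  3. Swapping hardware, to make sure the RAM or the SSD was not faulty
  4. Changing the Python version to 3.8.6, 3.8.8, and 3.9.2
  5. Cloning the conda environment from the working computer to the failing one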

Attached is one stack trace from faulthandler, together with the corresponding core dump stack trace from gdb.

(eo) axel@minimind:~/test$ python tokenizer_mini.py 
2021-03-30 11:10:15.588399: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-03-30 11:10:15.588426: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Fatal Python error: Segmentation fault
Current thread 0x00007faa73bbe740 (most recent call first):
File "tokenizer_mini.py", line 36 in preprocess_string
File "tokenizer_mini.py", line 51 in <module>
Segmentation fault (core dumped)
#0  raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  <signal handler called>
#2  find_maxchar_surrogates (num_surrogates=<synthetic pointer>, maxchar=<synthetic pointer>, 
end=0x4 <error: Cannot access memory at address 0x4>, begin=0x0)
at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Objects/unicodeobject.c:1703
#3  _PyUnicode_Ready (unicode=0x7f7e4e04d7f0)
at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Objects/unicodeobject.c:1742
#4  0x000055cd65f6df6a in PyUnicode_RichCompare (left=0x7f7e4cf43fb0, right=<optimized out>, op=2)
at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Objects/unicodeobject.c:11205
#5  0x000055cd6601712a in do_richcompare (op=2, w=0x7f7e4e04d7f0, v=0x7f7e4cf43fb0)
at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Objects/object.c:726
#6  PyObject_RichCompare (op=2, w=0x7f7e4e04d7f0, v=0x7f7e4cf43fb0)
at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Objects/object.c:774
#7  PyObject_RichCompareBool (op=2, w=0x7f7e4e04d7f0, v=0x7f7e4cf43fb0)
at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Objects/object.c:796
#8  list_contains (a=0x7f7e4e04b4c0, el=0x7f7e4cf43fb0)
at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Objects/listobject.c:455
#9  0x000055cd660be41b in PySequence_Contains (ob=0x7f7e4cf43fb0, seq=0x7f7e4e04b4c0)
at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Objects/abstract.c:2083
#10 cmp_outcome (w=0x7f7e4e04b4c0, v=0x7f7e4cf43fb0, op=<optimized out>, tstate=<optimized out>)
at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Python/ceval.c:5082
#11 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>)
at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Python/ceval.c:2977
#12 0x000055cd6609f706 in PyEval_EvalFrameEx (throwflag=0, f=0x7f7e4f4d3c40)
at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Python/ceval.c:738
#13 function_code_fastcall (globals=<optimized out>, nargs=<optimized out>, args=<optimized out>, co=<optimized out>)
at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Objects/call.c:284
#14 _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>)
at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Objects/call.c:411
#15 0x000055cd660be54f in _PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7f7f391985b8, callable=0x7f7f39084160)
at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Include/cpython/abstract.h:115
#16 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55cd66c2e880)
at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Python/ceval.c:4963
#17 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>)
at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Python/ceval.c:3500
#18 0x000055cd6609e503 in PyEval_EvalFrameEx (throwflag=0, f=0x7f7f39198440)
at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Python/ceval.c:4298
#19 _PyEval_EvalCodeWithName (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, 
argcount=<optimized out>, kwnames=<optimized out>, kwargs=<optimized out>, kwcount=<optimized out>, kwstep=<optimized out>, 
defs=<optimized out>, defcount=<optimized out>, kwdefs=<optimized out>, closure=<optimized out>, name=<optimized out>, 
qualname=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Python/ceval.c:4298
#20 0x000055cd6609f559 in PyEval_EvalCodeEx (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, 
args=<optimized out>, argcount=<optimized out>, kws=<optimized out>, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0)
at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Python/ceval.c:4327
#21 0x000055cd661429ab in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>)
at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Python/ceval.c:718
#22 0x000055cd66142a43 in run_eval_code_obj (co=0x7f7f3910f240, globals=0x7f7f391fad80, locals=0x7f7f391fad80)
at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Python/pythonrun.c:1165
#23 0x000055cd6615c6b3 in run_mod (mod=<optimized out>, filename=<optimized out>, globals=0x7f7f391fad80, locals=0x7f7f391fad80, 
flags=<optimized out>, arena=<optimized out>)
at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Python/pythonrun.c:1187
#24 0x000055cd661615b2 in pyrun_file (fp=0x55cd66c2cdf0, filename=0x7f7f391bbee0, start=<optimized out>, globals=0x7f7f391fad80, 
locals=0x7f7f391fad80, closeit=1, flags=0x7ffe3ee6f8e8)
at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Python/pythonrun.c:1084
#25 0x000055cd66161792 in pyrun_simple_file (flags=0x7ffe3ee6f8e8, closeit=1, filename=0x7f7f391bbee0, fp=0x55cd66c2cdf0)
at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Python/pythonrun.c:439
#26 PyRun_SimpleFileExFlags (fp=0x55cd66c2cdf0, filename=<optimized out>, closeit=1, flags=0x7ffe3ee6f8e8)
at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Python/pythonrun.c:472
#27 0x000055cd66161d0d in pymain_run_file (cf=0x7ffe3ee6f8e8, config=0x55cd66c2da70)
at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Modules/main.c:391
#28 pymain_run_python (exitcode=0x7ffe3ee6f8e0)
at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Modules/main.c:616
#29 Py_RunMain () at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Modules/main.c:695
#30 0x000055cd66161ec9 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>)
at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Modules/main.c:1127
#31 0x00007f7f3a3620b3 in __libc_start_main (main=0x55cd65fe3490 <main>, argc=2, argv=0x7ffe3ee6fae8, init=<optimized out>, 
fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe3ee6fad8) at ../csu/libc-start.c:308
#32 0x000055cd660d7369 in _start () at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Python/ast.c:937

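The conda environment used is listed below, installed with Miniconda3-py38_4.9.2-Linux-x86_64.sh (note that the segfault sometimes occurs while the conda environment is being set up, so it may not be related to the env).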

name: eo
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.8.8
  - pip=20.3.1
  - pip:
    - transformers==4.3.2
    - tensorflow_gpu==2.4.0
    - scikit-learn==0.23.2
    - nltk==3.5
    - matplotlib==3.2.1
    - seaborn==0.11.0
    - tensorflow-addons==0.11.2
    - tf-models-official==2.4.0
    - gspread==3.6.0
    - oauth2client==4.1.3
    - ipykernel==5.4.2
    - autopep8==1.5.4
    - torch==1.7.1

The following code reproduces the problem consistently. The files it reads are simple text files containing unicode text:

from nltk.tokenize import wordpunct_tokenize
from tensorflow.keras.preprocessing.text import Tokenizer
from nltk.stem.snowball import SnowballStemmer
from nltk.corpus import stopwords
import pickle
from pathlib import Path
import faulthandler

# Dump a Python-level traceback when the interpreter receives a fatal signal.
faulthandler.enable()


def load_data(root_path, feature, index):
    # Files are sharded into directories of 10,000 entries each.
    feature_root = root_path / feature
    dir1 = str(index // 10_000)
    base_path = feature_root / dir1 / str(index)
    full_path = base_path.with_suffix('.txt')
    data = None
    with open(full_path, 'r', encoding='utf-8') as f:
        data = f.read()
    return data


def preprocess_string(text, stemmer, stop_words):
    word_tokens = wordpunct_tokenize(text.lower())
    # Keep alphabetic tokens that are not stop words.
    alpha_tokens = []
    for w in word_tokens:
        try:
            if (w.isalpha() and w not in stop_words):
                alpha_tokens.append(w)
        except Exception:
            print("Something went wrong when handling the word: ", w)
    # Stem each remaining token; fall back to the raw token on failure.
    clean_tokens = []
    for w in alpha_tokens:
        try:
            word = stemmer.stem(w)
            clean_tokens.append(word)
        except Exception:
            print("Something went wrong when stemming the word: ", w)
            clean_tokens.append(w)
    return clean_tokens


stop_words = stopwords.words('english')
stemmer = SnowballStemmer(language='english')
tokenizer = Tokenizer()
root_path = '/srv/patent/EbbaOtto/E'
for idx in range(0, 57454):
    print(f'Processed {idx}/57454', end='\r')
    desc = str(load_data(Path(root_path), 'clean_description', idx))
    desc = preprocess_string(desc, stemmer, stop_words)
    tokenizer.fit_on_texts([desc])
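
Since the crash is intermittent and moves around, it can help to count how each run ends instead of watching them one at a time. Below is a minimal sketch of such a harness. It is not part of the original setup, and the helper names `describe_exit` and `run_repeatedly` are made up for illustration. It relies on the documented POSIX behaviour of `subprocess`: a negative return code means the child was killed by that signal.

```python
import signal
import subprocess
import sys
from collections import Counter


def describe_exit(returncode):
    """Turn a subprocess return code into a short label."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, -N means the child was terminated by signal N.
        try:
            return signal.Signals(-returncode).name
        except ValueError:
            return f"signal {-returncode}"
    return f"exit {returncode}"


def run_repeatedly(cmd, runs=10):
    """Run cmd `runs` times and count how each run ended."""
    outcomes = Counter()
    for _ in range(runs):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        outcomes[describe_exit(result.returncode)] += 1
    return outcomes
```

Assuming the script is saved as tokenizer_mini.py (the name in the traceback above), `run_repeatedly([sys.executable, "tokenizer_mini.py"], runs=20)` would return a `Counter` of outcomes. A mix of "ok" and "SIGSEGV" entries for identical input is one more hint that the cause lies outside the Python code.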

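For anyone searching for similar problems: this was eventually traced to a hardware fault in the CPU. Replacing the CPU with another one of the same brand made the problem go away. Interestingly, the problem did not show up on a Windows computer.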
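
For readers who suspect the same kind of fault, one cheap sanity check is to repeat a deterministic computation many times and confirm that the result never changes. The sketch below is only an illustration of that idea, not what was done to diagnose the machine in this post; the function name `repeated_digest_check`, the payload, and the round count are arbitrary choices. A clean run proves very little, and dedicated stress-testing tools are far more thorough, but a mismatch on identical input would point strongly at hardware.

```python
import hashlib


def repeated_digest_check(payload, rounds=1000):
    """Hash the same bytes `rounds` times and return the set of distinct digests."""
    digests = set()
    for _ in range(rounds):
        digests.add(hashlib.sha256(payload).hexdigest())
    return digests


# Deterministic 1 MiB input: every round should give the same digest.
payload = bytes(range(256)) * 4096
distinct = repeated_digest_check(payload, rounds=200)
if len(distinct) == 1:
    print("all rounds agree")
else:
    print(f"mismatch: {len(distinct)} distinct digests")
```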
