TypeError: __init__() takes 1 positional argument but 2 were given (Python multiprocessing with pytesseract)



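I get the error message below when I try to use Python's multiprocessing library together with pytesseract and pdf2image, and I'm not sure what it means or how to fix it. Other posts I've seen with a similar message deal with forgetting to pass self as a parameter in a class method, but I haven't created any classes here.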

C:\Users\erik7>python "C:\Users\erik7\Documents\Python Projects\multiprocess_test2.py"
0
Exception in thread Thread-11:
Traceback (most recent call last):
  File "C:\Users\erik7\AppData\Local\Programs\Python\Python38-32\lib\threading.py", line 932, in _bootstrap_inner
    self.run()
  File "C:\Users\erik7\AppData\Local\Programs\Python\Python38-32\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\erik7\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\pool.py", line 576, in _handle_results
    task = get()
  File "C:\Users\erik7\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
TypeError: __init__() takes 1 positional argument but 2 were given
1
2
3
4
5
6
7
8
9

My code:

import pytesseract
import pdf2image
import multiprocessing

def extract(img, page_num):
    # Print the page index so each task's progress is visible.
    print(page_num)
    # Run Tesseract's orientation and script detection and return the orientation.
    return pytesseract.image_to_osd(img, output_type=pytesseract.Output.DICT)['orientation']

if __name__ == "__main__":
    pdf_path = r"C:/Users/erik7/Documents/Late Scans for Testing/scans_template2.pdf"
    output_fmt = 'jpeg'
    img_dpi = 300
    pop_path = r"C:\Users\erik7\Downloads\poppler-0.90.1\bin"
    output_path = r"C:\Users\erik7\Downloads"

    # Set only in the parent process; this is the line that was later moved.
    pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

    converted_path = r"C:\Users\erik7\Downloads\converted_images"
    # Convert every PDF page to a grayscale JPEG image.
    converted = pdf2image.convert_from_path(pdf_path=pdf_path, fmt=output_fmt, dpi=img_dpi, poppler_path=pop_path, output_folder=converted_path, grayscale=True, thread_count=2)
    results = []

    # One (image, page number) argument pair per page.
    iterable = [[img, page_num] for page_num, img in enumerate(converted)]
    p = multiprocessing.Pool()
    r = p.starmap(extract, iterable)
    results.append(r)
    p.close()

    print("\n**PROCESS COMPLETED SUCCESSFULLY")

It works now. I needed to move pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe" into my extract function so that the program runs successfully with multiprocessing:

import pytesseract
import pdf2image
import multiprocessing

def extract(img, page_num):
    # Set the Tesseract path inside the function so every worker process sees it.
    pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

    print(page_num)
    # Run Tesseract's orientation and script detection and return the orientation.
    return pytesseract.image_to_osd(img, output_type=pytesseract.Output.DICT)['orientation']

if __name__ == "__main__":
    pdf_path = r"C:/Users/erik7/Documents/Late Scans for Testing/scans_template2.pdf"
    output_fmt = 'jpeg'
    img_dpi = 300
    pop_path = r"C:\Users\erik7\Downloads\poppler-0.90.1\bin"
    output_path = r"C:\Users\erik7\Downloads"

    converted_path = r"C:\Users\erik7\Downloads\converted_images"
    # Convert every PDF page to a grayscale JPEG image.
    converted = pdf2image.convert_from_path(pdf_path=pdf_path, fmt=output_fmt, dpi=img_dpi, poppler_path=pop_path, output_folder=converted_path, grayscale=True, thread_count=2)
    results = []

    # One (image, page number) argument pair per page.
    iterable = [[img, page_num] for page_num, img in enumerate(converted)]
    p = multiprocessing.Pool()
    r = p.starmap(extract, iterable)
    results.append(r)
    p.close()

    print("\n**PROCESS COMPLETED SUCCESSFULLY")
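
A possible explanation for the confusing message, offered as an assumption rather than something confirmed in the post: the traceback ends in _ForkingPickler.loads, which suggests the TypeError is raised while the parent process unpickles an exception sent back by a worker, not by the OCR call itself. An exception class whose __init__ takes no arguments but passes a message to its base class cannot be rebuilt by pickle, because pickle calls the class again with that stored message. The sketch below reproduces the effect with the standard library only; ToolNotFoundError and roundtrip_error_message are hypothetical names, not part of pytesseract.

```python
import pickle


class ToolNotFoundError(Exception):
    """Hypothetical stand-in for an exception whose __init__ takes no arguments."""

    def __init__(self):
        # The message is fixed here, so callers pass nothing to the constructor.
        super().__init__("tool is not installed or not on PATH")


def roundtrip_error_message():
    """Pickle and unpickle the exception; return the TypeError text, or None."""
    payload = pickle.dumps(ToolNotFoundError())
    try:
        # Unpickling calls ToolNotFoundError(*args) with the stored message,
        # but __init__ accepts only self, so this raises TypeError.
        pickle.loads(payload)
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(roundtrip_error_message())
```

If that is what happened here, it would also account for the fix. On Windows, worker processes are started by re-importing the script, so code under if __name__ == "__main__" does not run in them, and a tesseract_cmd set only there never reaches the workers. Setting it inside extract (or at module level) makes it visible in every worker.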
