
How to use the pdftotext library with the "-layout" option in Python

Updated: 2023-09-20
Original English title: How to use pdftotext library with "-layout" option in Python

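I'm using the Python library pdftotext to scrape text from PDF files. It works well, but I need the "-layout" option that the command-line tool provides via pdftotext -layout pdf_file.pdf. I'm not sure whether this is possible without explicitly calling that command in my code.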

Current code:

import pdftotext
# file is an open binary file object, e.g. open("file.pdf", "rb")
pdf = pdftotext.PDF(file)
plain_text = "\n\n".join(pdf)

Ideal code, using the layout option for better scraping:

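# Wished-for call: pdftotext.PDF does not accept "-layout" (its second positional argument is the password)
pdf = pdftotext.PDF(file, "-layout")
plain_text = "\n\n".join(pdf)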

The workaround I'd like to avoid in my Python program:

cmd = ['pdftotext', '-f', str(1), '-l', str(1), str(pdf_file), '-layout', '-']
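If you do end up shelling out, the command-line workaround above can be wrapped in a small helper. This is a minimal sketch, assuming the poppler-utils pdftotext binary is on PATH and that it writes UTF-8 text (assumed here to be its default). The trailing "-" tells pdftotext to write to stdout, as in the command above. The names build_layout_cmd and pdf_to_text_cli are hypothetical helpers, not part of any library.

```python
import subprocess


def build_layout_cmd(pdf_file, first_page=None, last_page=None):
    """Build a pdftotext command line with -layout (hypothetical helper).

    Options go before the input file; the trailing "-" sends text to stdout.
    """
    cmd = ["pdftotext", "-layout"]
    if first_page is not None:
        cmd += ["-f", str(first_page)]  # first page to convert
    if last_page is not None:
        cmd += ["-l", str(last_page)]  # last page to convert
    cmd += [str(pdf_file), "-"]
    return cmd


def pdf_to_text_cli(pdf_file, first_page=None, last_page=None):
    """Run the pdftotext binary (assumed to be on PATH) and return its stdout."""
    result = subprocess.run(
        build_layout_cmd(pdf_file, first_page, last_page),
        capture_output=True,
        encoding="utf-8",  # assumption: pdftotext emits UTF-8
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout
```

Usage: pdf_to_text_cli("file.pdf", 1, 1) returns the layout-preserved text of page 1, equivalent to the cmd list in the question.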

Thanks!

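Answer: pass physical=True when constructing the PDF object. This gives output similar to the command-line -layout option:

with open("file.pdf", "rb") as f:
    pdf = pdftotext.PDF(f, physical=True)

The library's source code documents the option as follows:
"    raw: If True, page text is output in the order it appears in the\n"
"        content stream.\n"
"    physical: If True, page text is output in the order it appears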
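Putting the answer together, a minimal sketch of a helper that extracts layout-preserved text with the pdftotext package. It assumes an installed pdftotext version that supports the physical keyword shown in the docstring above; extract_text and join_pages are hypothetical names. The import sits inside extract_text so the module still loads where pdftotext is not installed.

```python
def join_pages(pages, separator="\n\n"):
    """Join per-page strings into a single document string."""
    return separator.join(pages)


def extract_text(path, layout=True):
    """Extract text from a PDF with the pdftotext Python package.

    layout=True passes physical=True, which keeps the physical page
    layout, similar to the command-line "-layout" option.
    """
    import pdftotext  # imported here so join_pages works without pdftotext

    with open(path, "rb") as f:  # pdftotext.PDF expects a binary file object
        pdf = pdftotext.PDF(f, physical=layout)
    # Iterating over the PDF object yields one string per page.
    return join_pages(pdf)
```

Usage: extract_text("file.pdf") returns the whole document with pages separated by blank lines; extract_text("file.pdf", layout=False) falls back to the default reading order.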

