How to read Telugu table entries from a PDF in Python 3



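I am using the following code to extract table data in English from a PDF, but I cannot get it to work for other languages. Can anyone help me with how to pass a language parameter so that I can extract tables in any language?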

from tabula import read_pdf
url = "/Users/administrator/Desktop/Telugu_land_document1.pdf"
try:
    df = read_pdf(url, pages='all')
    print(df)
except Exception as e:
    print(e)

You can specify the table coordinates in a template, which makes the extraction independent of the language.

import tabula
df = tabula.read_pdf_with_template("/path/xxx.pdf", "path/temp.json")

# cat path/temp.json

[
  {
    "page":1,
    "extraction_method":"a",
    "x1":157.18,
    "x2":1111.41,
    "y1":270.97,
    "y2":283,
    "width":954.23,
    "height":11.189
  },
  {
    "page":1,
    "extraction_method":"a",
    "x1":157.18,
    "x2":1111.41,
    "y1":270.97,
    "y2":283,
    "width":954.23,
    "height":11.189
  }
  ...
]
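If you would rather generate a template file like the one above than write it by hand, something along these lines might work. This is only a rough sketch: `make_area`, `to_area` and the output file name are hypothetical helpers I made up for illustration, and the `"stream"` extraction method is an assumption (as far as I know Tabula templates use `guess`, `lattice` or `stream`). The width and height here are simply derived from the corner coordinates.

```python
import json
import os
import tempfile


def make_area(page, x1, y1, x2, y2, method="stream"):
    """Build one template entry from the corner coordinates of a table region."""
    return {
        "page": page,
        "extraction_method": method,  # assumption: "guess", "lattice" or "stream"
        "x1": x1,
        "x2": x2,
        "y1": y1,
        "y2": y2,
        "width": round(x2 - x1, 2),   # derived from the corners
        "height": round(y2 - y1, 2),  # derived from the corners
    }


def to_area(entry):
    """Reorder a template entry as [top, left, bottom, right]."""
    return [entry["y1"], entry["x1"], entry["y2"], entry["x2"]]


# Coordinates taken from the first entry of the template above.
areas = [make_area(1, 157.18, 270.97, 1111.41, 283)]

# Hypothetical output location; use whatever path you pass to tabula.
template_path = os.path.join(tempfile.gettempdir(), "temp.json")
with open(template_path, "w", encoding="utf-8") as f:
    json.dump(areas, f, indent=2)
```

The resulting file can then be passed as the second argument of `tabula.read_pdf_with_template`. The `[top, left, bottom, right]` list from `to_area` follows the order that the `area` argument of `tabula.read_pdf` expects, in case you want to skip the template file for a single region.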
