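I use the following code to extract English table data from a PDF, but it does not work for other languages. Can anyone show me how to pass a language parameter so that I can extract tables in any language?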
from tabula import read_pdf

url = "/Users/administrator/Desktop/Telugu_land_document1.pdf"
try:
    df = read_pdf(url, pages='all')
    print(df)
except Exception as e:
    print(e)
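Before changing the extraction call, it can help to check whether the text tabula returns actually contains Telugu characters or only mis-decoded glyphs. The sketch below is a minimal diagnostic, assuming the cell values are plain Python strings. `contains_telugu` and `telugu_ratio` are hypothetical helpers, not part of tabula-py, and they only test for the Unicode Telugu block (U+0C00 to U+0C7F).

```python
# Hypothetical diagnostic helpers (not part of tabula-py).
# They only check the Unicode Telugu block, U+0C00 to U+0C7F.
TELUGU_START = 0x0C00
TELUGU_END = 0x0C7F


def contains_telugu(text: str) -> bool:
    """Return True if at least one character of `text` is in the Telugu block."""
    return any(TELUGU_START <= ord(ch) <= TELUGU_END for ch in text)


def telugu_ratio(cells) -> float:
    """Fraction of non-empty string cells that contain Telugu characters."""
    strings = [c for c in cells if isinstance(c, str) and c.strip()]
    if not strings:
        return 0.0
    return sum(contains_telugu(s) for s in strings) / len(strings)


if __name__ == "__main__":
    # "తెలుగు" is the word "Telugu" written in Telugu script.
    print(contains_telugu("తెలుగు"))
    print(contains_telugu("Survey No. 12"))
    print(telugu_ratio(["తెలుగు", "abc", "", None]))
```

For the list of DataFrames returned by `read_pdf`, something like `[telugu_ratio(t.to_numpy().ravel()) for t in df]` gives one ratio per table. A ratio near zero on a Telugu document may indicate that the text itself is being lost during extraction, rather than the table layout being detected incorrectly.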
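You can specify the table areas by coordinates in a template file (for example, one exported from the Tabula app) instead, which makes the extraction independent of the language.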
import tabula

# Reads the areas listed in the template; returns a list of DataFrames
df = tabula.read_pdf_with_template("/path/xxx.pdf", "path/temp.json")
# cat path/temp.json
[
  {
    "page": 1,
    "extraction_method": "a",
    "x1": 157.18,
    "x2": 1111.41,
    "y1": 270.97,
    "y2": 283,
    "width": 954.23,
    "height": 11.189
  },
  {
    "page": 1,
    "extraction_method": "a",
    "x1": 157.18,
    "x2": 1111.41,
    "y1": 270.97,
    "y2": 283,
    "width": 954.23,
    "height": 11.189
  }
  ...
]
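If you do not have the Tabula app, you can also write the template by hand. The sketch below builds entries in the same JSON layout and shows how one entry could map onto `read_pdf` keyword arguments. It is written under stated assumptions: `make_template_entry` and `template_to_options` are hypothetical helpers, not tabula-py functions; the mapping assumes each selection corresponds to `read_pdf`'s `pages` and `area` (ordered top, left, bottom, right, i.e. `y1, x1, y2, x2`); and the method names `guess`, `lattice`, and `stream` are an assumption about which values the template reader recognizes.

```python
import json


def make_template_entry(page, x1, y1, x2, y2, method="guess"):
    """Build one selection in the template JSON layout (hypothetical helper).

    width and height are derived from the coordinates so they stay
    consistent with x1/x2 and y1/y2.
    """
    return {
        "page": page,
        "extraction_method": method,
        "x1": x1,
        "x2": x2,
        "y1": y1,
        "y2": y2,
        "width": round(x2 - x1, 3),
        "height": round(y2 - y1, 3),
    }


def template_to_options(entry):
    """Map one entry to read_pdf-style keyword arguments (assumed mapping).

    `area` is ordered (top, left, bottom, right), i.e. (y1, x1, y2, x2).
    """
    options = {
        "pages": entry["page"],
        "area": [entry["y1"], entry["x1"], entry["y2"], entry["x2"]],
    }
    method = entry.get("extraction_method")
    # Assumed set of recognized method names; anything else sets no flag.
    if method in ("guess", "lattice", "stream"):
        options[method] = True
    return options


if __name__ == "__main__":
    entries = [make_template_entry(1, 157.18, 270.97, 1111.41, 283)]
    # Save this text to the path you pass to read_pdf_with_template.
    print(json.dumps(entries, indent=2))
    print(template_to_options(entries[0]))
```

Deriving `width` and `height` from the corner coordinates, rather than typing them separately, keeps each entry internally consistent when you adjust a selection.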