I searched for related questions but could not find any.
Here is the working code I have tried:
import json
from azure.core.exceptions import ResourceNotFoundError
from azure.ai.formrecognizer import FormRecognizerClient, FormTrainingClient
from azure.core.credentials import AzureKeyCredential

# Load the API key and endpoint from a local JSON file.
with open("creds.json") as f:
    credentials = json.load(f)
API_KEY = credentials["API_KEY"]
ENDPOINT = credentials["ENDPOINT"]

# PDF URL (or image URL) of a document that contains tables.
url = "https://some_pdf_url_which_contains_tables.pdf"

form_recognizer_client = FormRecognizerClient(ENDPOINT, AzureKeyCredential(API_KEY))
poller = form_recognizer_client.begin_recognize_content_from_url(url)
form_data = poller.result()  # a list of pages

for page in form_data:
    for table in page.tables:
        for cell in table.cells:
            # Print the text of each cell (iterating over cell.text itself
            # would print one character at a time).
            print(cell.text)

## But I need the table in dictionary format, with the header names as keys
## and the cell values as values.
I hope I can get some help. Thank you.

According to the Azure Form Recognizer documentation for Python, you can use the 'to_dict' method.
# form_data is a list of pages, so pick a page before indexing its tables.
result_table = form_data[0].tables[0].to_dict()
Then you can loop over the dictionary.
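As a rough sketch of that loop, the snippet below turns a table dictionary into one dictionary per row, keyed by the header names. It assumes the dictionary has a "cells" list whose entries carry "row_index", "column_index" and "text" keys (which I believe is the shape 'to_dict' produces, but check your SDK version), and that the first row of the table is the header. The function name table_dict_to_records and the sample data are made up for illustration.

```python
from collections import defaultdict


def table_dict_to_records(table, header_row=0):
    """Return one dict per data row, keyed by the text of the header row.

    `table` is assumed to look like {"cells": [{"row_index": ..,
    "column_index": .., "text": ..}, ...]}; this helper is hypothetical
    and not part of the Azure SDK.
    """
    # Group cell text by row, then by column.
    rows = defaultdict(dict)
    for cell in table["cells"]:
        rows[cell["row_index"]][cell["column_index"]] = cell["text"]

    # Take the header row out; every remaining row is data.
    header = rows.pop(header_row, {})

    records = []
    for row_index in sorted(rows):
        row = rows[row_index]
        # Cells that were not recognized become empty strings.
        records.append({name: row.get(col, "") for col, name in sorted(header.items())})
    return records


# Hand-written sample in the assumed shape, not real service output.
sample_table = {
    "cells": [
        {"row_index": 0, "column_index": 0, "text": "Name"},
        {"row_index": 0, "column_index": 1, "text": "Qty"},
        {"row_index": 1, "column_index": 0, "text": "Apple"},
        {"row_index": 1, "column_index": 1, "text": "3"},
        {"row_index": 2, "column_index": 0, "text": "Pear"},
    ]
}

print(table_dict_to_records(sample_table))
```

With a real result you would pass in result_table instead of sample_table. Note that two columns with the same header text would overwrite each other, and merged cells (row or column spans) are not handled here.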
I hope this helps!