Visualizing table data from an Azure Form Recognizer JSON file



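I'm using Azure Form Recognizer to automate some data collection. It reads tables in PDF format and produces a JSON file (shown below). I'm really looking for ideas on how to turn the JSON file back into a table (I know this sounds a bit circular, but I need to extract one column, for example the Q2 2019 data, and build a time series). I considered using Excel, but the bounding box information is just listed out and I don't know how to "table" it. Images of the PDF page and the JSON output from Azure are included, along with an Excel screenshot of how far I got.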

{"status":"success","pages":[{...},
{"text":"Q2 2019","boundingBox":[408.6,772.1,431.7,772.1,431.7,766.5,408.6,766.5],"confidence":1.0},
...
{"text":"Q2","boundingBox":[178.6,768.6,186.6,768.6,186.6,763.0,178.6,763.0],"confidence":1.0},
{"text":"Q1","boundingBox":[218.2,768.6,226.2,768.6,226.2,763.0,218.2,763.0],...},
{"text":"Q3","boundingBox":[297.3,768.6,305.3,768.6,305.3,763.0,297.3,763.0],"confidence":1.0},
{"text":"Q2","boundingBox":[336.9,768.6,344.9,768.6,344.9,763.0,336.9,763.0],"confidence":1.0},
...
{"text":"vs.","boundingBox":[376.6,765.2,384.3,765.2,384.3,759.5,376.6,759.5],"confidence":1.0},
{"text":"vs.","boundingBox":[416.2,765.2,423.9,765.2,423.9,759.5,416.2,759.5],"confidence":1.0},
{"text":"H1 2019","boundingBox":[450.7,765.2,473.4,765.2,473.4,759.5,450.7,759.5],...},
{...,"boundingBox":[495.5,765.2,518.2,765.2,518.2,759.5,495.5,759.5],"confidence":1.0},
{"text":"vs.","boundingBox":[543.5,765.2,551.1,765.2,551.1,759.5,543.5,759.5],"confidence":1.0},
...
{"text":"2019","boundingBox":[215.6,761.8,229.0,761.8,229.0,756.2,215.6,756.2],"confidence":1.0},
{"text":"2018","boundingBox":[255.1,761.8,268.5,761.8,268.5,756.2,255.1,756.2],...},
{"text":"2018","boundingBox":[334.3,761.8,347.7,761.8,347.7,756.2,334.3,756.2],"confidence":1.0},
{"text":"Q1 2019","boundingBox":[368.9,758.3,392.1,758.3,392.1,752.7,368.9,752.7],"confidence":1.0},
...
{"text":"H1 2018","boundingBox":[535.9,758.3,558.7,758.3,558.7,752.7,535.9,752.7],"confidence":1.0},
{"text":"Metallurgical Coal (Australia)","boundingBox":[59.8,747.3,141.9,747.3,141.9,741.6,59.8,741.6],"confidence":1.0},
...
{"text":"4156200","boundingBox":[212.6,747.3,239.3,747.3,239.3,741.6,212.6,741.6],"confidence":1.0},
{"text":"5647100",...},
{"text":"5261900","boundingBox":[331.2,747.3,358.0,747.3,358.0,741.6,331.2,741.6],"confidence":1.0},
...
{"text":"9999700","boundingBox":[452.2,747.3,478.9,747.3,478.9,741.6,452.2,741.6],"confidence":1.0},
...
{"text":"Hard Coking Coal","boundingBox":[63.8,733.2,111.2,733.2,111.2,727.6,63.8,727.6],"confidence":1.0},
...
{"text":"4864600","boundingBox":[252.1,733.2,278.8,733.2,278.8,727.6,252.1,727.6],"confidence":1.0},
{"text":"4545800",...},
{"text":"52%","boundingBox":[382.9,733.2,394.9,733.2,394.9,727.6,382.9,727.6],"confidence":1.0},
{"text":"9%","boundingBox":[425.8,733.2,434.5,733.2,434.5,727.6,425.8,727.6] ...

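There are several ways to do this. Basically, what you need is a JSONPath expression (https://jsonpath.com/) in which you specify the parts of the JSON you are interested in.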
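If you would rather not pull in a JSONPath library, a recursive walk in plain Python does roughly what a recursive-descent query such as `$..text` does. This is only a minimal sketch: the function name `find_cells` is made up for illustration, and the nesting of the `sample` document is an assumption, since the JSON posted in the question is incomplete.

```python
import json

def find_cells(node):
    """Recursively yield every dict that has both 'text' and 'boundingBox'."""
    if isinstance(node, dict):
        if "text" in node and "boundingBox" in node:
            yield node
        for value in node.values():
            yield from find_cells(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_cells(item)

# Tiny stand-in for the Form Recognizer output.
# The nesting (tables -> columns -> header/entries) is an assumption.
sample = json.loads("""
{"status": "success",
 "pages": [{"tables": [{"columns": [
   {"header": [{"text": "Q2",
                "boundingBox": [178.6, 768.6, 186.6, 768.6, 186.6, 763.0, 178.6, 763.0]}],
    "entries": [[{"text": "4156200",
                  "boundingBox": [212.6, 747.3, 239.3, 747.3, 239.3, 741.6, 212.6, 741.6],
                  "confidence": 1.0}]]}
 ]}]}]}
""")

cells = list(find_cells(sample))
texts = [c["text"] for c in cells]
print(texts)
```

Because the walk does not depend on the exact nesting, it should keep working even if the real file wraps the cells differently from the sample.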

More options are covered here: Is there a JSON equivalent of XQuery/XPath?
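For the "how do I table it" part, one possible approach is to use the bounding boxes themselves: cells on the same line share (almost) the same y value, and cells in the same column overlap horizontally with the column header. The sketch below assumes each `boundingBox` is `[x1, y1, x2, y2, x3, y3, x4, y4]` starting at the top-left corner, and that y grows upward on the page (which is what the numbers in the question suggest). The names `rebuild_rows`, `column_under` and `y_tolerance` are hypothetical helpers, not part of any Azure SDK.

```python
def rebuild_rows(cells, y_tolerance=2.0):
    """Group cells into rows by top-edge y, then order each row left to right."""
    # Larger y is higher on the page (assumption), so sort descending to go top-down.
    ordered = sorted(cells, key=lambda c: -c["boundingBox"][1])
    rows = []
    for cell in ordered:
        y = cell["boundingBox"][1]
        if rows and abs(rows[-1]["y"] - y) <= y_tolerance:
            rows[-1]["cells"].append(cell)
        else:
            rows.append({"y": y, "cells": [cell]})
    return [[c["text"] for c in sorted(r["cells"], key=lambda c: c["boundingBox"][0])]
            for r in rows]

def column_under(header, cells):
    """Texts of cells below `header` whose x-range overlaps the header's x-range."""
    hx1, hy, hx2 = header["boundingBox"][0], header["boundingBox"][1], header["boundingBox"][2]
    hits = [c for c in cells
            if c["boundingBox"][1] < hy                                   # lower on the page
            and c["boundingBox"][0] < hx2 and c["boundingBox"][2] > hx1]  # x-ranges overlap
    return [c["text"] for c in sorted(hits, key=lambda c: -c["boundingBox"][1])]

# A few cells taken from the JSON in the question
header = {"text": "2019", "boundingBox": [215.6, 761.8, 229.0, 761.8, 229.0, 756.2, 215.6, 756.2]}
body = [
    {"text": "4864600", "boundingBox": [252.1, 733.2, 278.8, 733.2, 278.8, 727.6, 252.1, 727.6]},
    {"text": "Metallurgical Coal (Australia)", "boundingBox": [59.8, 747.3, 141.9, 747.3, 141.9, 741.6, 59.8, 741.6]},
    {"text": "5261900", "boundingBox": [331.2, 747.3, 358.0, 747.3, 358.0, 741.6, 331.2, 741.6]},
    {"text": "Hard Coking Coal", "boundingBox": [63.8, 733.2, 111.2, 733.2, 111.2, 727.6, 63.8, 727.6]},
    {"text": "4156200", "boundingBox": [212.6, 747.3, 239.3, 747.3, 239.3, 741.6, 212.6, 741.6]},
]

rows = rebuild_rows(body)
column_2019 = column_under(header, body + [header])
print(rows)
print(column_2019)
```

Matching by x-overlap rather than by position within the row is deliberate: rows with missing cells would otherwise shift values into the wrong column. The header in your PDF spans several staggered lines, so you would probably need to pick the header cell by hand or loosen `y_tolerance` for that part.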
