AWS: starting a document analysis with Textract (StartDocumentAnalysis) does not work



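I'm working on a school project where I'm supposed to use Textract to run document analysis on forms and send the output to A2I, where an algorithm will decide whether each form is approved, rejected, or needs review. This Textract Lambda function should be triggered as soon as a document is uploaded to S3. However, when I follow this documentation, I get a syntax error: https://docs.aws.amazon.com/textract/latest/dg/API_StartDocumentAnalysis.html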

Here is my code:

import urllib.parse
import boto3
print('Loading function')
##Clients
s3 = boto3.client('s3')
textract = boto3.client('textract')
def analyzedata(bucketName,documentKey):
    print("Loading")
    AnalyzedData= textract.StartDocumentAnalysis("DocumentLocation": {
        "S3Object": {
            "Bucket": "bucketName",
            "Name": "documentKey",
        })
    detectedText = ''
    # Print detected text
    for item in AnalyzedData['Blocks']:
        if item['BlockType'] == 'LINE':
            detectedText += item['Text'] + '\n'

    return detectedText

def writeTextractToS3File(textractData, bucketName, createdS3Document):
    print('Loading writeTextractToS3File')
    generateFilePath = os.path.splitext(createdS3Document)[0] + '.csv'
    s3.put_object(Body=textractData, Bucket=bucketName, Key=generateFilePath)
    print('Generated ' + generateFilePath)


def lambda_handler(event, context):
    #print("Received event: " + json.dumps(event, indent=2))
    # Get the object from the event and show its content type
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    try:
        detectedText = analyzedata(bucket, key)
        writeTextractToS3File(detectedText, bucket, key)

        return 'Processing Done!'

    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, bucket))
        raise e
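The parts of the handler that do not touch Textract, reading the bucket and key from the S3 event and deriving the output `.csv` key, can be checked locally without AWS. A minimal sketch, assuming a standard S3 `ObjectCreated` notification payload; `parse_s3_event`, `output_key_for`, and the sample bucket name are hypothetical, not part of boto3. Note that the code above calls `os.path.splitext` without importing `os`, so `import os` also has to be added.

```python
import os
import urllib.parse


def parse_s3_event(event):
    """Return (bucket, key) for the first record of an S3 notification event."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # S3 URL-encodes object keys in event payloads (a space arrives as '+').
    key = urllib.parse.unquote_plus(record["object"]["key"], encoding="utf-8")
    return bucket, key


def output_key_for(document_key):
    """Replace the document's extension with .csv, like writeTextractToS3File."""
    return os.path.splitext(document_key)[0] + ".csv"


if __name__ == "__main__":
    # Hypothetical sample event with only the fields the handler reads.
    sample_event = {
        "Records": [
            {"s3": {"bucket": {"name": "my-forms-bucket"},
                    "object": {"key": "uploads/form+one.pdf"}}}
        ]
    }
    bucket, key = parse_s3_event(sample_event)
    print(bucket, key, output_key_for(key))
```

If the S3 trigger is not restricted by prefix or suffix, writing the `.csv` back into the same bucket will invoke the function again, so it may be worth filtering the trigger to the uploaded document type.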

The code isn't finished yet, but I'm already getting a syntax error:

{
  "errorMessage": "Syntax error in module 'lambda_function': invalid syntax (lambda_function.py, line 13)",
  "errorType": "Runtime.UserCodeSyntaxError",
  "stackTrace": [
    "  File \"/var/task/lambda_function.py\" Line 13\n        AnalyzedData= textract.Start_Document_Analysis(\"DocumentLocation\": { \n"
  ]
}
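The error is raised before the function ever runs: inside a call's parentheses, Python accepts only positional arguments or `name=value` keyword arguments, so a dict-literal pair such as `"DocumentLocation": {...}` is invalid syntax. A small sketch using the standard `ast` module to show the difference; the two strings are simplified versions of the call, and `client` is a placeholder name.

```python
import ast

# Dict-literal "key": value syntax inside call parentheses: not valid Python.
broken = 'client.StartDocumentAnalysis("DocumentLocation": {"S3Object": {}})'
# Keyword-argument syntax, which is how boto3 methods take parameters.
fixed = 'client.start_document_analysis(DocumentLocation={"S3Object": {}})'


def is_valid_python(source):
    """Return True if the source parses, False if it raises SyntaxError."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


print(is_valid_python(broken), is_valid_python(fixed))  # False True
```

Renaming the method is a separate fix: boto3 exposes API actions as snake_case methods, so `StartDocumentAnalysis` would fail at runtime with an `AttributeError` rather than a syntax error.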

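According to the boto3 documentation, your syntax should look more like this:

AnalyzedData = textract.start_document_analysis(
    DocumentLocation={
        "S3Object": {
            "Bucket": bucketName,   # the variable, not the string "bucketName"
            "Name": documentKey,
        }
    },
    FeatureTypes=["TABLES", "FORMS"],
)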

Also note that the FeatureTypes parameter is listed as required.
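Even with the call fixed, the loop over `AnalyzedData['Blocks']` will not work: `start_document_analysis` is asynchronous and returns only a `JobId`. The blocks come from `get_document_analysis`, which reports a `JobStatus` (`IN_PROGRESS`, `SUCCEEDED`, `FAILED`, or `PARTIAL_SUCCESS`) and may split results across pages linked by `NextToken`. A minimal polling sketch, assuming a boto3 Textract client is passed in; `analyze_document_async`, the poll interval, and the timeout are illustrative choices, not values from the question.

```python
import time


def analyze_document_async(textract, bucket, key, poll_seconds=5, timeout_seconds=240):
    """Run a Textract analysis job on an S3 object and return its LINE text."""
    job = textract.start_document_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["TABLES", "FORMS"],
    )
    job_id = job["JobId"]

    # Poll until the job is no longer IN_PROGRESS, or give up at the deadline.
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = textract.get_document_analysis(JobId=job_id)
        status = response["JobStatus"]
        if status != "IN_PROGRESS":
            break
        if time.monotonic() > deadline:
            raise TimeoutError(f"Textract job {job_id} is still running")
        time.sleep(poll_seconds)

    if status not in ("SUCCEEDED", "PARTIAL_SUCCESS"):
        raise RuntimeError(f"Textract job {job_id} ended with status {status}")

    # Gather blocks from every page of results.
    blocks = list(response.get("Blocks", []))
    next_token = response.get("NextToken")
    while next_token:
        page = textract.get_document_analysis(JobId=job_id, NextToken=next_token)
        blocks.extend(page.get("Blocks", []))
        next_token = page.get("NextToken")

    lines = [b["Text"] for b in blocks if b["BlockType"] == "LINE"]
    return "\n".join(lines)
```

Inside the Lambda function this would be called as `analyze_document_async(textract, bucket, key)`. The default Lambda timeout is 3 seconds, so polling needs a larger timeout; alternatively, passing a `NotificationChannel` lets Textract publish completion to an SNS topic instead of keeping the function waiting.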

You should try installing awscli with pip:

pip install awscli

or with pip3, if that works better.

Then import it and try running the code again.

I think you're missing an opening curly brace:

AnalyzedData= textract.StartDocumentAnalysis("DocumentLocation": { # missing { in this line
"S3Object": { 
"Bucket": "bucketName",
"Name": "documentKey",
})
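Whichever fix is applied, a `Runtime.UserCodeSyntaxError` can be caught before deploying by byte-compiling the file locally, for example with `python -m py_compile lambda_function.py`. The same check from Python, as a small sketch; `check_syntax` is a hypothetical helper and the temporary file stands in for `lambda_function.py`.

```python
import os
import py_compile
import tempfile


def check_syntax(path):
    """Return None if the file compiles, otherwise the compiler's error message."""
    try:
        py_compile.compile(path, doraise=True)
        return None
    except py_compile.PyCompileError as err:
        return err.msg


if __name__ == "__main__":
    # Write the problematic call to a throwaway file and check it.
    fd, path = tempfile.mkstemp(suffix=".py")
    with os.fdopen(fd, "w") as fh:
        fh.write('x = client.StartDocumentAnalysis("DocumentLocation": {})\n')
    print(check_syntax(path))
    os.remove(path)
```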
