Parsing a multipart request string in Python



I have a string like this:

"--5b34210d81fb44c5a0fdc1a1e5ce42c3\r\nContent-Disposition: form-data; name=\"author\"\r\n\r\nJohn Smith\r\n--5b34210d81fb44c5a0fdc1a1e5ce42c3\r\nContent-Disposition: form-data; name=\"file\"; filename=\"example2.txt\"\r\nContent-Type: text/plain\r\nExpires: 0\r\n\r\nHello World\r\n--5b34210d81fb44c5a0fdc1a1e5ce42c3--\r\n"

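I also have the request headers available in another variable.

How can I easily parse it with Python 3?

I am handling file uploads in AWS Lambda behind API Gateway, and the request body and headers are available through a Python dictionary.

There are other similar questions on Stack Overflow, but most of them assume the requests module or some other module, and expect the request details to be in a specific object or format.

Note: I know I could have users upload to S3 and trigger the Lambda from that, but in this case I have deliberately chosen not to.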

You can parse it with something like this:

from requests_toolbelt.multipart import decoder

# Note: on Python 3, MultipartDecoder expects the content as bytes, not str
multipart_string = "--ce560532019a77d83195f9e9873e16a1\r\nContent-Disposition: form-data; name=\"author\"\r\n\r\nJohn Smith\r\n--ce560532019a77d83195f9e9873e16a1\r\nContent-Disposition: form-data; name=\"file\"; filename=\"example2.txt\"\r\nContent-Type: text/plain\r\nExpires: 0\r\n\r\nHello World\r\n--ce560532019a77d83195f9e9873e16a1--\r\n"
content_type = "multipart/form-data; boundary=ce560532019a77d83195f9e9873e16a1"
decoder.MultipartDecoder(multipart_string, content_type)

Expanding on sam-anthony's answer (I had to make a few fixes to get it running on Python 3.6.8):

from requests_toolbelt.multipart import decoder

# The body must be bytes on Python 3, hence the b"..." prefix
multipart_string = b"--ce560532019a77d83195f9e9873e16a1\r\nContent-Disposition: form-data; name=\"author\"\r\n\r\nJohn Smith\r\n--ce560532019a77d83195f9e9873e16a1\r\nContent-Disposition: form-data; name=\"file\"; filename=\"example2.txt\"\r\nContent-Type: text/plain\r\nExpires: 0\r\n\r\nHello World\r\n--ce560532019a77d83195f9e9873e16a1--\r\n"
content_type = "multipart/form-data; boundary=ce560532019a77d83195f9e9873e16a1"
for part in decoder.MultipartDecoder(multipart_string, content_type).parts:
    print(part.text)

# Output:
# John Smith
# Hello World
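Each part yielded by the decoder also carries its own headers, which is where the field name and the filename live. As far as I know, requests_toolbelt exposes them as part.headers with bytes keys (for example part.headers[b'Content-Disposition']), but treat that as an assumption to check against your installed version. The sketch below uses only the standard library to pull name and filename out of such a header value; parse_content_disposition is a hypothetical helper name, not part of any library.

```python
from email.message import Message


def parse_content_disposition(value):
    """Return (name, filename) from a Content-Disposition header value.

    Hypothetical helper: accepts str or bytes, returns None for a
    parameter that is not present.
    """
    if isinstance(value, bytes):
        value = value.decode("utf-8")
    msg = Message()
    msg["Content-Disposition"] = value
    # get_param unquotes the value and returns None when the parameter is missing
    name = msg.get_param("name", header="content-disposition")
    filename = msg.get_param("filename", header="content-disposition")
    return name, filename


if __name__ == "__main__":
    print(parse_content_disposition(b'form-data; name="file"; filename="example2.txt"'))
```

With a helper like this you could build a dictionary keyed by field name while looping over the parts, instead of relying on the order in which the parts arrive.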

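All you have to do is install this library with pip install requests-toolbelt --target= (--target takes the directory the package should be installed into), then upload it together with your Lambda script.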

Here is a working example:

from requests_toolbelt.multipart import decoder

def lambda_handler(event, context):
    content_type_header = event['headers']['Content-Type']
    # MultipartDecoder expects bytes, so encode the string body
    body = event["body"].encode()
    response = ''
    for part in decoder.MultipartDecoder(body, content_type_header).parts:
        response += part.text + "\n"
    return {
        'statusCode': 200,
        'body': response
    }

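This should be enough for your dependencies to be recognized. If it is not, try using the "/python/lib/python3.6/site-packages" file structure inside the zip, with your Python script at the root.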

If you want to use Python's cgi module:

import base64
from cgi import parse_multipart, parse_header
from io import BytesIO

c_type, c_data = parse_header(event['headers']['Content-Type'])
assert c_type == 'multipart/form-data'
decoded_string = base64.b64decode(event['body'])
# For Python 3: these two lines of bugfixing are mandatory
# see also: https://stackoverflow.com/questions/31486618/cgi-parse-multipart-function-throws-typeerror-in-python-3
c_data['boundary'] = bytes(c_data['boundary'], "utf-8")
c_data['CONTENT-LENGTH'] = event['headers']['Content-length']
form_data = parse_multipart(BytesIO(decoded_string), c_data)
for image_str in form_data['file']:
    ...

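I ran into a bunch of strange encoding issues, and API Gateway behaved oddly as well: at first it delivered the request body as bytes, then after a redeploy it started delivering it as base64. Anyway, this is the code that finally worked for me.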

import json
import base64
import boto3
from requests_toolbelt.multipart import decoder

s3client = boto3.client("s3")

def lambda_handler(event, context):
    content_type_header = event['headers']['content-type']
    postdata = base64.b64decode(event['body']).decode('iso-8859-1')
    imgInput = ''
    lst = []
    for part in decoder.MultipartDecoder(postdata.encode('utf-8'), content_type_header).parts:
        lst.append(part.text)
    response = s3client.put_object(Body=lst[0].encode('iso-8859-1'), Bucket='test', Key='mypicturefinal.jpg')
    return {'statusCode': '200', 'body': 'Success', 'headers': {'Content-Type': 'text/html'}}
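The answers in this thread read the header as 'Content-Type' in one place and 'content-type' in another, and the body sometimes arrives base64-encoded and sometimes not. A minimal sketch of two helpers that absorb both differences is shown here. It assumes the API Gateway proxy event carries an isBase64Encoded flag next to body; get_header and get_body_bytes are hypothetical names introduced only for illustration.

```python
import base64


def get_header(event, name, default=None):
    """Look up a header without caring about its capitalization."""
    headers = event.get("headers") or {}
    for key, value in headers.items():
        if key.lower() == name.lower():
            return value
    return default


def get_body_bytes(event):
    """Return the request body as bytes, decoding base64 only when flagged."""
    body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        return base64.b64decode(body)
    if isinstance(body, bytes):
        return body
    # Assumption: a plain string body is UTF-8 text
    return body.encode("utf-8")


if __name__ == "__main__":
    sample = {
        "headers": {"Content-Type": "multipart/form-data; boundary=x"},
        "body": base64.b64encode(b"hello").decode("ascii"),
        "isBase64Encoded": True,
    }
    print(get_header(sample, "content-type"), get_body_bytes(sample))
```

The two return values can then be handed to whichever parser you pick, so the handler keeps working if the API Gateway configuration changes how the body is delivered.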

Unfortunately, the cgi module has been deprecated since Python 3.11 (and it was removed in Python 3.13).

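If you can use the multipart library (the current cgi module documentation mentions it as a possible replacement), you can use its parse_form_data() function in an AWS Lambda function like this: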

import base64
import logging
from io import BytesIO
from multipart import parse_form_data

logger = logging.getLogger()

def lambda_handler(event, context):
    """
    Process an HTTP POST request of encoding type "multipart/form-data".
    """
    # HTTP headers are case-insensitive
    headers = {k.lower(): v for k, v in event['headers'].items()}
    # AWS API Gateway applies base64 encoding on binary data
    body = base64.b64decode(event['body'])
    # Parse the multipart form data
    environ = {
        'CONTENT_LENGTH': headers['content-length'],
        'CONTENT_TYPE': headers['content-type'],
        'REQUEST_METHOD': 'POST',
        'wsgi.input': BytesIO(body)
    }
    form, files = parse_form_data(environ)
    # Example usage...
    form_data = dict(form)
    logger.info(form_data)
    attachments = {key: {
        'filename': file.filename,
        'content_type': file.content_type,
        'size': file.size,
        'data': file.raw
    } for key, file in files.items()}
    logger.info(attachments)
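If adding a third-party dependency is not an option, the standard library's email package can also split a multipart body. This is a different technique from the answers in this thread, shown only as a rough sketch. It prepends the Content-Type header to the raw body so the parser can find the boundary. parse_multipart_with_email is a hypothetical name, and I have not checked this approach against large or unusual binary uploads.

```python
from email import message_from_bytes


def parse_multipart_with_email(body, content_type):
    """Split a multipart/form-data body into {field name: [part info, ...]}."""
    # The email parser needs the Content-Type header to learn the boundary
    prefix = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n"
    message = message_from_bytes(prefix + body)
    if not message.is_multipart():
        raise ValueError("body is not a multipart message")
    fields = {}
    for part in message.get_payload():
        name = part.get_param("name", header="content-disposition")
        fields.setdefault(name, []).append({
            "filename": part.get_filename(),        # None for plain form fields
            "content_type": part.get_content_type(),
            "data": part.get_payload(decode=True),  # raw bytes of the part
        })
    return fields


if __name__ == "__main__":
    boundary = "ce560532019a77d83195f9e9873e16a1"
    sample = (
        "--{b}\r\nContent-Disposition: form-data; name=\"author\"\r\n\r\nJohn Smith\r\n"
        "--{b}\r\nContent-Disposition: form-data; name=\"file\"; filename=\"example2.txt\"\r\n"
        "Content-Type: text/plain\r\nExpires: 0\r\n\r\nHello World\r\n--{b}--\r\n"
    ).format(b=boundary).encode("utf-8")
    print(parse_multipart_with_email(sample, "multipart/form-data; boundary=" + boundary))
```

Each field maps to a list because a form may submit several parts under the same name, such as multiple files in one input.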

If you are using cgi, I recommend FieldStorage:

from cgi import FieldStorage
# Note: fp must be a binary file-like object; wrap a bytes body in io.BytesIO if needed
fs = FieldStorage(fp=event['body'], headers=event['headers'], environ={'REQUEST_METHOD': 'POST', 'CONTENT_TYPE': event['headers']['Content-Type']})['file']
originalFileName = fs.filename
binaryFileData = fs.file.read()

See also: https://stackoverflow.com/a/38718958/10913265

If the event body contains multiple files, then:

fs = FieldStorage(fp=event['body'], headers=event['headers'], environ={'REQUEST_METHOD': 'POST', 'CONTENT_TYPE': event['headers']['Content-Type']})['file']

gives you a list of FieldStorage objects, so you can do:

for f in fs:
    originalFileName = f.filename
    binaryFileData = f.file.read()

Altogether, my solution handles a single file, multiple files, and a body that contains no file, and it makes sure the request is multipart/form-data:

from cgi import parse_header, FieldStorage

# see also: https://stackoverflow.com/a/56405982/10913265
c_type, c_data = parse_header(event['headers']['Content-Type'])
assert c_type == 'multipart/form-data'
# see also: https://stackoverflow.com/a/38718958/10913265
fs = FieldStorage(fp=event['body'], headers=event['headers'], environ={'REQUEST_METHOD': 'POST', 'CONTENT_TYPE': event['headers']['Content-Type']})['file']
# If fs holds a single file or no file, wrap the FieldStorage object in a list so it can be iterated
if not isinstance(fs, list):
    fs = [fs]
for f in fs:
    originalFileName = f.filename
    # No file was submitted for this field
    if originalFileName == '':
        continue
    binaryFileData = f.file.read()
    # Do something with the data
