Reading an Excel file with pandas from a URL response

I'm trying to read an xlsx file with pandas, where the file is hosted on SharePoint. When printed from the response, the content looks like this:

PK ... [Content_Types].xml ... (binary content, truncated)

I'd like to know how to read this format into memory so that I can call pd.read_excel with it.
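For context, the `PK` prefix and the `[Content_Types].xml` entry in the dump are what a ZIP container looks like, and an .xlsx file is a ZIP container. Below is a minimal sketch, using only the standard library, of checking that downloaded bytes really are such a package before handing them to pandas. The helper names `looks_like_xlsx` and `make_fake_package` are made up for illustration, and the sample bytes are generated in memory rather than downloaded.

```python
import zipfile
from io import BytesIO


def looks_like_xlsx(content: bytes) -> bool:
    """Return True if the bytes look like an .xlsx (ZIP) container."""
    buffer = BytesIO(content)
    if not zipfile.is_zipfile(buffer):
        return False
    buffer.seek(0)
    with zipfile.ZipFile(buffer) as archive:
        # Office Open XML packages carry this part at the archive root
        return "[Content_Types].xml" in archive.namelist()


def make_fake_package() -> bytes:
    """Build a tiny ZIP in memory that stands in for a downloaded file."""
    out = BytesIO()
    with zipfile.ZipFile(out, "w") as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
    return out.getvalue()


sample = make_fake_package()
print(sample[:2])                                    # b'PK'
print(looks_like_xlsx(sample))                       # True
print(looks_like_xlsx(b"<html>login page</html>"))   # False
```

A check like this can help tell a real workbook apart from, say, an HTML error or login page returned in its place.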

I tried using urllib and openpyxl, like this:

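import urllib.request
from io import BytesIO

import openpyxl as excel
import pandas as pd

# url, payload and headers are defined elsewhere
req = urllib.request.Request(url=url, data=payload, headers=headers)
with urllib.request.urlopen(url=req) as response:
    rsp = response.read()
# load_workbook expects a path or a file-like object, so wrap the raw bytes
excel.load_workbook(filename=BytesIO(rsp))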

But I get a 400 Bad Request error from the urllib request module.

The URL looks like this:

https://company.sharepoint.com/sites/test-department/_api/Web/GetFileByServerRelativeUrl('/sites/test-department/DepartmentDocuments/test/Book1.xlsx')/$value?binaryStringResponseBody=true
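One thing that may be worth checking, as an assumption rather than a confirmed diagnosis of the 400: passing `data=` to `urllib.request.Request` turns the request into a POST, and any spaces or non-ASCII characters in the file path need to be percent-encoded. The sketch below builds a GET request for a URL of this shape without touching the network. The site, path, and header values are placeholders, and `build_file_request` is a hypothetical helper name.

```python
import urllib.request
from urllib.parse import quote

# Placeholder values modeled on the URL shape in the question
site = "https://company.sharepoint.com/sites/test-department"
server_relative_path = "/sites/test-department/DepartmentDocuments/test/Book1.xlsx"


def build_file_request(site_url: str, path: str, headers: dict) -> urllib.request.Request:
    """Build a GET request for the raw bytes of a file at this kind of URL."""
    # Percent-encode spaces and non-ASCII characters, but keep the slashes
    encoded = quote(path, safe="/")
    url = (
        f"{site_url}/_api/Web/GetFileByServerRelativeUrl('{encoded}')"
        "/$value?binaryStringResponseBody=true"
    )
    # No data argument here, so urllib sends GET instead of POST
    return urllib.request.Request(url=url, headers=headers)


req = build_file_request(site, server_relative_path, {"Accept": "*/*"})
print(req.get_method())
print(req.full_url)
```

The real request would still need whatever authentication headers the SharePoint site expects. Those are not shown in the question, so they are left out here.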

I found a way. The key is to seek back to the start of the buffer before passing the file to pandas.

import pandas as pd
from io import BytesIO

file_ext = self.file_name.split('.')[-1]
if file_ext == 'xlsx':
    # response.content holds the raw bytes of the downloaded file
    xl = bytes(memoryview(response.content))
    memfile = BytesIO()
    memfile.write(xl)
    # Rewind to the start so pandas reads from the first byte
    memfile.seek(0)
    df = pd.read_excel(memfile, engine='openpyxl')
    print(df.head(10))
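To illustrate why the seek matters, here is a small standard-library sketch. Writing to a `BytesIO` leaves the position at the end of the buffer, so a reader starting from there sees nothing until the buffer is rewound. The `content` bytes are a made-up stand-in for a downloaded file, not a real workbook.

```python
from io import BytesIO

# Stand-in for the downloaded bytes (not a valid workbook)
content = b"PK\x03\x04 pretend these are the downloaded xlsx bytes"

# Writing moves the position to the end of the buffer
memfile = BytesIO()
memfile.write(content)
at_end = memfile.read()       # nothing left to read from here

# Rewinding makes the whole payload readable again
memfile.seek(0)
after_seek = memfile.read()

# A buffer constructed directly from the bytes starts at position 0,
# so no explicit seek is needed in that case
direct = BytesIO(content)
direct_position = direct.tell()

print(at_end == b"")           # True
print(after_seek == content)   # True
print(direct_position)         # 0
```

With a real download, either buffer (rewound, or built directly with `BytesIO(response.content)`) could be passed to `pd.read_excel(buffer, engine='openpyxl')`.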
