Error reading an XLSX file: Bad magic number for central directory

Goal

Download an .xlsx file from AWS S3 to a local directory
Read the downloaded .xlsx file as a pandas DataFrame

Background

macOS, Windows, and Linux all give the same error
Python 3.9 and 3.8 were used
Methods tried:
- pd.read_excel(path_, engine=None, header=0, index_col=0)
- pd.read_excel(path_, engine='openpyxl', header=0, index_col=0)
- xlrd.open_workbook()
- z = ZipFile()

Error traceback

Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\zipfile.py", line 1257, in __init__
    self._RealGetContents()
  File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\zipfile.py", line 1352, in _RealGetContents
    raise BadZipFile("Bad magic number for central directory")
zipfile.BadZipFile: Bad magic number for central directory

Line raising the error

centdir = fp.read(sizeCentralDir)
if len(centdir) != sizeCentralDir:
    raise BadZipFile("Truncated central directory")
centdir = struct.unpack(structCentralDir, centdir)
if centdir[_CD_SIGNATURE] != stringCentralDir:
    raise BadZipFile("Bad magic number for central directory")
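
Since an .xlsx file is a ZIP archive, the check above can be reproduced outside pandas to see whether the downloaded bytes are a valid archive at all. The following is only a minimal diagnostic sketch using the standard library; describe_xlsx is a hypothetical helper name, and it assumes the file has already been downloaded to a local path.

```python
import zipfile
from pathlib import Path

# Signature that starts a non-empty ZIP archive (and therefore an .xlsx file).
ZIP_LOCAL_HEADER = b"PK\x03\x04"


def describe_xlsx(path):
    """Collect basic facts that help tell a damaged download from a real .xlsx.

    Hypothetical helper for illustration; not part of pandas or openpyxl.
    """
    data_path = Path(path)
    with data_path.open("rb") as fh:
        head = fh.read(4)

    report = {
        "size_bytes": data_path.stat().st_size,
        "starts_with_zip_signature": head == ZIP_LOCAL_HEADER,
        "is_zipfile": zipfile.is_zipfile(data_path),
        "first_bad_member": None,
        "error": None,
    }
    try:
        with zipfile.ZipFile(data_path) as zf:
            # testzip() returns the name of the first member with a bad CRC,
            # or None when every member checks out.
            report["first_bad_member"] = zf.testzip()
    except zipfile.BadZipFile as exc:
        # Same exception type as in the traceback above.
        report["error"] = str(exc)
    return report
```

Running this on both the original file and the copy downloaded from S3, and comparing size_bytes, may show whether the bytes changed somewhere between upload and download.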

Library versions

pandas==1.4.3
xlrd==2.0.1
openpyxl==3.0.10

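In my experience, AWS S3 did not work well with Excel files. I suspect the file structure was corrupted after the upload to S3. As a workaround for uploading and processing Excel data, I started converting the files to Parquet or CSV before storing them.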
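
A minimal sketch of the CSV variant of that workaround, assuming the data is already available as a pandas DataFrame. The function name roundtrip_via_csv and the sample data are made up for illustration, and an in-memory buffer stands in for the stored file. Parquet is left out here because DataFrame.to_parquet needs an extra engine such as pyarrow or fastparquet.

```python
import io

import pandas as pd


def roundtrip_via_csv(df):
    """Write a DataFrame as CSV text and read it back (illustrative helper)."""
    buffer = io.StringIO()
    df.to_csv(buffer)  # the index is written as the first column
    buffer.seek(0)
    # Same header/index_col arguments as the read_excel calls in the question.
    return pd.read_csv(buffer, header=0, index_col=0)


# Made-up sample data, only to show the round trip.
df = pd.DataFrame({"price": [1.5, 2.0], "qty": [3, 4]}, index=["a", "b"])
restored = roundtrip_via_csv(df)
```

In practice the buffer would be replaced by a local path or the object uploaded to S3. CSV is plain text, so it does not depend on the ZIP structure that the error above complains about.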
