我正在努力解决以下问题:
我每天收到一个。xls文件。但是,该文件的格式与扩展名不同。有时是。xlsx,有时是。xlsb。我试图将该文件的内容读取到pandas数据框架中,因此我编写了以下代码:
import openpyxl
import pyxlsb
import pyodbc
import os
import pandas as pd
try:
full_name1 = "my_file.xls"
full_name2 = "my_file.xlsx"
os.rename(full_name1, full_name2)
df = pd.read_excel(full_name2, sheet_name = 'Sheet 2', engine = 'openpyxl', skiprows = range(0, 7), usecols = "A:X")
except:
full_name1 = "my_file.xlsx"
full_name2 = "my_file.xlsb"
os.rename(full_name1, full_name2)
df = pd.read_excel(full_name2, sheet_name = 'Sheet 2', engine = 'pyxlsb', skiprows = range(0, 7), usecols = "A:X")
但是,我得到错误:
[WinError 32]进程无法访问文件,因为它正在被另一个进程使用:'my_file.xlsx' ->"my_file.xlsb">
和'my_file.xlsx'似乎仍在使用。
有办法解决这个问题吗?
为什么不遍历目录中的文件列表呢?:
import os
for directory, subdirlist, filelist in os.walk('C:\Google Drive\Python\Uhlmann\'):
for file in filelist:
df = pd.read_excel(file, sheet_name = 'Sheet 2', engine = 'openpyxl', skiprows = range(0, 7), usecols = "A:X")
我最终找到了一个解决方案-它在程序的运行时增加了一点,但它可以工作:
import openpyxl
import pyxlsb
import pyodbc
import os
import pandas as pd
import shutil
full_name1 = "my_file.xls"
full_name2 = "my_file.xlsx"
full_name3 = "my_file.xlsb"
shutil.copy(full_name1, full_name3)
os.rename(full_name1, full_name2)
try:
df = pd.read_excel(full_name2, sheet_name = 'Sheet 2', engine = 'openpyxl', skiprows = range(0, 7), usecols = "A:X")
except:
df = pd.read_excel(full_name3, sheet_name = 'Sheet 2', engine = 'pyxlsb', skiprows = range(0, 7), usecols = "A:X")