如何在Python中使用openpyxl库迭代多个excel表?



我使用Openpyxl库读取xlsx文件并提取少量内容,并在我的txt输出文件中添加更多字符串。我目前使用的excel文件包含名称摘要和员工表。我下面的代码是工作很好为我目前的excel文件。现在的问题是,我会使用相同的代码来读取另一个excel文件,其中包含更多的表,其表名我不确定。在我的代码行ws = wb['Employee']中。工作表名称将随时更改。然而,我确信的一件事是我不想从sheet1中读取任何数据。所有的数据提取将从所有xlsx文件中的sheet2开始进行。我不知道如何从这里着手,所以任何帮助都将不胜感激。

提前感谢您的时间和努力!

代码:

from openpyxl import load_workbook
data_file='\test.xlsx'
# Load the entire workbook.
wb = load_workbook(data_file)
ws = wb['Employee'] #Manually adding sheet name here

mylines={"Column_name":[],"Column_Type":[]} #Getting 2 columns data from row 6
type_strs = {
'String': 'VARCHAR(256)',
'Numeric': 'NUMBER',
'Date': 'NUMBER(4,0)',
'Int': 'NUMBER'
}
for index, value in enumerate(mylines["Column_Type"]):
mylines["Column_Type"][index] = type_strs.get(value, value)

for i in range(6, ws.max_row+1):  
name = ws.cell(row=i, column=1).value
name1=ws.cell(row=i, column=2).value
mylines["Column_name"].append(name) #Appending dictionary key "Column_name"
mylines["Column_Type"].append(name1) #Appending dictionay key "Column_type"
for index, value in enumerate(mylines["Column_Type"]):
mylines["Column_Type"][index] = type_strs.get(value, value)



theString = " "
for i in range(len(mylines['Column_name'])):
theString += mylines['Column_name'][i] + " " + mylines['Column_Type'][i]
if i < len(mylines['Column_name'])-1:
theString += ", "

outputFile = open('/output.txt', 'w')  # Text file Output
outputFile.write("CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE({});".format(theString) + "n")
outputFile.close() #Closing file

基于SO的更新代码用户注释:

from openpyxl import load_workbook
data_file='\test.xlsx'
# Load the entire workbook.
wb = load_workbook(data_file)
#ws = wb['Employee'] #Manually adding sheet name here

mylines={"Column_name":[],"Column_Type":[]} #Getting 2 columns data from row 6
type_strs = {
'String': 'VARCHAR(256)',
'Numeric': 'NUMBER',
'Date': 'NUMBER(4,0)',
'Int': 'NUMBER'
}
for index, value in enumerate(mylines["Column_Type"]):
mylines["Column_Type"][index] = type_strs.get(value, value)
skip = True
for ws in wb.worksheets:
if skip == True:
skip = False
else:   
for i in range(6, ws.max_row+1):  
name = ws.cell(row=i, column=1).value
name1=ws.cell(row=i, column=2).value
mylines["Column_name"].append(name) #Appending dictionary key "Column_name"
mylines["Column_Type"].append(name1) #Appending dictionay key "Column_type"
for index, value in enumerate(mylines["Column_Type"]):
mylines["Column_Type"][index] = type_strs.get(value, value)



theString = " "
for i in range(len(mylines['Column_name'])):
theString += mylines['Column_name'][i] + " " + mylines['Column_Type'][i]
if i < len(mylines['Column_name'])-1:
theString += ", "

outputFile = open('/output.txt', 'w')  # Text file Output
outputFile.write("CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE({});".format(theString) + "n")
outputFile.close() #Closing file

Excel数据

<Sheet 1 Name -> Summary Sheet: Empty
<Sheet 2 Name -> Employee Sheet 
File Name:  Employee
Sheet Name: Employee
File Type:  csv

Field Name  Type
Name     String
Salary  Numeric
Date    Date
Phone       Int
<Sheet 3 Name->   Employee1 Sheet
File Name:  Employee
Sheet Name: Employee1
File Type:  csv

Field Name  Type
Employee Name   Date
Employee Salary Int
Employment Date Int
Office Phone    Int

要遍历工作簿中的所有工作表并读取其中的数据(除了第一个工作表),请删除ws = wb['Employee']

使用for循环(insert beforefor i in range(5,...)作为this

skip = True
for ws in wb.worksheets:
if skip == True:
skip = False
else:
for i in range(6, ws.max_row+1):
name = ws.cell(row=i, column=1).value
....

这将读取每个工作表并将数据附加到mylines,除了第一个工作表

第二更新正如您在下面的评论中提到的,要为新的SQL查询添加新行,请进行这些额外的更改

  1. 在字典中添加另一个条目以指示新行,如下所示(注意确保在读取特定工作表中的所有行后执行这些行)编辑String格式,以便一旦看到换行符,就将该字符串写入输出文件。请注意,NewFile布尔值将覆盖现有的任何文件。
skip = True
for ws in wb.worksheets:
if skip == True:
skip = False
else:
for i in range(6, ws.max_row+1):  
name = ws.cell(row=i, column=1).value
print(i, name)
name1=ws.cell(row=i, column=2).value
print(name1)
mylines["Column_name"].append(name) #Appending dictionary key "Column_name"
mylines["Column_Type"].append(name1) #Appending dictionay key "Column_type"
for index, value in enumerate(mylines["Column_Type"]):
mylines["Column_Type"][index] = type_strs.get(value, value)
mylines["Column_name"].append('NextLine')
mylines["Column_Type"].append('NextLine')
theString = " "
NewFile = True
sheetList = wb.sheetnames
tabIndex = 1
for i in range(len(mylines['Column_name'])):
if(mylines['Column_name'][i] != 'NextLine'):
theString += mylines['Column_name'][i] + " " + mylines['Column_Type'][i]
theString += ", "
else:
theString = theString[:-2]
if NewFile:
NewFile = False
outputFile = open('output.txt', 'w')  # Text file Output
print("New file  ", theString)
else:
outputFile = open('output.txt', 'a')
print("Not new file  ", theString)
outputFile.write("CREATE TABLE TRANSIENT TABLE STG_" + sheetList[tabIndex] +"({});".format(theString) + "n")
outputFile.close()
tabIndex += 1
theString = " "

最新更新