如何使用OpenPyXL迭代Excel表中的所有行?



OpenPyXL for Excel Tables的文档没有提到如何迭代表中的值(见这里)。

做这件事的有效方法是什么?

我想出了下面的函数来做这件事。

from typing import Any, Dict, Generator
from openpyxl import load_workbook
from openpyxl.worksheet.table import Table
from openpyxl.worksheet.worksheet import Worksheet
from openpyxl.utils import rows_from_range
TableRow = Dict[str, Any]
def iter_table_rows(ws: Worksheet, tb: Table) -> Generator[TableRow, None, None]:
"""Iterate over rows from a table with headers (row as dictionary)"""
def get_row_values(row_cell_ref):
return [ws[c].value for c in row_cell_ref]

iter_rows = rows_from_range(tb.ref)
headers = get_row_values(next(iter_rows))

for row_cells in iter_rows:
yield {h:v for h,v in zip(headers, get_row_values(row_cells))}

wb = load_workbook("my_file.xlsx")
ws = wb.active
tb = ws.tables["MyTable"]
for row in iter_table_rows(ws, tb):
print(row)

感谢您的功能。我添加了一个ws参数,因为ws不是一个全局变量,而且它是get_row_values所必需的。

def iter_table_rows(ws:Worksheet, tb:Table) -> Generator[TableRow, None, None]:
"""Iterate over rows from a table with headers (row as dictionary)"""
def get_row_values(ws:Worksheet,row_cell_ref:tuple):
return [ws[c].value for c in row_cell_ref]
iter_rows = rows_from_range(tb.ref)
headers = get_row_values(ws,next(iter_rows))
for row_cells in iter_rows:
yield {h:v for h,v in zip(headers, get_row_values(ws,row_cells))}

目前我还没有别的办法。
没有"parent"属性以知道在哪个工作表

最后我得出了一个不同的解决方案。
下面是一个从json文件中的xlsx表中提取数据的函数代码。
我猜它更接近openpyxl的哲学。数据存储在工作表中,表似乎是一种指向范围的指针。在工作表中读取数据的最有效方法是使用ws。Iter_row根据min_row、max_row、min_col、max_col指定范围。

import json
import logging
from argparse import Namespace
from openpyxl import load_workbook
from openpyxl.worksheet.cell_range import CellRange
def xlsx_Json(args: Namespace):
"""Extract table values from an excel workbook in a json file
Args:
args (Namespace): parsed arguments
args.workbook   : filename of the excel document
args.worksheet  : name of the sheet to read
args.table      : name of a table in worksheet
args.output     : name of a output file (json format)
Note:
It's assumed that table exists in worksheet, and worksheet exists in workbook, and file exists
"""
#wb = load_workbook(filename=args.workbook,read_only=True)  !!! read only worksheet doesn't have table ?
wb = load_workbook(filename=args.workbook)
ws = wb[args.worksheet]
table = ws.tables[args.table]
# table.ref needs to be converted in a CellRange, and use the range to read data in the worksheet
table_range = CellRange(table.ref)
tableJson={ args.table : [] }
header=table.column_names
for nb_rows,row in enumerate(ws.iter_rows(min_row=table_range.min_row+1, 
min_col=table_range.min_col, 
max_row=table_range.max_row, 
max_col=table_range.max_col, 
values_only=True )):
#   min_row=range.min_row+1  skip the header row

#   need to convert row element to string to make a serializablejson object
#   value such datetime are not directly serializable
if nb_rows == 0:
for h,x in zip(header,row):
logging.info(f"{h}:type-{type(x)}:value:{x}")
if nb_rows % 100 == 0:
# print row number every 100 lines
logging.info(f"rows:{nb_rows}")
row_Json=dict(zip(header,[str(x) for x in row]))        
tableJson[args.table].append(row_Json)
with open(args.json_output,"w") as file_out:
json.dump(tableJson,file_out,indent=4)
logging.info(f"{nb_rows} elements written in {args.json_output}")

最新更新