Iterating over columns in a read-only workbook in openpyxl

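I have a large .xlsx file: 19 columns, 5185 rows. I want to open the file, read all the values in one column, do some work on those values, then create a new column in the same workbook and write out the modified values. So I need to be able to both read and write in the same file.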

My original code did this:

from openpyxl import load_workbook, utils

def readExcel(doc):
    wb = load_workbook(generalpath + exppath + doc)
    ws = wb["Sheet1"]
    # iterate through the columns to find the correct one
    for col in ws.iter_cols(min_row=1, max_row=1):
        for mycell in col:
            if mycell.value == "PerceivedSound.RESP":
                origCol = mycell.column
    # get the column letter for the first empty column to output the new values
    newCol = utils.get_column_letter(ws.max_column+1)
    # iterate through the rows to get the value from the original column,
    # do something to that value, and output it in the new column
    for myrow in range(2, ws.max_row+1):
        myrow = str(myrow)
        # do some stuff to make the new value
        cleanedResp = doStuff(ws[origCol + myrow].value)
        ws[newCol + myrow] = cleanedResp
    wb.save(doc)

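However, Python threw a MemoryError after row 3853 because the workbook was too large. The openpyxl docs say to use read-only mode (https://openpyxl.readthedocs.io/en/latest/optimized.html) to handle large workbooks. I'm now trying to use that; however, there seems to be no way to iterate through the columns once I add the read_only=True param: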

def readExcel(doc):
    wb = load_workbook(generalpath + exppath + doc, read_only=True)
    ws = wb["Sheet1"]
    for col in ws.iter_cols(min_row=1, max_row=1):
        #etc.

Python throws this error: AttributeError: 'ReadOnlyWorksheet' object has no attribute 'iter_cols'

If I change the final line in the snippet above to:

for col in ws.columns:

Python throws the same kind of error: AttributeError: 'ReadOnlyWorksheet' object has no attribute 'columns'

Iterating over rows is fine (and is covered in the documentation linked above):

for col in ws.rows:

(no error)

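This question asks about the AttributeError, but the solution there is to remove read-only mode, which doesn't work for me because openpyxl won't read my entire workbook outside read-only mode.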

So: how do I iterate over columns in a large workbook?

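I haven't run into this yet, but I will once I can iterate over the columns: how do I both read and write the same workbook if that workbook is large?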

Thanks!

If the worksheet has only around 100,000 cells, you shouldn't have any memory problems. You should probably investigate that further.

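iter_cols() is not available in read-only mode because it would require constant and very inefficient reparsing of the underlying XML file. It is, however, relatively easy to convert the rows from iter_rows() into columns using zip.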

def _iter_cols(self, min_col=None, max_col=None, min_row=None,
               max_row=None, values_only=False):
    yield from zip(*self.iter_rows(
        min_row=min_row, max_row=max_row,
        min_col=min_col, max_col=max_col, values_only=values_only))

import types

# attach the helper to every sheet as a bound method
for sheet in workbook:
    sheet.iter_cols = types.MethodType(_iter_cols, sheet)
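
To see why the zip trick works independently of openpyxl, here is a minimal sketch that uses plain tuples in place of worksheet rows. The `fake_iter_rows` generator and its sample data are hypothetical stand-ins for `ws.iter_rows(values_only=True)`; only the `zip(*...)` transposition is the technique described above.

```python
def fake_iter_rows():
    """Hypothetical stand-in for ws.iter_rows(values_only=True): one tuple per row."""
    yield ("Subject", "PerceivedSound.RESP", "RT")
    yield (1, "ba", 512)
    yield (2, "da", 498)


def iter_cols(rows):
    """Transpose an iterable of rows into columns.

    Note: unpacking with * consumes every row before zip starts,
    so all rows are held in memory at once.
    """
    yield from zip(*rows)


columns = list(iter_cols(fake_iter_rows()))

# Pick out the column whose header (first entry) matches the one we want.
target = next(col for col in columns if col[0] == "PerceivedSound.RESP")
values = target[1:]  # everything below the header
```

Because the unpacking reads every row up front, this does not stream. For a sheet of a few thousand rows that is unlikely to matter, and restricting `min_col`/`max_col` to the one column you need keeps the transposed data small.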

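According to the documentation, read-only mode only supports row-based reads (column reads are not implemented). But that's not hard to work around: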

from openpyxl import Workbook

wb2 = Workbook(write_only=True)
ws2 = wb2.create_sheet()

# find what column I need
colcounter = 0
for row in ws.rows:
    for cell in row:
        if cell.value == "PerceivedSound.RESP":
            break
        colcounter += 1

    # cells are apparently linked to the parent workbook meta
    # this will retain only values; you'll need custom
    # row constructor if you want to retain more
    row2 = [cell.value for cell in row]
    ws2.append(row2)  # preserve the first row in the new file
    break  # stop after first row

# start at row 2 so the header row is not processed and written a second time
for row in ws.iter_rows(min_row=2):
    row2 = [cell.value for cell in row]
    row2.append(doStuff(row2[colcounter]))
    ws2.append(row2)  # write a new row to the new wb

wb2.save('newfile.xlsx')
wb.close()
wb2.close()
# copy `newfile.xlsx` to `generalpath + exppath + doc`
# Either using os.system, subprocess.Popen, or shutil.copy2()

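You won't be able to write to the same workbook, but as shown above you can open a new workbook (in write-only mode), write to it, and then overwrite the old file using an OS-level copy.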
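
The final overwrite step could look something like the sketch below. It assumes the new workbook has already been saved to disk. The helper name `overwrite_with` is hypothetical, and `shutil.copy2` is just one of the options mentioned in the comments above.

```python
import os
import shutil


def overwrite_with(new_path, original_path):
    """Replace original_path with the contents of new_path, then delete new_path.

    Hypothetical helper: it refuses to run unless the new file exists,
    so the original is never overwritten by a missing file.
    """
    if not os.path.isfile(new_path):
        raise FileNotFoundError(new_path)
    shutil.copy2(new_path, original_path)  # copies data and file metadata
    os.remove(new_path)                    # drop the temporary file
```

It would be called as `overwrite_with('newfile.xlsx', generalpath + exppath + doc)` after both workbooks have been closed. A read-only workbook keeps its source file open, so on some platforms the overwrite may fail if `wb.close()` has not run first.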
