我的excel文件包含头行上面的垃圾值,我想找出行索引位置,其中占用列的最大数量,然后将其分配为头行
没有更多关于标题的信息,我假设它是第一行。
所以这将遍历行,找到数据量最大的第一行(从末尾开始以最少的空单元格计算),并删除该行以上的所有行。
我使用的CSV格式的数据:
some,,,
useless,01/01/70,,
0,garbage,,
according,,to author,
head1,head2,head3,head4
dat1,dat2,dat3,dat4
dat2,dat3,dat4,dat5
dat3,dat4,dat5,dat6
from openpyxl import load_workbook
wb = load_workbook("./garbage.xlsx")
ws = wb.active
data_len = list[int]() # list for data length
for row in ws.iter_rows(values_only=True): # iterate through rows
# print(len(row)) # always the same since this is a table
for i in reversed(range(len(row))): # iterate through cells from end to beginning
if row[i] is not None: # if cell is not empty
data_len.append(i) # save cell position in data length list
break # move to next row
header_index = data_len.index(max(data_len)) # get first index of max data length
ws.delete_rows(0, header_index) # delete from top to header index
for row in ws.iter_rows(values_only=True): # only header and data remain
print(row)
NB。如果数据长度的标准是正确的,它将工作,但绝对不会在生产中使用它。