如何给to_excel()在熊猫一个参数内改变循环?



大家好。

这是我的第一个问题,所以我会尽量问得彻底一点。

今天,我试图将一系列具有许多工作表的大型excel文档转换为一个巨大的数据集,并对数据显示方式进行了一些更改。需要说明的是,我正在处理数百张表格,所以我正在寻找优化解决方案的方法。

具体来说,我有一个需要从大型excel电子表格中提取的数据表名称列表。然后,我尝试遍历这个列表,并将编辑后的列表作为新文件保存在我的jupyter笔记本的文件夹中。

我希望这样:

list_of_tables = [a,b,c,d,e]
for i in range (0, len(list_of_tables):
df = pd.read_excel (r'Large_dataset_X.xlsx', sheet_name=list_of_tables[i])
{Bunch of code formatting and editing the file}
arg = "r'Edited Tables/" + list_of_tables[i] + "_Table_New.xlsx'"
df.to_excel(arg, sheet_name= list_of_tables[i], index = False)

问题是,当我执行这个循环时,to_excel()参数包含一个路径,吐出一个错误:'engine cannot recognize files of .xlsx'

Error Traceback:
---------------------------------------------------------------------------
OptionError                               Traceback (most recent call last)
~anaconda3libsite-packagespandasioexcel_base.py in __new__(cls, path, engine, **kwargs)
632                 try:
--> 633                     engine = config.get_option(f"io.excel.{ext}.writer")
634                     if engine == "auto":
~anaconda3libsite-packagespandas_configconfig.py in __call__(self, *args, **kwds)
232     def __call__(self, *args, **kwds):
--> 233         return self.__func__(*args, **kwds)
234 
~anaconda3libsite-packagespandas_configconfig.py in _get_option(pat, silent)
104 def _get_option(pat: str, silent: bool = False):
--> 105     key = _get_single_key(pat, silent)
106 
~anaconda3libsite-packagespandas_configconfig.py in _get_single_key(pat, silent)
90             _warn_if_deprecated(pat)
---> 91         raise OptionError(f"No such keys(s): {repr(pat)}")
92     if len(keys) > 1:
OptionError: 'No such keys(s): "io.excel.xlsx'.writer"'
The above exception was the direct cause of the following exception:
ValueError                                Traceback (most recent call last)
<ipython-input-24-ec2db8d02335> in <module>
1 arg = "r'Edited Tables/" + list_of_tables[key] + "_Table_New.xlsx'"
2 # print(arg)
----> 3 df.to_excel(arg, sheet_name= list_of_tables[key], index = False)
~anaconda3libsite-packagespandascoregeneric.py in to_excel(self, excel_writer, sheet_name, na_rep, float_format, columns, header, index, index_label, startrow, startcol, engine, merge_cells, encoding, inf_rep, verbose, freeze_panes)
2024             inf_rep=inf_rep,
2025         )
-> 2026         formatter.write(
2027             excel_writer,
2028             sheet_name=sheet_name,
~anaconda3libsite-packagespandasioformatsexcel.py in write(self, writer, sheet_name, startrow, startcol, freeze_panes, engine)
728             need_save = False
729         else:
--> 730             writer = ExcelWriter(stringify_path(writer), engine=engine)
731             need_save = True
732 
~anaconda3libsite-packagespandasioexcel_base.py in __new__(cls, path, engine, **kwargs)
635                         engine = _get_default_writer(ext)
636                 except KeyError as err:
--> 637                     raise ValueError(f"No engine for filetype: '{ext}'") from err
638             cls = get_writer(engine)
639 
ValueError: No engine for filetype: 'xlsx''

我尝试切换到csv格式,错误仍然存在。不知道哪里出了问题。谢谢!

arg = "r'Edited Tables/" + list_of_tables[i] + "_Table_New.xlsx'"行末尾有一个额外的撇号。

改为:

arg = r"Edited Tables/" + list_of_tables[i] + "_Table_New.xlsx"

请注意,错误信息是说ValueError: No engine for filetype: 'xlsx'',因为它不知道如何处理xlsx'文件,但如果没有尾随的撇号,xlsx就可以了。

"r'Edited Tables/"部分也有类似的问题。

最新更新