The code worked with an older version of pandas. After I upgraded to version 0.24.0, it fails with a syntax error.
Code:
pandas_df = pd.read_excel(s3_src_file, usecols=range(int(start_range), int(end_range)), skiprows=1).dropna(how='all')
Error log:
Traceback (most recent call last):
File "ndf_digital_weekly_budget_src_raw.py", line 191, in <module>
main(sys.argv[1:])
File "ndf_digital_weekly_budget_src_raw.py", line 181, in main
reject_bucket_prefix, raw_hive_table_path, budget_year, xls_read_ranges, weekly_budget_skip_rows)
File "ndf_digital_weekly_budget_src_raw.py", line 77, in read_weekly_budget_file
pandas_df = pd.read_excel(s3_src_file, usecols=list(range(int(start_range), int(end_range))), skiprows=weekly_budget_skip_rows).dropna(how='all')
File "/usr/local/lib64/python2.7/site-packages/pandas/util/_decorators.py", line 188, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib64/python2.7/site-packages/pandas/util/_decorators.py", line 188, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib64/python2.7/site-packages/pandas/io/excel.py", line 350, in read_excel
io = ExcelFile(io, engine=engine)
File "/usr/local/lib64/python2.7/site-packages/pandas/io/excel.py", line 653, in __init__
self._reader = self._engines[engine](self._io)
File "/usr/local/lib64/python2.7/site-packages/pandas/io/excel.py", line 405, in __init__
filepath_or_buffer)
File "/usr/local/lib64/python2.7/site-packages/pandas/io/common.py", line 212, in get_filepath_or_buffer
from pandas.io import s3
File "/usr/local/lib64/python2.7/site-packages/pandas/io/s3.py", line 5, in <module>
import s3fs
File "/usr/local/lib/python2.7/site-packages/s3fs/__init__.py", line 1, in <module>
from .core import S3FileSystem, S3File
File "/usr/local/lib/python2.7/site-packages/s3fs/core.py", line 7, in <module>
from fsspec import AbstractFileSystem
File "/usr/local/lib/python2.7/site-packages/fsspec/__init__.py", line 6, in <module>
from .spec import AbstractFileSystem
File "/usr/local/lib/python2.7/site-packages/fsspec/spec.py", line 352
yield path, dirs, files
SyntaxError: 'return' with argument inside generator
Try changing range to list, that is, change:
usecols=range(int(start_range), int(end_range))
to:
usecols=list(range(int(start_range), int(end_range)))
The reason is the documented type of the usecols parameter of read_csv:
usecols : list-like or callable, optional
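The conversion suggested above can be sketched as follows. This is only a minimal illustration: build_usecols is a hypothetical helper name, not part of pandas, and the bounds "2" and "6" are made-up sample values. On Python 3, range() returns a lazy range object, so wrapping it in list() yields an explicit list; on Python 2, range() already returns a list, so the wrapper is harmless there.

```python
def build_usecols(start_range, end_range):
    """Return the column positions start_range..end_range-1 as a plain list."""
    # int() mirrors the question's code, where the bounds may arrive as strings.
    return list(range(int(start_range), int(end_range)))


usecols = build_usecols("2", "6")
print(usecols)  # [2, 3, 4, 5]

# Hypothetical call reusing the names from the question (needs pandas and the file):
# pandas_df = pd.read_excel(s3_src_file, usecols=usecols, skiprows=1).dropna(how='all')
```

Building the list in one place makes it easy to print or log the exact column positions before they are handed to read_excel.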