When running a Python script, I get an error on one workstation. The script runs fine on a VM and on my own workstation.

pip list shows the installed packages are identical, both workstations use Python 3.10.4 64-bit, and this is the only workstation that throws the error.

It could be a memory issue, but the workstation has 2x4 GB of RAM. I tried splitting the file up, but that didn't help either; the file is only 1 MB.

As a troubleshooting step, I cut the file down to just 500 lines and it ran fine. When I tried 1000 of the file's 2500 lines, it gave the same error. Interestingly, the workstation now can't even run the script on a file with a single line.
Adding error_bad_lines=False, iterator=True, chunksize=, or low_memory=False does not help either.
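For reference, on pandas 1.3+ the error_bad_lines flag is deprecated in favor of on_bad_lines. A minimal sketch of that parameter, using a hypothetical malformed CSV rather than the real file:

```python
import io
import pandas as pd

# Hypothetical malformed CSV: line 3 has four fields instead of two.
csv_text = "FirstName,SRFirstName\nAnn,Ann\nBob,Bob,extra,junk\nCal,Cal\n"

# pandas >= 1.3 replaces error_bad_lines/warn_bad_lines with on_bad_lines;
# 'skip' drops malformed rows instead of raising ParserError.
data = pd.read_csv(io.StringIO(csv_text), on_bad_lines="skip")
print(list(data["FirstName"]))  # the malformed row is gone
```

Skipping rows hides the symptom rather than explaining it, so it is only a stopgap here.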
What is causing this error? And why did it run fine with a few hundred lines, but now fails with even a single line?

Here is the traceback:
Traceback (most recent call last):
  File "c:\Users\script.py", line 5, in <module>
    data = pd.read_csv("C:/Path/file.csv", encoding='latin-1' )
  File "C:\Users\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 680, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 581, in _read
    return parser.read(nrows)
  File "C:\Users\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 1250, in read
    index, columns, col_dict = self._engine.read(nrows)
  File "C:\Users\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 225, in read
    chunks = self._reader.read_low_memory(nrows)
  File "pandas\_libs\parsers.pyx", line 805, in pandas._libs.parsers.TextReader.read_low_memory
  File "pandas\_libs\parsers.pyx", line 861, in pandas._libs.parsers.TextReader._read_rows
  File "pandas\_libs\parsers.pyx", line 847, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas\_libs\parsers.pyx", line 1960, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 5, saw 4
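The error means the tokenizer found more fields on one line than the header declares. A small diagnostic sketch to pinpoint such lines, using an in-memory sample shaped like the error above (one header field, four fields on line 5) rather than the real file:

```python
import csv
import io

# Hypothetical sample standing in for file.csv: the header has 1 field,
# but line 5 has 4 -- the same shape as the ParserError in the traceback.
sample = "Name\nAlice\nBob\nCarol\nD,a,t,a\n"

# Count fields on every line and flag the ones that disagree with the header.
with io.StringIO(sample) as f:
    rows = list(csv.reader(f))
expected = len(rows[0])
bad = [(i + 1, len(r)) for i, r in enumerate(rows) if len(r) != expected]
print(bad)  # -> [(5, 4)]: 1-based line number and its field count
```

Running the same loop over the real file would show whether the bad lines cluster (e.g. all past a certain byte offset), which is a typical sign of file corruption.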
Here is the script:
import numpy as np
import pandas as pd

# Import raw data
data = pd.read_csv("C:/Users/Script.csv", encoding='latin-1')
# Create a column to track failed cases.
data['Test Case Failed'] = ''
data = data.replace(np.nan, '')
data.insert(0, 'ID', range(0, len(data)))
# Testcase 1
data_1 = data[(data['FirstName'] == data['SRFirstName'])]
ids = data_1.index.tolist()
for i in ids:
    data.at[i, 'Test Case Failed'] += ', 1'
# There are 15 more test cases that perform similar tasks
# Total cases
failed = data[(data['Test Case Failed'] != '')].copy()  # .copy() so the slice can be edited safely
passed = data[(data['Test Case Failed'] == '')].copy()
failed['Test Case Failed'] = failed['Test Case Failed'].str[1:]
failed = failed[(failed['Test Case Failed'] != '')]
# Clean up
del failed["ID"]
del passed["ID"]
# Print results
failed['Test Case Failed'].value_counts()
print("There was a total of", data.shape[0], "rows.", "There were", data.shape[0] - failed.shape[0], "rows that passed and", failed.shape[0], "rows that failed at least one test case")
# Drop unwanted columns
redata = passed.drop(columns=['ConsCodeImpID', 'ImportID', 'Suff1', 'SRSuff2', 'Inactive',
    'AddrRegion', 'AddrImpID', 'AddrImpID.1', 'AddrImpID.2', 'PhoneAddrImpID',
    'PhoneAddrImpID.1', 'PhoneImpID', 'PhoneImpID.1', 'PhoneType.1', 'DateTo',
    'SecondID', 'Test Case Failed'])
# Clean address
redata['AddrLines'] = redata['AddrLines'].str.replace('Apartment ', 'Apt ', regex=True)
redata['AddrLines'] = redata['AddrLines'].str.replace(r'Apt\.', 'Apt ', regex=True)
redata['AddrLines'] = redata['AddrLines'].str.replace('APT', 'Apt ', regex=True)
redata['AddrLines'] = redata['AddrLines'].str.replace('\nApt', 'Apt ', regex=True)
# There are about 100 more rows of address clean-up
# Output edited dropped columns
redata.to_csv("C:/Users/cleandata.csv", index = False)
# Output failed rows
failed.to_csv("C:/Users/Failed.csv", index = False)
# Output passed rows
passed.to_csv("C:/Users/Passed.csv", index = False)
The workstation was corrupting the file, even though the file had never been opened on it before running the script. I repaired the file and it worked. After reinstalling Excel, I no longer need to repair the file and the script runs normally.

- Click File > Open.
- Click the location and folder that contains the corrupted workbook.
- In the Open dialog box, select the corrupted workbook.
- Click the arrow next to the Open button, and then click Open and Repair.
- To recover as much of the workbook data as possible, select Repair.
- If Repair isn't able to recover your data, select Extract Data to extract values and formulas from the workbook.
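After a repair like this, a quick field-count check confirms the file will parse before the full script runs again. A sketch using an illustrative temporary file in place of the real path:

```python
import csv
import os
import tempfile

# Write an illustrative "repaired" CSV (stands in for the real file path).
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("FirstName,SRFirstName\nAnn,Ann\nBob,Bobby\n")
    path = tmp.name

# Every row should have the header's field count; a mismatch is exactly
# what makes read_csv raise "Expected N fields in line ..., saw M".
with open(path, newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    ok = all(len(row) == len(header) for row in reader)
print("consistent" if ok else "still corrupted")
os.remove(path)
```

Running this check right after the repair (and again before each script run) would catch the corruption early instead of partway through the pipeline.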