Problem when trying to load txt files in a Jupyter notebook



I am trying to load all the txt files from a folder. The code below, which loads each txt file into a pandas DataFrame so they can be concatenated, works most of the time, but in this case it does not, and I don't know why.

Here is the code:

import glob
import os
import pandas as pd

path = 'C:/Users/user/Documents/UNIAO'

csv_files = glob.glob(os.path.join(path, "*.txt"))
list_of_dataframes = []
# loop over the list of txt files
for f in csv_files:
    text_file = open(f, "r", encoding='unicode_escape')
    data = text_file.read()
    separator = data[4]
    df = pd.read_csv(f, sep=separator, encoding='unicode_escape')
    list_of_dataframes.append(df)

Here is the error message:

ParserError                               Traceback (most recent call last)
Cell In [5], line 19
16 separator = data[4]
---> 19 df = pd.read_csv(f, sep=separator, encoding ='unicode_escape')
20 print(f)
23 list_of_dataframes.append(df)
File c:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\util\_decorators.py:311, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
305 if len(args) > num_allow_args:
306     warnings.warn(
307         msg.format(arguments=arguments),
308         FutureWarning,
309         stacklevel=stacklevel,
310     )
--> 311 return func(*args, **kwargs)
File c:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py:680, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
665 kwds_defaults = _refine_defaults_read(
666     dialect,
667     delimiter,
(...)
676     defaults={"delimiter": ","},
677 )
...
--> 739     raise ParserError(msg)
740 elif self.on_bad_lines == self.BadLineHandleMethod.WARN:
741     base = f"Skipping line {row_num}: "
ParserError: Expected 197 fields in line 11955, saw 198

The problem is most likely caused by some corrupted or extra data on line 11955 of that file. You can try the following.

For pandas >= 1.3.0:

df = pd.read_csv(f, sep=separator, encoding='unicode_escape', on_bad_lines='skip')

For pandas < 1.3.0:

df = pd.read_csv(f, sep=separator, encoding='unicode_escape', error_bad_lines=False)

Note that this will cause the problematic lines to be skipped.

See the pandas documentation for more information.
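As a minimal sketch of what the fix does (using an in-memory string instead of the real file, since the data itself is not available, and assuming pandas >= 1.3.0):

```python
import io
import pandas as pd

# Minimal reproduction of the error: the third data line has one field
# too many, like line 11955 in the traceback above.
raw = "a;b;c\n1;2;3\n4;5;6;7\n8;9;10\n"

# With on_bad_lines='skip', rows that have too many fields are dropped
# instead of raising ParserError.
df = pd.read_csv(io.StringIO(raw), sep=';', on_bad_lines='skip')
print(df.shape)  # → (2, 3): the malformed row "4;5;6;7" is gone
```

With the default behavior (on_bad_lines='error'), the same input raises "Expected 3 fields in line 3, saw 4", which is the same class of error shown in your traceback.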
