How to split a column in a CSV file with pandas, reading the file with nrows and writing the result back



How do I split a column's values into multiple columns?

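import pandas as pd

# Read only the first 9128 data rows of the file.
dataframe = pd.read_csv(r"logon.csv", nrows=9128)
# Split "location" on "-" into three new columns (str.split does not accept nrows).
dataframe[["place", "room", "computer_number"]] = dataframe["location"].str.split("-", expand=True)
dataframe.drop(["location"], axis=1, inplace=True)
# Write the result back to logon.csv without the index column.
dataframe.to_csv("logon.csv", index=False)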

Error:

  File "C:\Python310\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 225, in read
    chunks = self._reader.read_low_memory(nrows)
  File "pandas\_libs\parsers.pyx", line 805, in pandas._libs.parsers.TextReader.read_low_memory
  File "pandas\_libs\parsers.pyx", line 861, in pandas._libs.parsers.TextReader._read_rows
  File "pandas\_libs\parsers.pyx", line 847, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas\_libs\parsers.pyx", line 1960, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 6 fields in line 278, saw 7

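The error you are seeing comes from the CSV parser, not from the split. It means that one line in logon.csv (line 278) has an extra field: 7 fields instead of the 6 in the header. Does your CSV file have a header row?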
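One way to locate such lines before handing the file to pandas is to count the fields per line with the standard csv module. This is only a minimal sketch: the helper name find_bad_lines and the tiny in-memory sample are made up for illustration, and it assumes a comma delimiter and that the first line is the header.

```python
import csv
import io


def find_bad_lines(handle, delimiter=","):
    """Return (expected_field_count, [(line_number, field_count), ...]).

    Hypothetical helper: the first line is assumed to be the header,
    and every later non-blank line is compared against its field count.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    expected = len(header)
    bad = []
    for row in reader:
        # Blank lines come back as empty lists; skip them.
        if row and len(row) != expected:
            bad.append((reader.line_num, len(row)))
    return expected, bad


# Small in-memory sample; the second data line has one field too many.
sample = io.StringIO("a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n")
expected, bad = find_bad_lines(sample)
print(expected, bad)
```

For the real file, the same helper could be called on open("logon.csv", newline="") instead of the in-memory sample.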

For example, consider this file:

id,date,location
1,1-1-2020,ND-1-34
2,1-1-2020,NY-1-32
3,1-1-2020,NF-1-34
4,1-1-2020,ID-3-14
5,1-1-2020,OD-1-34
6,1-1-2020,NX-5-38
7,1-1-2020,NC-1-94
8,1-1-2020,AD-9-30
9,1-1-2020,NX-5-38
10,1-1-2020,NC-1-94
11,1-1-2020,ID-3-14
12,1-1-2020,OD-1-34

pd.read_csv(r"logon.csv", nrows=10) parses this file correctly.
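With a well-formed file like this one, the split from the question should work once read_csv succeeds. The following is a rough sketch on a shortened in-memory copy of the sample (io.StringIO stands in for logon.csv, and nrows=2 is chosen arbitrarily); the column names place, room and computer_number are taken from the question.

```python
import io

import pandas as pd

# Shortened copy of the sample file, kept in memory for the sketch.
SAMPLE = """id,date,location
1,1-1-2020,ND-1-34
2,1-1-2020,NY-1-32
3,1-1-2020,NF-1-34
"""

# Read only the first two data rows, as nrows does in the question.
df = pd.read_csv(io.StringIO(SAMPLE), nrows=2)

# Split "location" on "-" into three new columns, then drop the original.
df[["place", "room", "computer_number"]] = df["location"].str.split("-", expand=True)
result = df.drop(columns=["location"])

print(result)
```

Note that the split pieces are strings; converting room or computer_number to integers would need an explicit astype(int).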

But the following CSV file fails with the same Python code, because line 5 has an extra field.

id,date,location
1,1-1-2020,ND-1-34
2,1-1-2020,NY-1-32
3,1-1-2020,NF-1-34
4,1-1-2020,ID-3-14,1
5,1-1-2020,OD-1-34
6,1-1-2020,NX-5-38
7,1-1-2020,NC-1-94
8,1-1-2020,AD-9-30
9,1-1-2020,NX-5-38
10,1-1-2020,NC-1-94
11,1-1-2020,ID-3-14
12,1-1-2020,OD-1-34

It raises pandas.errors.ParserError: Error tokenizing data. C error: Expected 3 fields in line 5, saw 4
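If dropping the malformed rows is acceptable, one possible workaround is the on_bad_lines parameter of read_csv. The sketch below assumes pandas 1.3 or later (where on_bad_lines exists) and uses an in-memory copy of the failing sample instead of logon.csv. Whether skipping is appropriate depends on the data; fixing the offending line in the file is the other option.

```python
import io

import pandas as pd

# In-memory copy of the failing sample: line 5 has a fourth field.
SAMPLE = """id,date,location
1,1-1-2020,ND-1-34
2,1-1-2020,NY-1-32
3,1-1-2020,NF-1-34
4,1-1-2020,ID-3-14,1
5,1-1-2020,OD-1-34
6,1-1-2020,NX-5-38
7,1-1-2020,NC-1-94
8,1-1-2020,AD-9-30
9,1-1-2020,NX-5-38
10,1-1-2020,NC-1-94
11,1-1-2020,ID-3-14
12,1-1-2020,OD-1-34
"""

# Default behaviour: the parser raises on the line with too many fields.
try:
    pd.read_csv(io.StringIO(SAMPLE))
    error_message = None
except pd.errors.ParserError as exc:
    error_message = str(exc)

# Workaround (assumes pandas >= 1.3): silently drop lines with too many fields.
cleaned = pd.read_csv(io.StringIO(SAMPLE), on_bad_lines="skip")

print(error_message)
print(len(cleaned))
```

Passing on_bad_lines="warn" instead reports each skipped line, which makes it easier to see what was dropped.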
