Pandas: CSV header and data row size mismatch



Is it possible to tell pandas to ignore columns that lie beyond the size of the header?

import pandas

# Write a CSV whose header and data row both have two fields
with open('test.csv', mode='w') as csv_file:
    csv_file.write("datetime,A\n")
    csv_file.write("2018-10-09 18:00:07, 123\n")
df = pandas.read_csv('test.csv')
print(df)

This gives:

              datetime    A
0  2018-10-09 18:00:07  123

However, loading a CSV file that has more data columns than the header defines:

# The header has two fields, but the data row has four
with open('test.csv', mode='w') as csv_file:
    csv_file.write("datetime,A\n")
    csv_file.write("2018-10-09 18:00:07, 123, ABC, XYZ\n")
df = pandas.read_csv('test.csv')
print(df)

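returns:

                        datetime     A
2018-10-09 18:00:07 123      ABC   XYZ

Pandas shifts the header over to the rightmost columns of the data.

I need different behavior. I want pandas to ignore the data columns that extend beyond the header.

Note: I cannot enumerate the columns by name, because this is a generic use case. For reasons outside the control of my code, a file sometimes contains more data than expected. I want to ignore the extra data.

Pandas seems to notice that there are more columns than the header accounts for, and it assumes that the first two (data) columns are a (multi)index.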

Use the usecols parameter of read_csv to specify which data columns to read:

import pandas

with open('test.csv', mode='w') as csv_file:
    csv_file.write("datetime,A\n")
    csv_file.write("2018-10-09 18:00:07, 123, ABC, XYZ\n")
# Read only the first two columns, by position
df = pandas.read_csv('test.csv', usecols=[0, 1])
print(df)

yields

              datetime    A
0  2018-10-09 18:00:07  123

The code below now demonstrates an answer to the question.

import pandas

with open('test.csv', mode='w') as csv_file:
    csv_file.write("datetime,A\n")
    csv_file.write("2018-10-09 18:00:07, 123, ABC, XYZ\n")

# Count the fields in the header and in the first data row
with open("test.csv") as csv_file:
    for i, line in enumerate(csv_file):
        if i == 0:
            headerCount = line.count(",") + 1
            colCount = headerCount
        elif i == 1:
            dataCount = line.count(",") + 1
        elif i > 1:
            break

if headerCount < dataCount:
    print("Warning: Header and data size mismatch. Columns beyond header size will be removed.")

# Read only as many columns as the header defines
df = pandas.read_csv('test.csv', usecols=range(colCount))
print(df)

produces:

Warning: Header and data size mismatch. Columns beyond header size will be removed.
              datetime    A
0  2018-10-09 18:00:07  123
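
Counting commas with line.count(",") overcounts when a quoted field itself contains a comma. As a minimal sketch, assuming the file uses the default comma-delimited dialect, the standard csv module can count the fields instead. The function name field_counts is hypothetical and not part of pandas:

```python
import csv


def field_counts(path):
    """Return (header_fields, first_data_row_fields) for a CSV file.

    The second value is None when the file has no data row.
    """
    with open(path, newline="") as csv_file:
        reader = csv.reader(csv_file)
        header = next(reader, None)      # first line, split into fields
        first_row = next(reader, None)   # second line, or None if absent
    header_count = len(header) if header is not None else 0
    data_count = len(first_row) if first_row is not None else None
    return header_count, data_count
```

The first value can then be passed on in the same way as above, for example pandas.read_csv(path, usecols=range(header_count)).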

To make the question complete, here is code that solves the problem:

import pandas

# Here the header has four fields, but the data row has only two
with open('test.csv', mode='w') as csv_file:
    csv_file.write("datetime,A, B, C\n")
    csv_file.write("2018-10-09 18:00:07, 123\n")

with open("test.csv") as csv_file:
    for i, line in enumerate(csv_file):
        if i == 0:
            headerCount = line.count(",") + 1
        elif i == 1:
            dataCount = line.count(",") + 1
            if headerCount != dataCount:
                print("Warning: Header and data size mismatch. Columns beyond header size will be removed.")
        elif i > 1:
            break

# Read only the columns present in both the header and the first data row
df = pandas.read_csv('test.csv', usecols=range(min(headerCount, dataCount)))
print(df)

This gives the correct pandas object.

Warning: Header and data size mismatch. Columns beyond header size will be removed.
              datetime    A
0  2018-10-09 18:00:07  123
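
The warning printed above always says that columns beyond the header will be removed, even though in this test file it is the header that is wider. A minimal sketch of a helper that picks the column range and words the warning according to the direction of the mismatch follows. The name usable_columns and the message texts are hypothetical:

```python
import warnings


def usable_columns(header_count, data_count):
    """Return the column positions present in both the header and the data row."""
    if data_count > header_count:
        warnings.warn(
            "Data row has more fields than the header; "
            "extra data columns will be ignored."
        )
    elif data_count < header_count:
        warnings.warn(
            "Header has more fields than the data row; "
            "unmatched header columns will be ignored."
        )
    # Only positions that exist on both sides are safe to request
    return range(min(header_count, data_count))
```

The returned range is intended to be passed as the usecols argument of read_csv, in place of range(colCount) above.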
