Column specs don't match: pd.read_fwf with colspecs reads the wrong values



I am reading a text file with pd.read_fwf as follows:

import pandas as pd
specs_test = [(19, 20), (20, 21), (21, 23), (23, 26)]
names_test = ["Record_Type", "Resident_Status", "State_Occurrence_FIPS",
              "County_Occurrence_FIPS"]
test_l = pd.read_fwf('test.txt', header=None, names=names_test, colspecs=specs_test)

test.txt looks like this:

11SC059
11SC051
11SC019
11SC033
11SC007
11SC041
22SC079
11SC043
11SC045
22SC079 

After reading the file, test_l looks like this:

   Record_Type  Resident_Status  State_Occurrence_FIPS  County_Occurrence_FIPS
0            1                S                     C0                      59
1            1                S                     C0                      51
2            1                S                     C0                      19
3            1                S                     C0                      33
4            1                S                     C0                       7
5            1                S                     C0                      41
6            2                S                     C0                      79
7            1                S                     C0                      43
8            1                S                     C0                      45
9            2                S                     C0                      79

But according to my colspecs, I expected the following (I've only written out the first row):

1   1  SC  059

What am I missing here? Thanks a lot for your help!

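First, your colspecs are off by one. read_fwf treats each (from, to) pair as a zero-based, half-open interval: the character at position from is included and the one at position to is not. Try: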

specs_test =[(18, 19),(19, 20),(20, 22),(22,25)]
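Because the intervals are zero-based and half-open, each (from, to) pair selects the same characters as the Python slice line[from:to]. The sketch below checks both sets of tuples against one sample record. The 18 "x" characters are a hypothetical placeholder for the start of each line, which the colspecs imply but the sample file above does not show. The helper to_colspecs is also hypothetical: it assumes your record layout lists 1-based, inclusive positions (for example "positions 21-22"), a common source of exactly this off-by-one error.

```python
# Hypothetical record: 18 placeholder characters stand in for the part of
# each line that precedes the fields, followed by "11SC059".
line = "x" * 18 + "11SC059"


def slice_fields(text, specs):
    """Apply (start, stop) pairs as plain slices text[start:stop].
    (read_fwf additionally strips surrounding whitespace from each field.)"""
    return [text[start:stop] for start, stop in specs]


def to_colspecs(positions):
    """Hypothetical helper: convert 1-based, inclusive (first, last) positions
    from a record layout into 0-based, half-open (start, stop) colspecs."""
    return [(first - 1, last) for first, last in positions]


question_specs = [(19, 20), (20, 21), (21, 23), (23, 26)]
fixed_specs = [(18, 19), (19, 20), (20, 22), (22, 25)]

print(slice_fields(line, question_specs))  # ['1', 'S', 'C0', '59']
print(slice_fields(line, fixed_specs))     # ['1', '1', 'SC', '059']

# Assumed layout: fields at 1-based positions 19, 20, 21-22 and 23-25.
print(to_colspecs([(19, 19), (20, 20), (21, 22), (23, 25)]))
# [(18, 19), (19, 20), (20, 22), (22, 25)]
```

The first call reproduces the wrong values from the question (1, S, C0, 59), which confirms that every field was shifted one character to the right.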

Also, when a field is parsed as a number, its leading zeros are dropped. To keep them, convert the columns to strings by adding:

converters = {h:str for h in names_test}
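A minimal sketch of the difference, assuming one hypothetical record held in an io.StringIO buffer instead of test.txt (again with 18 placeholder "x" characters in front). Without converters, pandas infers an integer column and 007 becomes 7. Since converters is a dict keyed by column name, it can also be limited to the code columns, leaving Record_Type numeric.

```python
import io

import pandas as pd

names_test = ["Record_Type", "Resident_Status",
              "State_Occurrence_FIPS", "County_Occurrence_FIPS"]
specs_test = [(18, 19), (19, 20), (20, 22), (22, 25)]

# One hypothetical record: 18 placeholder characters, then "11SC007".
data = "x" * 18 + "11SC007\n"

# Default parsing: the county field is inferred as an integer.
default = pd.read_fwf(io.StringIO(data), header=None,
                      names=names_test, colspecs=specs_test)

# Keep only the FIPS code columns as text; the other columns are still inferred.
codes_as_text = pd.read_fwf(io.StringIO(data), header=None,
                            names=names_test, colspecs=specs_test,
                            converters={"State_Occurrence_FIPS": str,
                                        "County_Occurrence_FIPS": str})

print(default.loc[0, "County_Occurrence_FIPS"])        # 7
print(codes_as_text.loc[0, "County_Occurrence_FIPS"])  # 007
print(codes_as_text.loc[0, "Record_Type"])             # 1
```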

The final code could then be:

import pandas as pd
specs_test = [(18, 19), (19, 20), (20, 22), (22, 25)]  # Here you were off by one index.
names_test = ["Record_Type", "Resident_Status", "State_Occurrence_FIPS", "County_Occurrence_FIPS"]
test_l = pd.read_fwf('test.txt',
                     header=None,
                     names=names_test,
                     colspecs=specs_test,
                     # To keep the leading zeros, convert every column to str.
                     converters={h: str for h in names_test})

Result:

   Record_Type  Resident_Status  State_Occurrence_FIPS  County_Occurrence_FIPS
0            1                1                     SC                     059
1            1                1                     SC                     051
2            1                1                     SC                     019
3            1                1                     SC                     033
4            1                1                     SC                     007
5            1                1                     SC                     041
6            2                2                     SC                     079
7            1                1                     SC                     043
8            1                1                     SC                     045
9            2                2                     SC                     079
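The final code reads test.txt from disk, so here is a self-contained sketch that rebuilds the ten sample records in memory and checks the result directly. The 18 "x" characters of padding per line are a hypothetical stand-in for whatever precedes these fields in the real file.

```python
import io

import pandas as pd

specs_test = [(18, 19), (19, 20), (20, 22), (22, 25)]
names_test = ["Record_Type", "Resident_Status",
              "State_Occurrence_FIPS", "County_Occurrence_FIPS"]

# The ten sample records from the question, each prefixed with
# 18 hypothetical placeholder characters.
records = ["11SC059", "11SC051", "11SC019", "11SC033", "11SC007",
           "11SC041", "22SC079", "11SC043", "11SC045", "22SC079"]
buffer = io.StringIO("".join("x" * 18 + r + "\n" for r in records))

# Every column is converted to str, so no leading zeros are lost.
test_l = pd.read_fwf(buffer, header=None, names=names_test,
                     colspecs=specs_test,
                     converters={h: str for h in names_test})

print(test_l)
print(test_l["County_Occurrence_FIPS"].tolist())
# ['059', '051', '019', '033', '007', '041', '079', '043', '045', '079']
```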

I got the output below after pasting the data into a test file and fixing the tuples:

import pandas as pd
specs_test = [(18, 19), (19, 20), (20, 22), (22, 25)]
names_test = ["Record_Type", "Resident_Status", "State_Occurrence_FIPS",
              "County_Occurrence_FIPS"]
pd.read_fwf('test.txt', header=None, names=names_test, colspecs=specs_test)

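Note that this drops the leading zeros in the fourth column, so you may have to pass the data types through a keyword argument such as dtype, or fix that column after import. The output: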

   Record_Type  Resident_Status  State_Occurrence_FIPS  County_Occurrence_FIPS
0            1                1                     SC                      59
1            1                1                     SC                      51
2            1                1                     SC                      19
3            1                1                     SC                      33
4            1                1                     SC                       7
5            1                1                     SC                      41
6            2                2                     SC                      79
7            1                1                     SC                      43
8            1                1                     SC                      45
9            2                2                     SC                      79
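Both fixes mentioned in this answer can be sketched as follows, using two hypothetical records in an io.StringIO buffer (18 placeholder "x" characters in front of each). Option 1 passes dtype=str, an extra keyword argument that read_fwf forwards to its underlying parser, so no column is turned into numbers. Option 2 keeps the default parsing and pads the county code back to three digits with str.zfill(3); the width of 3 is an assumption taken from the three-character county field.

```python
import io

import pandas as pd

specs_test = [(18, 19), (19, 20), (20, 22), (22, 25)]
names_test = ["Record_Type", "Resident_Status",
              "State_Occurrence_FIPS", "County_Occurrence_FIPS"]

# Two hypothetical records with 18 placeholder characters in front.
data = "".join("x" * 18 + r + "\n" for r in ["11SC007", "22SC079"])

# Option 1: read every column as text, so leading zeros are never dropped.
as_text = pd.read_fwf(io.StringIO(data), header=None, names=names_test,
                      colspecs=specs_test, dtype=str)

# Option 2: read with default type inference, then restore the zeros by
# left-padding the county code to its 3-character field width.
df = pd.read_fwf(io.StringIO(data), header=None, names=names_test,
                 colspecs=specs_test)
df["County_Occurrence_FIPS"] = (
    df["County_Occurrence_FIPS"].astype(str).str.zfill(3)
)

print(as_text["County_Occurrence_FIPS"].tolist())  # ['007', '079']
print(df["County_Occurrence_FIPS"].tolist())       # ['007', '079']
```

Option 2 assumes the column has no missing values: astype(str) would turn NaN into the text "nan", which zfill leaves unchanged. Option 1 avoids that, at the cost of leaving Record_Type and Resident_Status as text too.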
