I am reading a text file with pd.read_fwf, as follows:
import pandas as pd
specs_test =[(19, 20),(20, 21),(21, 23),(23,26)]
names_test = ["Record_Type","Resident_Status","State_Occurrence_FIPS",
"County_Occurrence_FIPS"]
test_l = pd.read_fwf('test.txt', header=None, names = names_test, colspecs= specs_test)
test.txt looks like this:
11SC059
11SC051
11SC019
11SC033
11SC007
11SC041
22SC079
11SC043
11SC045
22SC079
After reading the file, test_l looks like this:
   Record_Type  Resident_Status  State_Occurrence_FIPS  County_Occurrence_FIPS
0            1                S                     C0                      59
1            1                S                     C0                      51
2            1                S                     C0                      19
3            1                S                     C0                      33
4            1                S                     C0                       7
5            1                S                     C0                      41
6            2                S                     C0                      79
7            1                S                     C0                      43
8            1                S                     C0                      45
9            2                S                     C0                      79
However, according to my colspecs, it should contain the following (this is what I expect; I am only showing the first row):
1 1 SC 059
What am I missing here? Thank you very much for your help!
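First, you are off by one index. colspecs are zero-based, half-open intervals [from, to), so every boundary needs to shift left by one. Try: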
specs_test =[(18, 19),(19, 20),(20, 22),(22,25)]
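To see why the shift matters, here is a minimal sketch that mimics the colspecs with plain string slicing. It assumes each line of test.txt carries 18 leading spaces before the 7-character record (the listing in the question appears to have lost them), and the helper name cut is hypothetical, used only for illustration.

```python
# Assumption: every record in test.txt is preceded by 18 spaces.
line = " " * 18 + "11SC059"

original_specs = [(19, 20), (20, 21), (21, 23), (23, 26)]
corrected_specs = [(18, 19), (19, 20), (20, 22), (22, 25)]


def cut(text, specs):
    """Slice text with zero-based, half-open (start, stop) pairs."""
    return [text[start:stop] for start, stop in specs]


original_fields = cut(line, original_specs)
corrected_fields = cut(line, corrected_specs)

print(original_fields)   # ['1', 'S', 'C0', '59']
print(corrected_fields)  # ['1', '1', 'SC', '059']
```

The original specs skip the first character of the record and run one position past its end, which matches the shifted columns shown in the question.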
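Also, leading zeros are dropped for values that are parsed as numbers. To preserve them, you can have the columns read as strings by adding: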
converters = {h:str for h in names_test}
The final code can be:
import pandas as pd
specs_test = [(18, 19), (19, 20), (20, 22), (22, 25)]  # The original specs were off by one index.
names_test = ["Record_Type", "Resident_Status", "State_Occurrence_FIPS", "County_Occurrence_FIPS"]
test_l = pd.read_fwf('test.txt',
                     header=None,
                     names=names_test,
                     colspecs=specs_test,
                     converters={h: str for h in names_test})  # To keep the leading zeros,
                                                               # read the columns as strings.
Result:
   Record_Type  Resident_Status  State_Occurrence_FIPS  County_Occurrence_FIPS
0            1                1                     SC                     059
1            1                1                     SC                     051
2            1                1                     SC                     019
3            1                1                     SC                     033
4            1                1                     SC                     007
5            1                1                     SC                     041
6            2                2                     SC                     079
7            1                1                     SC                     043
8            1                1                     SC                     045
9            2                2                     SC                     079
I got this working after pasting the data into a test file and fixing the tuples.
import pandas as pd
specs_test = [(18, 19), (19, 20), (20, 22), (22, 25)]
names_test = ["Record_Type", "Resident_Status", "State_Occurrence_FIPS",
              "County_Occurrence_FIPS"]
pd.read_fwf('test.txt', header=None, names=names_test, colspecs=specs_test)
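This drops the leading zeros in the fourth column, so you may have to pass a data type through the keyword arguments or fix the column after import.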
   Record_Type  Resident_Status  State_Occurrence_FIPS  County_Occurrence_FIPS
0            1                1                     SC                      59
1            1                1                     SC                      51
2            1                1                     SC                      19
3            1                1                     SC                      33
4            1                1                     SC                       7
5            1                1                     SC                      41
6            2                2                     SC                      79
7            1                1                     SC                      43
8            1                1                     SC                      45
9            2                2                     SC                      79
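Both ways of restoring the leading zeros can be sketched as follows. This is only an illustration under stated assumptions: the data is an in-memory three-row sample rather than the real test.txt, each record is assumed to be preceded by 18 spaces, and the county code is assumed to be exactly three digits wide.

```python
import io

import pandas as pd

# Assumption: three sample records, each padded with 18 leading spaces.
records = ["11SC059", "11SC007", "22SC079"]
raw = "\n".join(" " * 18 + rec for rec in records) + "\n"

specs_test = [(18, 19), (19, 20), (20, 22), (22, 25)]
names_test = ["Record_Type", "Resident_Status", "State_Occurrence_FIPS",
              "County_Occurrence_FIPS"]

# Option 1: pass dtype=str so no column is parsed as a number.
as_text = pd.read_fwf(io.StringIO(raw), header=None, names=names_test,
                      colspecs=specs_test, dtype=str)

# Option 2: let pandas infer the types, then pad the county code back to 3 digits.
inferred = pd.read_fwf(io.StringIO(raw), header=None, names=names_test,
                       colspecs=specs_test)
inferred["County_Occurrence_FIPS"] = (
    inferred["County_Occurrence_FIPS"].astype(str).str.zfill(3)
)

print(as_text["County_Occurrence_FIPS"].tolist())
print(inferred["County_Occurrence_FIPS"].tolist())
```

Option 1 keeps every column as text, which is usually what you want for codes such as FIPS identifiers. Option 2 leaves the other columns numeric but only works when the field width is known in advance.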