Trouble converting a DBF file to a Pandas DataFrame

I'm trying to work with the Canadian radio station DBF files that are published here: https://sms-sgs.ic.gc.ca/eic/site/sms-sgs-prod.nsf/eng/h_00015.html

I specifically want to read the fmstatio.dbf file into a Pandas DataFrame. I have tried the two commonly recommended DBF packages for Python.

With simpledbf (https://pypi.org/project/simpledbf/), I only get the column names back when I call dbf.to_dataframe().

I also tried dbf from PyPI (https://pypi.org/project/dbf/). With it I can read the DBF file into a table:

import dbf

table = dbf.Table(filename='/datadrive/canada/fmstatio.dbf')
table.open(dbf.READ_ONLY)
print(table)
table.close()

and I get the following information about the table:

Table:         /datadrive/canada/fmstatio.dbf
Type:          dBase III Plus
Codepage:      ascii (plain ol' ascii)
Status:        DbfStatus.READ_ONLY
Last updated:  1921-12-07
Record count:  8428
Field count:   37
Record length: 221

But when I try to convert it to a DataFrame, it fails:

import pandas as pd

oh_canada = pd.DataFrame(table)
table.close()

The error I receive:

data = fielddef[CLASS](decoder(data)[0])
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 4: ordinal not in range(128)

Does anyone know the best way to work with this kind of DBF file in Pandas? Thanks very much.

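The table claims to be "plain ol' ascii", but that is a lie. It contains "e with an acute accent" (é, the byte 0xe9 named in the error), which is not surprising given the French content in a Canadian database. To work around this, you need to override the codepage: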

table = dbf.Table(filename='/datadrive/canada/fmstatio.dbf', codepage=3)

The "3" means the default Windows codepage, CP1252. With that override, I was able to read the file.

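To see why the ascii setting fails while CP1252 works, it can help to decode the offending byte directly. This is only a small sketch: the sample value below is invented for illustration, and just the 0xe9 byte is taken from the error message in the question.

```python
# Hypothetical field value: only the 0xe9 byte comes from the traceback;
# the surrounding bytes are made up for illustration.
raw = b"Montr\xe9al"

try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError as exc:
    # ASCII only covers bytes 0x00-0x7f, so 0xe9 is rejected.
    ascii_ok = False
    print("ascii failed:", exc.reason)

# CP1252 maps 0xe9 to "e with acute accent".
decoded = raw.decode("cp1252")
print(decoded)
```
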
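I'm still not sure whether pandas can import it directly, since the table delivers its records in a format that behaves like an iterator. You may need to use export to convert it to a list first.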
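One possible way to get from the table to a DataFrame, instead of going through export, is to copy every record into a plain dict before handing the rows to pandas. The sketch below rests on assumptions: the helper names records_to_rows and load_dbf_as_dataframe are hypothetical, and it assumes that the table exposes its columns as table.field_names and that each record can be indexed by field name. It has not been run against the fmstatio.dbf file itself.

```python
def records_to_rows(records, field_names):
    """Copy each record into a plain dict keyed by field name."""
    return [{name: record[name] for name in field_names} for record in records]


def load_dbf_as_dataframe(path, codepage=3):
    """Read a DBF file into a DataFrame (needs the dbf and pandas packages)."""
    # Imported lazily so the helper above stays usable without these packages.
    import dbf
    import pandas as pd

    # codepage=3 is the CP1252 override suggested in the answer.
    table = dbf.Table(filename=path, codepage=codepage)
    table.open(dbf.READ_ONLY)
    try:
        # Assumption: table.field_names lists the columns and records
        # support lookup by field name.
        rows = records_to_rows(table, table.field_names)
    finally:
        # Rows are fully copied before the table is closed.
        table.close()
    return pd.DataFrame(rows)
```

If those assumptions hold, a call such as oh_canada = load_dbf_as_dataframe('/datadrive/canada/fmstatio.dbf') would build the DataFrame. The rows are materialized while the table is still open because the records are read from the file on demand.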
