The data in file a.dat looks like this:
01/Jul/2016 00:05:09 8438.2
01/Jul/2016 00:05:19 8422.4 g
I want to parse it into three columns: a timestamp, a float, and a string (None or g).
I tried:
df=pd.read_csv('a.dat',sep=' | ',engine='python')
This ends up with 4 columns: date, time, float and g.
df=pd.read_csv('a.dat',sep=' | (g)',engine='python')
This gives 5 columns, where columns 1 and 4 are NaN.
Is there a better way to create the DataFrame without any post-processing?
You can use read_csv with a whitespace separator:
import pandas as pd
import io
temp=u'''01/Jul/2016 00:05:09 8438.2
01/Jul/2016 00:05:19 8422.4 g'''
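# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp),
                 sep=r'\s+',  # regex: one or more whitespace characters
                 names=['date','time','float','string'],
                 # combining columns via a nested list is deprecated since pandas 2.2
                 parse_dates=[['date','time']])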
print (df)
date_time float string
0 2016-07-01 00:05:09 8438.2 NaN
1 2016-07-01 00:05:19 8422.4 g
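Because combining columns through a nested parse_dates list is deprecated in recent pandas (2.2+), here is a minimal sketch of the same result for newer versions. It is an alternative, not part of the original answer: it reads the four raw columns and then builds date_time with pd.to_datetime, assuming the "%d/%b/%Y %H:%M:%S" layout of the sample. Note that this does add one small post-processing step.

```python
import io

import pandas as pd

temp = """01/Jul/2016 00:05:09 8438.2
01/Jul/2016 00:05:19 8422.4 g"""

# Read the raw fields; rows with only 3 fields get NaN in the last column
df = pd.read_csv(io.StringIO(temp), sep=r"\s+",
                 names=["date", "time", "float", "string"])

# Join date and time, parse them with an explicit format, drop the originals
date_time = pd.to_datetime(df.pop("date") + " " + df.pop("time"),
                           format="%d/%b/%Y %H:%M:%S")
df.insert(0, "date_time", date_time)
print(df)
```

The explicit format avoids relying on pandas guessing how to parse the "Jul" month abbreviation.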
Or:
import pandas as pd
import io
temp=u'''01/Jul/2016 00:05:09 8438.2
01/Jul/2016 00:05:19 8422.4 g'''
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp),
                 delim_whitespace=True,  # deprecated since pandas 2.2; sep=r'\s+' is the replacement
                 names=['date','time','float','string'],
                 parse_dates=[['date','time']])
print (df)
date_time float string
0 2016-07-01 00:05:09 8438.2 NaN
1 2016-07-01 00:05:19 8422.4 g
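The question asks for None rather than NaN when the flag is missing. pandas fills missing trailing fields with NaN, so a minimal sketch, assuming an object-dtype column is acceptable, is to swap NaN for None after reading:

```python
import io

import pandas as pd

temp = """01/Jul/2016 00:05:09 8438.2
01/Jul/2016 00:05:19 8422.4 g"""

df = pd.read_csv(io.StringIO(temp), sep=r"\s+",
                 names=["date", "time", "float", "string"])

# Replace the NaN that pandas inserts for the missing flag with None;
# dtype=object keeps None from being converted back to NaN
df["string"] = pd.Series([None if pd.isna(v) else v for v in df["string"]],
                         index=df.index, dtype=object)
print(df)
```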
Solution with read_fwf:
import pandas as pd
import io
temp=u'''01/Jul/2016 00:05:09 8438.2
01/Jul/2016 00:05:19 8422.4 g'''
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_fwf(io.StringIO(temp),
names=['date','time','float','string'],
parse_dates=[['date','time']])
print (df)
date_time float string
0 2016-07-01 00:05:09 8438.2 NaN
1 2016-07-01 00:05:19 8422.4 g
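By default read_fwf infers the column boundaries from the first rows of the file (colspecs='infer'). If that inference is unreliable for a real file, the boundaries can be given explicitly as half-open [start, end) character ranges. The ranges below are an assumption computed for the single-space sample above; they skip the separating spaces.

```python
import io

import pandas as pd

temp = """01/Jul/2016 00:05:09 8438.2
01/Jul/2016 00:05:19 8422.4 g"""

# [start, end) character positions of date, time, float and flag in the sample
colspecs = [(0, 11), (12, 20), (21, 27), (28, 29)]
df = pd.read_fwf(io.StringIO(temp), colspecs=colspecs,
                 names=["date", "time", "float", "string"])
print(df)
```

A row that ends before the last range (the first line here) yields an empty field, which pandas reads as NaN.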
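You can also specify the column widths explicitly:
df = pd.read_fwf(io.StringIO(temp),
                 # widths for the single-space sample: date, ' '+time, ' '+float, ' '+flag
                 widths=[11, 9, 7, 2],
                 names=['date','time','float','string'],
                 parse_dates=[['date','time']])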
print (df)
date_time float string
0 2016-07-01 00:05:09 8438.2 NaN
1 2016-07-01 00:05:19 8422.4 g
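Widths are just contiguous column ranges: each field starts where the previous one ends, and read_fwf trims the surrounding spaces from every field. For the single-space sample, widths of 11, 9, 7 and 2 characters (an assumption tied to that exact layout) give the same frame as the equivalent colspecs, as this small check shows:

```python
import io
from itertools import accumulate

import pandas as pd

temp = """01/Jul/2016 00:05:09 8438.2
01/Jul/2016 00:05:19 8422.4 g"""
names = ["date", "time", "float", "string"]

widths = [11, 9, 7, 2]
# Running totals give each field's end; the previous end is the next start
ends = list(accumulate(widths))
starts = [0] + ends[:-1]
colspecs = list(zip(starts, ends))
print(colspecs)  # [(0, 11), (11, 20), (20, 27), (27, 29)]

df_w = pd.read_fwf(io.StringIO(temp), widths=widths, names=names)
df_c = pd.read_fwf(io.StringIO(temp), colspecs=colspecs, names=names)
print(df_w.equals(df_c))  # True
```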