My data looks like this:
GIdx,Date,num,Time
1,11/28/2012,20,10:05:50
1,11/28/2012,20,10:05:50
2,11/28/2012,20,10:09:24
2,11/28/2012,20,10:09:24
2,11/28/2012,20,10:09:25
2,11/28/2012,20,10:09:25
2,11/28/2012,20,10:09:26
3,11/28/2012,20,10:09:34
3,11/28/2012,20,10:09:34
I tried to read the Date column as datetime and the Time column as time, but when I check their types I get Series:
type(df['Date'])
<class 'pandas.core.series.Series'>
type(df_original['Time'])
<class 'pandas.core.series.Series'>
This is what I did:
df=pd.read_csv(filename,sep=",", header = 0, na_values=['NA'])
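You can add the parse_dates parameter to read_csv, passing the Date and Time columns as a nested list so that they are combined into a single datetime column. Note that type(df['Date']) always returns Series, because selecting a column yields a Series whatever it contains; check df.dtypes to see the type of the values: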
import pandas as pd
import io
temp=u"""GIdx,Date,num,Time
1,11/28/2012,20,10:05:50
1,11/28/2012,20,10:05:50
2,11/28/2012,20,10:09:24
2,11/28/2012,20,10:09:24
2,11/28/2012,20,10:09:25
2,11/28/2012,20,10:09:25
2,11/28/2012,20,10:09:26
3,11/28/2012,20,10:09:34
3,11/28/2012,20,10:09:34"""
# after testing, replace io.StringIO(temp) with filename
df = pd.read_csv(io.StringIO(temp), parse_dates=[['Date','Time']])
print (df)
            Date_Time  GIdx  num
0 2012-11-28 10:05:50     1   20
1 2012-11-28 10:05:50     1   20
2 2012-11-28 10:09:24     2   20
3 2012-11-28 10:09:24     2   20
4 2012-11-28 10:09:25     2   20
5 2012-11-28 10:09:25     2   20
6 2012-11-28 10:09:26     2   20
7 2012-11-28 10:09:34     3   20
8 2012-11-28 10:09:34     3   20
print (df.dtypes)
Date_Time    datetime64[ns]
GIdx                  int64
num                   int64
dtype: object
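If the nested-list form of parse_dates is not available (as far as I know, newer pandas releases from 2.2 onward deprecate combining columns inside read_csv), the same result can be reached by reading the columns as plain strings and combining them with pd.to_datetime. This is only a sketch: the shortened sample data and the column names Date_Time, date_only and time_only are my own choices for illustration, not something from the question.

```python
import io
import pandas as pd

# Shortened sample in the same layout as the question's data (assumed for illustration).
temp = u"""GIdx,Date,num,Time
1,11/28/2012,20,10:05:50
2,11/28/2012,20,10:09:24
3,11/28/2012,20,10:09:34"""

# Read Date and Time as ordinary string columns first.
df = pd.read_csv(io.StringIO(temp))

# Join the two strings and parse them with an explicit month/day/year format.
df['Date_Time'] = pd.to_datetime(df['Date'] + ' ' + df['Time'],
                                 format='%m/%d/%Y %H:%M:%S')

# Optional: split back into separate datetime.date and datetime.time objects.
df['date_only'] = df['Date_Time'].dt.date
df['time_only'] = df['Date_Time'].dt.time

print(df.dtypes)
```

Here Date_Time ends up with a datetime64 dtype, while date_only and time_only are object columns holding Python date and time values, so for filtering or resampling the combined column is usually the more convenient one.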
You can omit the parameters sep=",", header=0 and na_values=['NA'], because they match the defaults:
# original call
df = pd.read_csv(filename, sep=",", header=0, na_values=['NA'])
# equivalent call that relies on the defaults
df = pd.read_csv(filename)
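A quick way to convince yourself that the two calls behave the same is to run both on the same text and compare the results. The sample below is a sketch; the NA cell in the num column is something I added for illustration so that the na_values handling is actually exercised.

```python
import io
import pandas as pd

# Small sample; the NA in the second row is an assumption added for this check.
temp = u"""GIdx,Date,num,Time
1,11/28/2012,20,10:05:50
2,11/28/2012,NA,10:09:24
3,11/28/2012,20,10:09:34"""

# Call with the parameters spelled out, as in the question.
explicit = pd.read_csv(io.StringIO(temp), sep=",", header=0, na_values=['NA'])

# Call that relies on the read_csv defaults.
default = pd.read_csv(io.StringIO(temp))

# DataFrame.equals treats missing values in the same position as equal.
print(explicit.equals(default))
```

Both frames read the NA cell as a missing value, which is why the num column becomes a float column in either case.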