Convert the difference between 'yyyy-mm-dd hh:mm:ss' date strings to an integer (pandas, Python)



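I need to convert the difference between two strings in the format yyyy-mm-dd hh:mm:ss (representing datetimes) to an integer. Since I want to do this for every row of a DataFrame object (built with pandas), I need a built-in function that does something like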

data['difference'] = somefunc(data['date1'],data['date2'])

Does such a function exist? If I write my own function instead, how do I apply it to DataFrame columns?

Thanks in advance!

Check this link: http://docs.python.org/2/library/time.html?highlight=strptime Basically, you can parse the string into a struct_time value and then access its fields through attributes (tm_hour, tm_min, ...).

See the examples for time.strptime.
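
As a rough sketch of the time.strptime approach, the snippet below parses two strings and returns their difference in whole seconds. The helper name seconds_between and the sample values are made up for illustration, and it assumes both strings are in the same timezone (they are treated as UTC so that daylight saving time does not affect the result).

```python
import calendar
import time

FMT = "%Y-%m-%d %H:%M:%S"


def seconds_between(start, end, fmt=FMT):
    """Return end - start in whole seconds (hypothetical helper).

    Both strings are interpreted as UTC, which is an assumption.
    """
    start_ts = calendar.timegm(time.strptime(start, fmt))
    end_ts = calendar.timegm(time.strptime(end, fmt))
    return end_ts - start_ts


# Individual fields are available as attributes of the struct_time.
parsed = time.strptime("2013-01-02 03:04:05", FMT)
hour, minute = parsed.tm_hour, parsed.tm_min

diff = seconds_between("2013-01-01 00:00:00", "2013-01-02 00:00:30")
```

To fill a DataFrame column with this, one option is a plain loop over the two columns, for example [seconds_between(a, b) for a, b in zip(data['date1'], data['date2'])], although this is likely to be slower than a vectorized pandas solution.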

Requires numpy >= 1.7. This is for pandas 0.13 (to be released soon). See the docs here

In [3]: df = DataFrame(dict(A = Timestamp('20130101'), B = Timestamp('20130101')+ pd.to_timedelta(list(range(5)),unit='D')))
In [4]: df
Out[4]: 
                    A                   B
0 2013-01-01 00:00:00 2013-01-01 00:00:00
1 2013-01-01 00:00:00 2013-01-02 00:00:00
2 2013-01-01 00:00:00 2013-01-03 00:00:00
3 2013-01-01 00:00:00 2013-01-04 00:00:00
4 2013-01-01 00:00:00 2013-01-05 00:00:00
[5 rows x 2 columns]
In [5]: df.dtypes
Out[5]: 
A    datetime64[ns]
B    datetime64[ns]
dtype: object
In [6]: df['C'] = df['B']-df['A']
In [7]: df
Out[7]: 
                    A                   B                C
0 2013-01-01 00:00:00 2013-01-01 00:00:00         00:00:00
1 2013-01-01 00:00:00 2013-01-02 00:00:00 1 days, 00:00:00
2 2013-01-01 00:00:00 2013-01-03 00:00:00 2 days, 00:00:00
3 2013-01-01 00:00:00 2013-01-04 00:00:00 3 days, 00:00:00
4 2013-01-01 00:00:00 2013-01-05 00:00:00 4 days, 00:00:00
[5 rows x 3 columns]
In [8]: df.dtypes
Out[8]: 
A     datetime64[ns]
B     datetime64[ns]
C    timedelta64[ns]
dtype: object
In [9]: df['C'].astype('timedelta64[s]')
Out[9]: 
0         0
1     86400
2    172800
3    259200
4    345600
Name: C, dtype: float64

In 0.12 you can do this

In [1]: df = DataFrame(dict(A = Timestamp('20130101'), B = [Timestamp('20130101')+timedelta(days=i) for i in range(5) ]))
In [2]: df['C'] = df['B']-df['A']
In [3]: Series(df['C'].values / np.timedelta64(1,'s'))
Out[3]: 
0         0
1     86400
2    172800
3    259200
4    345600
dtype: float64
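
The examples above start from Timestamp values, while the question starts from strings. The following sketch shows one way to bridge that gap, assuming a pandas version that provides pd.to_datetime with a format argument and the .dt accessor (both arrived in releases later than the ones discussed here). The column names date1, date2 and difference come from the question; the sample rows are made up.

```python
import pandas as pd

# Made-up sample data using the column names from the question.
data = pd.DataFrame({
    "date1": ["2013-01-01 00:00:00", "2013-01-01 12:00:00"],
    "date2": ["2013-01-02 00:00:00", "2013-01-01 12:00:30"],
})

fmt = "%Y-%m-%d %H:%M:%S"

# Parse the string columns into datetime64 values.
start = pd.to_datetime(data["date1"], format=fmt)
end = pd.to_datetime(data["date2"], format=fmt)

# Subtracting gives a timedelta column; total_seconds() returns floats,
# which are cast to integers here.
data["difference"] = (end - start).dt.total_seconds().astype("int64")
```

Casting to int64 discards any fractional seconds, which is harmless for strings in this format because they carry no sub-second part.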
