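I have a dataframe in which I am trying to keep only the rows whose date column falls between StartDate and FinishDate. To do this, I create columns from the string values of these dates with pandas.to_datetime and then filter on them.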
result['date'] = pd.to_datetime(result.DateCreated)
result['StartDate'] = pd.to_datetime(result.StartDate)
result['FinishDate'] = pd.to_datetime(result.FinishDate)
result = result[(result.date >= result.StartDate) &
(result.date <= result.FinishDate)]
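A portion of the data used is below. The StartDate and FinishDate columns on the left hold the values after the code above has run; the ones on the right are the initial values, included in case the problem arises in to_datetime.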
,date,StartDate,FinishDate,startboundry,finishboundry,DateCreated,StartDate,FinishDate
0,2009-06-08,2009-05-01,2009-06-30,False,True,2009-06-08 00:00:00,2009-05-01,2009-06-30
1,2009-10-08,2009-08-01,2009-12-31,False,True,2009-10-08 00:00:00,2009-08-01,2009-12-31
2,2010-01-28,2010-01-01,2010-04-30,False,True,2010-01-28 00:00:00,2010-01-01,2010-04-30
3,2010-05-27,2010-05-01,2010-06-30,False,True,2010-05-27 00:00:00,2010-05-01,2010-06-30
4,2010-09-22,2010-08-01,2010-12-31,False,True,2010-09-22 00:00:00,2010-08-01,2010-12-31
5,2011-01-13,2011-01-01,2011-04-30,False,True,2011-01-13 00:00:00,2011-01-01,2011-04-30
6,2011-05-26,2011-05-01,2011-06-30,False,True,2011-05-26 00:00:00,2011-05-01,2011-06-30
7,2009-01-20,2009-01-01,2009-04-30,False,True,2009-01-20 00:00:00,2009-01-01,2009-04-30
8,2009-05-11,2009-05-01,2009-06-30,False,True,2009-05-11 00:00:00,2009-05-01,2009-06-30
9,2009-10-05,2009-08-01,2009-12-31,False,True,2009-10-05 00:00:00,2009-08-01,2009-12-31
Some of these rows evaluate the first condition (result.date >= result.StartDate) as False, even though it is clearly true.
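For example, 2009-06-08 comes after 2009-05-01 both chronologically and lexically, so the result would be True even if this were only a string comparison.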
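One way to narrow this down is to rerun the conversion on a few of the sample rows in isolation and look at the resulting dtypes and comparison masks. This is only a minimal sketch: the frame name `sample` and the variable names are illustrative, and the three rows are copied from the data above.

```python
import pandas as pd

# Three rows copied from the sample data; `sample` is an illustrative name.
sample = pd.DataFrame({
    "DateCreated": ["2009-06-08 00:00:00", "2009-10-08 00:00:00", "2010-01-28 00:00:00"],
    "StartDate": ["2009-05-01", "2009-08-01", "2010-01-01"],
    "FinishDate": ["2009-06-30", "2009-12-31", "2010-04-30"],
})

# Same conversion as in the question.
sample["date"] = pd.to_datetime(sample["DateCreated"])
sample["StartDate"] = pd.to_datetime(sample["StartDate"])
sample["FinishDate"] = pd.to_datetime(sample["FinishDate"])

# All three should report a datetime64 dtype; an object dtype would mean
# the comparison is not being made on timestamps.
dtypes = sample[["date", "StartDate", "FinishDate"]].dtypes
print(dtypes)

# Element-wise boundary checks, row by row.
start_ok = sample["date"] >= sample["StartDate"]
finish_ok = sample["date"] <= sample["FinishDate"]
print(pd.DataFrame({"start_ok": start_ok, "finish_ok": finish_ok}))
```

If both masks come out True for these rows on a clean install, the difference is more likely to lie in how the real `result` frame is built than in the comparison itself.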
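Edited to add some version information: while making sure that the versions of python, pandas, etc. were the same, I collected the version info to share in case it helps here:
pandas version 0.16.2, python version 2.7.9, ipython 3.2.0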
To filter the dataframe you can use between:
df[df.date1.between(date2,date3)]
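Applied to the columns from the question, this could look roughly like the sketch below. The bounds passed to `between` can be whole columns, so each row is checked against its own StartDate and FinishDate. The frame here is a small stand-in, and the second row is made up to show a value that falls outside its range.

```python
import pandas as pd

# Stand-in frame; the second row is invented so that one date misses its range.
df = pd.DataFrame({
    "date": pd.to_datetime(["2009-06-08", "2009-07-15", "2010-01-28"]),
    "StartDate": pd.to_datetime(["2009-05-01", "2009-05-01", "2010-01-01"]),
    "FinishDate": pd.to_datetime(["2009-06-30", "2009-06-30", "2010-04-30"]),
})

# between() includes both endpoints by default, which matches
# (date >= StartDate) & (date <= FinishDate) evaluated row by row.
mask = df["date"].between(df["StartDate"], df["FinishDate"])
filtered = df[mask]
print(filtered)
```

Since this is the same comparison as the `>=` / `<=` pair in the question, it is mainly a more compact way to write it rather than a different check.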