I have a df:
ID,"address","used_at","active_seconds","pageviews"
71ecd2aa165114e5ee292131f1167d8c,"auto.drom.ru",2014-05-17 10:58:59,166,2
71ecd2aa165114e5ee292131f1167d8c,"auto.drom.ru",2016-07-17 17:34:07,92,4
70150aba267f671045f147767251d169,"avito.ru/*/avtomobili",2014-06-15 11:52:09,837,40
bc779f542049bcabb9e68518a215814e,"auto.yandex.ru",2014-01-16 22:23:56,8,1
bc779f542049bcabb9e68518a215814e,"avito.ru/*/avtomobili",2014-01-18 14:38:33,313,5
bc779f542049bcabb9e68518a215814e,"avito.ru/*/avtomobili",2016-07-18 18:12:07,20,1
I need to delete all rows where used_at is later than 2016-06-30. How can I do that?
Use dt.date and boolean indexing:
print (df.used_at.dt.date > pd.to_datetime('2016-06-30').date())
0    False
1     True
2    False
3    False
4    False
5     True
Name: used_at, dtype: bool
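The snippets here assume used_at already has a datetime64 dtype, because the .dt accessor fails on plain strings. The question does not show how the frame was loaded, so the following is only a sketch, assuming the sample is read from CSV text with parse_dates (csv_text and mask are illustrative names):

```python
import io

import pandas as pd

# The sample data from the question, as CSV text
csv_text = """ID,"address","used_at","active_seconds","pageviews"
71ecd2aa165114e5ee292131f1167d8c,"auto.drom.ru",2014-05-17 10:58:59,166,2
71ecd2aa165114e5ee292131f1167d8c,"auto.drom.ru",2016-07-17 17:34:07,92,4
70150aba267f671045f147767251d169,"avito.ru/*/avtomobili",2014-06-15 11:52:09,837,40
bc779f542049bcabb9e68518a215814e,"auto.yandex.ru",2014-01-16 22:23:56,8,1
bc779f542049bcabb9e68518a215814e,"avito.ru/*/avtomobili",2014-01-18 14:38:33,313,5
bc779f542049bcabb9e68518a215814e,"avito.ru/*/avtomobili",2016-07-18 18:12:07,20,1
"""

# parse_dates converts used_at to datetime64, which .dt requires
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["used_at"])

# True for every row whose calendar date is after 2016-06-30
mask = df.used_at.dt.date > pd.to_datetime("2016-06-30").date()
print(mask)
```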
print (df[df.used_at.dt.date > pd.to_datetime('2016-06-30').date()])
                                 ID                address              used_at  active_seconds  pageviews
1  71ecd2aa165114e5ee292131f1167d8c           auto.drom.ru  2016-07-17 17:34:07              92          4
5  bc779f542049bcabb9e68518a215814e  avito.ru/*/avtomobili  2016-07-18 18:12:07              20          1
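The filter above selects exactly the rows the question wants removed. To keep the remaining rows instead, one option is to invert the boolean mask with ~ (comparing with <= gives the same result). A minimal sketch, assuming a reduced frame that holds only the used_at values from the sample (cutoff and to_drop are illustrative names):

```python
import pandas as pd

# Reduced frame: only the used_at values from the sample, as datetime64
df = pd.DataFrame({"used_at": pd.to_datetime([
    "2014-05-17 10:58:59",
    "2016-07-17 17:34:07",
    "2014-06-15 11:52:09",
    "2014-01-16 22:23:56",
    "2014-01-18 14:38:33",
    "2016-07-18 18:12:07",
])})

cutoff = pd.to_datetime("2016-06-30").date()

# Rows to delete: calendar date strictly after the cutoff
to_drop = df.used_at.dt.date > cutoff

# ~ flips the mask, so only rows dated on or before 2016-06-30 remain
df = df[~to_drop]
print(df)
```

The original index labels (0, 2, 3, 4) are preserved; call reset_index(drop=True) afterwards if a fresh 0..n-1 index is wanted.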
Or you can define the datetime by year, month and day:
from datetime import datetime
print (df[df.used_at.dt.date > datetime(2016, 6, 30).date()])
                                 ID                address              used_at  active_seconds  pageviews
1  71ecd2aa165114e5ee292131f1167d8c           auto.drom.ru  2016-07-17 17:34:07              92          4
5  bc779f542049bcabb9e68518a215814e  avito.ru/*/avtomobili  2016-07-18 18:12:07              20          1
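A variant, not the approach used above: instead of building Python date objects with dt.date, dt.normalize() resets each timestamp to midnight while keeping the datetime64 dtype, so comparing against pd.Timestamp(2016, 6, 30) gives the same date-level mask. A sketch, again assuming a reduced frame with only the sample's used_at values:

```python
import pandas as pd

# Reduced frame: only the used_at values from the sample, as datetime64
df = pd.DataFrame({"used_at": pd.to_datetime([
    "2014-05-17 10:58:59",
    "2016-07-17 17:34:07",
    "2014-06-15 11:52:09",
    "2014-01-16 22:23:56",
    "2014-01-18 14:38:33",
    "2016-07-18 18:12:07",
])})

# normalize() drops the time of day, so this compares calendar dates only;
# a timestamp on 2016-06-30 itself normalizes to the cutoff and is not "greater"
mask = df.used_at.dt.normalize() > pd.Timestamp(2016, 6, 30)
print(df[mask])
```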