I have a pandas DataFrame with dates. I need to know whether every other pair of dates is consecutive.
2 1988-01-01
3 2015-01-31
4 2015-02-01
5 2015-05-31
6 2015-06-01
7 2021-11-16
11 2021-11-17
12 2022-10-05
8 2022-10-06
9 2022-10-12
10 2022-10-13
# How to build this example dataframe
import pandas as pd
df = pd.DataFrame({'date': pd.to_datetime(['1988-01-01', '2015-01-31', '2015-02-01', '2015-05-31', '2015-06-01', '2021-11-16',
                                           '2021-11-17', '2022-10-05', '2022-10-06', '2022-10-12', '2022-10-13'])})
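Each pair should be consecutive. I have tried different kinds of sorting, but everything I have found deals with whether the entire series is consecutive. I need to compare each pair of dates after the first date.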
cb_gap = cb_sorted.sort_values('dates').groupby('dates').diff() > pd.to_timedelta('1 day')
What I want to see is this...
2 1988-01-01 <- Ignore the start date
3 2015-01-31 <- these dates have no gap
4 2015-02-01
5 2015-05-31 <- these dates have no gap
6 2015-06-01
7 2021-11-16 <- these have a gap!!!!
11 2021-11-18
12 2022-10-05 <- these have no gap
8 2022-10-06
9 2022-10-12
One way is to use shift and compute the differences.
pd.DataFrame({'date':df.date,'diff':df.date.shift(-1)-df.date})[1::2]
This returns:
  date diff
1 2015-01-31 1 days
3 2015-05-31 1 days
5 2021-11-16 1 days
7 2022-10-05 1 days
9 2022-10-12 1 days
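To answer the yes/no question directly, the same shift-based differences can be compared against one day. The following is only a sketch: the `has_gap` column name is made up for illustration, and the 2021-11-18 date is a hypothetical value swapped in so that one pair actually shows a gap.

```python
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime([
    '1988-01-01', '2015-01-31', '2015-02-01', '2015-05-31', '2015-06-01',
    '2021-11-16', '2021-11-18',  # hypothetical gap, not in the example data
    '2022-10-05', '2022-10-06', '2022-10-12', '2022-10-13'])})

# Difference between each row and the next one; [1::2] keeps only the
# first row of each pair and skips the start date at position 0
pairs = pd.DataFrame({'date': df.date, 'diff': df.date.shift(-1) - df.date})[1::2]

# has_gap (illustrative name) is True where a pair is not exactly one day apart
pairs = pairs.assign(has_gap=pairs['diff'] != pd.Timedelta(days=1))
print(pairs)
```

This assumes the frame is already ordered so that rows 1-2, 3-4, and so on form the pairs, and that an even number of rows follows the start date.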
It is also faster:
Method     Time
Naveed's   4.23 ms
This one   0.93 ms
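Here is one way to do it.
Btw, what is your expected output? This answer takes the difference between the consecutive dates of each pair, skipping the first row, and populates a diff column.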
# make date into datetime
df['date'] = pd.to_datetime(df['date'])
# create two intermediate DFs, skipping the first row and taking alternate rows,
# and concat them side by side (axis=1)
# note: this assumes df has an 'id' column (or an index named 'id')
df2=pd.concat([df.iloc[1:].iloc[::2].reset_index()[['id','date']],
               df.iloc[2:].iloc[::2].reset_index()[['id','date']]
              ],axis=1 )
# take the difference of second date from the first one
df2['diff']=df2.iloc[:,3]-df2.iloc[:,1]
df2
id date id date diff
0 3 2015-01-31 4 2015-02-01 1 days
1 5 2015-05-31 6 2015-06-01 1 days
2 7 2021-11-16 11 2021-11-17 1 days
3 12 2022-10-05 8 2022-10-06 1 days
4 9 2022-10-12 10 2022-10-13 1 days
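If only the yes/no result per pair is needed, the pairing can also be sketched by reshaping the dates into two columns with NumPy. This is an alternative to the answers above, not something either of them used; the `consecutive` name is illustrative, and the sketch assumes an even number of rows after the start date.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime([
    '1988-01-01', '2015-01-31', '2015-02-01', '2015-05-31', '2015-06-01'])})

# Drop the start date, then view the remaining dates as (first, second) pairs
arr = df['date'].iloc[1:].to_numpy().reshape(-1, 2)

# True where the second date of a pair is exactly one day after the first
consecutive = (arr[:, 1] - arr[:, 0]) == np.timedelta64(1, 'D')
print(consecutive)
```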