Python - Determine whether a pandas DataFrame contains a non-consecutive pair of dates



I have a pandas DataFrame with dates. I need to know whether every other pair of dates is consecutive.

2    1988-01-01
3    2015-01-31
4    2015-02-01
5    2015-05-31
6    2015-06-01
7    2021-11-16
11   2021-11-17
12   2022-10-05
8    2022-10-06
9    2022-10-12
10   2022-10-13
# How to build this example dataframe
import pandas as pd
df=pd.DataFrame({'date':pd.to_datetime(['1988-01-01','2015-01-31','2015-02-01', '2015-05-31','2015-06-01', '2021-11-16', '2021-11-17', '2022-10-05', '2022-10-06', '2022-10-12', '2022-10-13'])})

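Each pair should be consecutive. I have tried different sorting approaches, but everything I have found is about the whole series being consecutive. I need to compare each pair of dates after the first date. This is what I tried: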

cb_gap = cb_sorted.sort_values('dates').groupby('dates').diff() > pd.to_timedelta('1 day')
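The attempt above groups by the date column itself; since the dates are unique, every group holds a single row and diff() only produces NaT. A minimal sketch of what the grouping was presumably meant to do, assuming the column is named `date` as in the example constructor: skip the start date, give each remaining row a pair number, and take diff() within each pair. The `pair` column and the `rest`/`gaps` names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime([
    '1988-01-01', '2015-01-31', '2015-02-01', '2015-05-31', '2015-06-01',
    '2021-11-16', '2021-11-17', '2022-10-05', '2022-10-06',
    '2022-10-12', '2022-10-13'])})

# Sort, skip the start date, then number the remaining rows 0, 0, 1, 1, 2, 2, ...
rest = df.sort_values('date').iloc[1:].copy()
rest['pair'] = np.arange(len(rest)) // 2

# diff() within each pair: NaT on a pair's first row, the gap on its second row
rest['diff'] = rest.groupby('pair')['date'].diff()

# One entry per pair: True where the two dates are more than one day apart
gaps = rest['diff'].dropna() > pd.Timedelta(days=1)
print(gaps.any())  # False: every pair in this data is consecutive
```

Grouping by an explicit pair label, rather than by the dates, is what makes diff() compare the two dates of each pair instead of each date with itself.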

What I want to see is this…

2    1988-01-01 <- Ignore the start date
3    2015-01-31 <- these dates have no gap
4    2015-02-01
5    2015-05-31 <- these dates have no gap
6    2015-06-01
7    2021-11-16 <- these have a gap!!!!
11   2021-11-18
12   2022-10-05 <- these have no gap
8    2022-10-06
9    2022-10-12

One way is to use shift and compute the difference:

pd.DataFrame({'date':df.date,'diff':df.date.shift(-1)-df.date})[1::2]

This returns:
        date   diff
1 2015-01-31 1 days
3 2015-05-31 1 days
5 2021-11-16 1 days
7 2022-10-05 1 days
9 2022-10-12 1 days
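To turn the diff column into the yes/no answer the title asks for, it can be compared against one day. A minimal sketch, run on the question's dates with 2021-11-17 changed to 2021-11-18 as in the expected output, so that one gap shows up; `pairs` and the `gap` column are illustrative names.

```python
import pandas as pd

# The question's dates, with 2021-11-17 changed to 2021-11-18 as in the expected output
df = pd.DataFrame({'date': pd.to_datetime([
    '1988-01-01', '2015-01-31', '2015-02-01', '2015-05-31', '2015-06-01',
    '2021-11-16', '2021-11-18', '2022-10-05', '2022-10-06',
    '2022-10-12', '2022-10-13'])})

# Next date minus this date, keeping every other row starting at position 1
pairs = pd.DataFrame({'date': df.date, 'diff': df.date.shift(-1) - df.date})[1::2].copy()

# Flag pairs whose second date is not exactly one day after the first
pairs['gap'] = pairs['diff'] != pd.Timedelta(days=1)

print(pairs[pairs['gap']])  # only the 2021-11-16 pair, 2 days apart
print(pairs['gap'].any())   # True
```

If the number of dates after the first one is odd, the last kept row has a NaT diff, and NaT != pd.Timedelta(days=1) evaluates to True, so that unpaired date would also be flagged; drop it first if that is not wanted.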

It is also faster:

Method     Time
Naveed's   4.23 ms
This one   0.93 ms
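The timings above are the answerer's own. One way to run a similar comparison on your own data is the standard-library timeit module. A minimal sketch: `shift_version` and `concat_version` are hypothetical wrappers, the second is a simplified re-implementation of the other answer without the id columns, it assumes an even number of dates after the first one, and absolute numbers will vary with machine and pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime([
    '1988-01-01', '2015-01-31', '2015-02-01', '2015-05-31', '2015-06-01',
    '2021-11-16', '2021-11-17', '2022-10-05', '2022-10-06',
    '2022-10-12', '2022-10-13'])})


def shift_version(df):
    # Next date minus this date, keeping every other row starting at position 1
    return pd.DataFrame({'date': df.date, 'diff': df.date.shift(-1) - df.date})[1::2]


def concat_version(df):
    # Rows at positions 1, 3, 5, ... and 2, 4, 6, ... placed side by side
    df2 = pd.concat([df.iloc[1::2].reset_index(drop=True),
                     df.iloc[2::2].reset_index(drop=True)], axis=1)
    df2['diff'] = df2.iloc[:, 1] - df2.iloc[:, 0]
    return df2


n = 200
for fn in (shift_version, concat_version):
    # timeit returns total seconds for n calls; convert to milliseconds per call
    per_call_ms = timeit.timeit(lambda: fn(df), number=n) / n * 1000
    print(f'{fn.__name__}: {per_call_ms:.3f} ms per call')
```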

Here is one way to do it:

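BTW, what is your expected output? This answer takes the difference between consecutive dates, skipping the first row, and puts it in a diff column.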

# make date into datetime
df['date'] = pd.to_datetime(df['date'])
# the question's constructor has no 'id' column; use the row labels as ids instead
if 'id' not in df.columns:
    df = df.rename_axis('id')
# create two intermediate DFs, one starting at the second row and one at the third,
# each taking every other row, and concat them side by side (axis=1)
df2=pd.concat([df.iloc[1:].iloc[::2].reset_index()[['id','date']],
               df.iloc[2:].iloc[::2].reset_index()[['id','date']]
              ],axis=1)
# subtract the first date of each pair from the second one
df2['diff']=df2.iloc[:,3]-df2.iloc[:,1]
df2
id       date   id       date   diff
0   3   2015-01-31  4   2015-02-01  1 days
1   5   2015-05-31  6   2015-06-01  1 days
2   7   2021-11-16  11  2021-11-17  1 days
3   12  2022-10-05  8   2022-10-06  1 days
4   9   2022-10-12  10  2022-10-13  1 days
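Because pd.concat(..., axis=1) and Series subtraction align on the index, the answer above has to call reset_index() before pairing the rows. A minimal sketch of an alternative that sidesteps index alignment by subtracting plain NumPy arrays; it also drops an unpaired trailing date, like 2022-10-12 in the expected output. The names `first`, `second` and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime([
    '1988-01-01', '2015-01-31', '2015-02-01', '2015-05-31', '2015-06-01',
    '2021-11-16', '2021-11-17', '2022-10-05', '2022-10-06',
    '2022-10-12', '2022-10-13'])})

dates = df['date'].iloc[1:]            # skip the start date
n = len(dates) // 2 * 2                # ignore a trailing date with no partner
first = dates.iloc[0:n:2].to_numpy()   # first date of each pair
second = dates.iloc[1:n:2].to_numpy()  # second date of each pair

# Plain arrays carry no index, so rows pair up purely by position
result = pd.DataFrame({'first': first, 'second': second})
result['diff'] = result['second'] - result['first']
result['gap'] = result['diff'] != pd.Timedelta(days=1)
print(result)
```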
