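I have time series data on which I did some calculations. I'm stuck trying to find a way to get the date values between the NaN points in the series.
For example, the series looks like this: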
start_date counts
3 2021-10-14 20:12:13 0
4 2021-10-14 20:21:10 1
5 2021-10-14 20:22:15 2
6 2021-10-14 20:23:14 3
7 2021-10-14 20:23:51 4
8 2021-10-14 20:39:11 0
9 2021-10-14 20:41:21 1
10 2021-10-14 20:41:45 2
11 2021-10-14 20:42:10 3
12 2021-10-14 20:46:10 4
13 2021-10-14 20:52:53 5
14 2021-10-14 20:53:10 6
15 2021-10-14 20:56:10 7
16 2021-10-14 20:57:46 8
17 2021-10-14 20:59:25 9
18 2021-10-14 21:00:12 10
19 2021-10-14 21:02:24 11
20 2021-10-14 21:06:13 12
21 2021-10-14 21:09:12 13
22 2021-10-14 21:11:35 14
23 2021-10-14 21:16:30 15
24 2021-10-14 21:19:12 16
25 2021-10-14 21:32:14 0
29 2021-10-14 23:52:07 0
30 2021-10-14 23:57:41 1
31 2021-10-15 00:06:14 2
32 2021-10-15 00:23:25 0
33 2021-10-15 00:32:09 1
34 2021-10-15 00:54:11 0
35 2021-10-15 01:03:13 1
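I want to get the date of the first element (always = 1) next to the date of the last element (16 in this case, but it can be any number greater than 1).
So the expected output should be: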
2021-10-14 20:41:21 : 2021-10-14 21:19:12
.
.
etc.
IIUC:
import pandas as pd

# Extract a subset of your dataframe with a clean 0..n-1 index
df1 = df.reset_index()[['start_date', 'counts']]
# Detect 2 consecutive 0 (or NaN?) and take the row just before them,
# i.e. the last row of the run
idx2 = df1.loc[df1['counts'].eq(0)
               & df1['counts'].shift(-1).eq(0), 'counts'].index - 1
# counts increases by 1 per row, so stepping back (counts - 1) rows
# from idx2 lands on the row where counts == 1
idx1 = idx2 - df1.loc[idx2, 'counts'].values + 1
# Put the first and last dates of the run side by side
out = pd.concat([df1.loc[idx1, 'start_date'].reset_index(drop=True),
                 df1.loc[idx2, 'start_date'].reset_index(drop=True)], axis=1)
Output:
>>> out
start_date start_date
0 2021-10-14 20:41:21 2021-10-14 21:19:12
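
If every run is wanted rather than only the one that ends before two consecutive zeros (the "etc." in the expected output suggests this may be the case), grouping on the reset points is one option. This is only a minimal sketch, assuming the rows are already sorted by start_date and that a run restarts whenever counts is 0 or NaN. The names run_bounds, min_last, run, first_date, last_date and last_count are mine, not from the question.

```python
import io
import pandas as pd

# Sample data copied from the question
raw = """start_date,counts
2021-10-14 20:12:13,0
2021-10-14 20:21:10,1
2021-10-14 20:22:15,2
2021-10-14 20:23:14,3
2021-10-14 20:23:51,4
2021-10-14 20:39:11,0
2021-10-14 20:41:21,1
2021-10-14 20:41:45,2
2021-10-14 20:42:10,3
2021-10-14 20:46:10,4
2021-10-14 20:52:53,5
2021-10-14 20:53:10,6
2021-10-14 20:56:10,7
2021-10-14 20:57:46,8
2021-10-14 20:59:25,9
2021-10-14 21:00:12,10
2021-10-14 21:02:24,11
2021-10-14 21:06:13,12
2021-10-14 21:09:12,13
2021-10-14 21:11:35,14
2021-10-14 21:16:30,15
2021-10-14 21:19:12,16
2021-10-14 21:32:14,0
2021-10-14 23:52:07,0
2021-10-14 23:57:41,1
2021-10-15 00:06:14,2
2021-10-15 00:23:25,0
2021-10-15 00:32:09,1
2021-10-15 00:54:11,0
2021-10-15 01:03:13,1
"""
df = pd.read_csv(io.StringIO(raw), parse_dates=["start_date"])


def run_bounds(df, min_last=2):
    """Return first/last start_date of each run whose last count >= min_last."""
    # Assumption: a row with counts == 0 (or NaN) marks the start of a new run
    is_break = df["counts"].fillna(0).eq(0)
    # Label each row with a run number, then drop the break rows themselves
    tmp = df.assign(run=is_break.cumsum())[~is_break]
    out = tmp.groupby("run").agg(
        first_date=("start_date", "first"),   # row where counts == 1
        last_date=("start_date", "last"),     # last row before the next reset
        last_count=("counts", "last"),
    )
    # Keep only runs that got past 1, as the question asks
    return out[out["last_count"] >= min_last].reset_index(drop=True)


result = run_bounds(df)
print(result)
```

On the sample data this gives three intervals, ending at counts 4, 16 and 2. The middle one is the same 2021-10-14 20:41:21 to 2021-10-14 21:19:12 pair as above, and runs that never get past 1 are dropped by min_last=2.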