Compute which tickets will be open if at most n can be open at the same time



I have a DataFrame of many tickets with `open` and `close` columns of dtype `datetime64`. Here is a sample:

import pandas as pd

df = pd.DataFrame({
    "open": ["03/27/14", "03/28/14", "03/31/14", "04/01/14", "04/02/14", "04/03/14", "04/04/14", "04/07/14", "04/08/14", "04/09/14", "05/10/14", "05/11/14", "05/14/14", "05/15/14", "05/16/14", "05/17/14", "05/21/14", "05/22/14", "05/25/14", "05/26/14"],
    "close": ["04/01/14", "04/02/14", "04/03/14", "04/04/14", "04/07/14", "04/08/14", "04/09/14", "04/10/14", "04/11/14", "04/14/14", "05/15/14", "05/16/14", "05/17/14", "05/24/14", "05/24/14", "05/27/14", "05/27/14", "05/27/14", "05/28/14", "05/29/14"]
}).astype({
    "open": "datetime64[ns]",
    "close": "datetime64[ns]",
})

I need to know which tickets would be open if I can have at most n tickets open at the same time.

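Note: tickets only open in the afternoon and only close in the morning, so if a ticket's open date is the same as another ticket's close date, the two tickets are consecutive, not overlapping.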
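To make that rule concrete, here is a minimal sketch of the comparison it implies. The helper name `overlaps` is hypothetical and not part of the original post, and plain `datetime.date` values stand in for the DataFrame columns:

```python
from datetime import date

def overlaps(open_b, close_a):
    """Return True if a ticket opening on open_b overlaps an earlier ticket closing on close_a."""
    # Strict comparison: a ticket that opens on the day another one closes
    # is consecutive with it, not overlapping.
    return open_b < close_a

# Ticket 3 opens on the day ticket 0 closes, so they do not overlap.
same_day = overlaps(date(2014, 4, 1), date(2014, 4, 1))
# Ticket 2 opens the day before ticket 0 closes, so they do overlap.
day_before = overlaps(date(2014, 3, 31), date(2014, 4, 1))
```

With this definition `same_day` is false and `day_before` is true, which is consistent with tickets 0 and 3 both being open for n=1 in the expected result.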

For example, with n=1, n=2, and n=3 the result should be:

    open        close       n=1     n=2     n=3
0   03/27/14    04/01/14    True    True    True
1   03/28/14    04/02/14    False   True    True
2   03/31/14    04/03/14    False   False   True
3   04/01/14    04/04/14    True    True    True
4   04/02/14    04/07/14    False   True    True
5   04/03/14    04/08/14    False   False   True
6   04/04/14    04/09/14    True    True    True
7   04/07/14    04/10/14    False   True    True
8   04/08/14    04/11/14    False   False   True
9   04/09/14    04/14/14    True    True    True
10  05/10/14    05/15/14    True    True    True
11  05/11/14    05/16/14    False   True    True
12  05/14/14    05/17/14    False   False   True
13  05/15/14    05/24/14    True    True    True
14  05/16/14    05/24/14    False   True    True
15  05/17/14    05/27/14    False   False   True
16  05/21/14    05/27/14    False   False   False
17  05/22/14    05/27/14    False   False   False
18  05/25/14    05/28/14    True    True    True
19  05/26/14    05/29/14    False   True    True

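I could write a for loop, but the DataFrames have millions of rows, so that would take a very long time.
What is the fastest way to compute this?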

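from functools import reduce
import numpy as np

n = 2
# Number of previous tickets overlapping: earlier tickets whose close date
# is strictly after this ticket's open date
open_int = df['open'].astype(str).str.replace('-', '').astype(int).values
close_int = df['close'].astype(str).str.replace('-', '').astype(int).values
df['over'] = np.sum(np.tril(np.subtract.outer(open_int, close_int), -1) < 0, axis=1)

def reduce_it(acc, row):
    i = acc[1]
    # Accepted tickets among the `row` tickets immediately before ticket i
    prev_open = sum([acc[0][i-p] for p in range(1,row+1)])
    acc[0][i] = 1 if prev_open < n else 0
    return (acc[0], i+1)

df["n=2"], _ = reduce(reduce_it, df["over"].values, (np.zeros(df.shape[0], dtype=bool), 0))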

Note: this is the fastest method I have found so far. If someone finds faster code, I will happily change the accepted answer.
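One possible alternative, offered only as a sketch: keep the close dates of the accepted tickets that are still open in a min-heap, and accept a new ticket only while that heap holds fewer than n entries. The function name `max_open_flags` is hypothetical, and the sketch assumes the rows are already sorted by `open`. It is still a Python loop, but it needs no m × m matrix and does roughly O(m log n) work, so it may scale better to millions of rows. I have not benchmarked it.

```python
import heapq
from datetime import date

def max_open_flags(opens, closes, n):
    """Flag each ticket True if it can open while at most n tickets are open at once.

    Assumes the tickets are already sorted by open date.
    """
    active = []  # min-heap of close dates of accepted tickets that are still open
    flags = []
    for open_, close in zip(opens, closes):
        # A ticket closing on or before this open date no longer counts
        # (tickets close in the morning and open in the afternoon).
        while active and active[0] <= open_:
            heapq.heappop(active)
        accepted = len(active) < n
        if accepted:
            heapq.heappush(active, close)
        flags.append(accepted)
    return flags

# First four tickets from the sample data
opens = [date(2014, 3, 27), date(2014, 3, 28), date(2014, 3, 31), date(2014, 4, 1)]
closes = [date(2014, 4, 1), date(2014, 4, 2), date(2014, 4, 3), date(2014, 4, 4)]
flags_n1 = max_open_flags(opens, closes, 1)
flags_n2 = max_open_flags(opens, closes, 2)
```

On the sample DataFrame, something like `df["n=2"] = max_open_flags(df["open"], df["close"], 2)` should fill the column, since iterating over a datetime64 column yields comparable timestamps.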
