Finding the minimum value in 1-minute data



I have time series data in the following format:

| quote_datetime |  Moneyness | underlying_bid | askC | askP | bidC | bidP | fwd_premium_abs|
|-----------------|-----------|-----------------|------|-----|------|------|------------|
| 2020-02-25 14:10:00 | 0.980861244 |3134.6 | 73.8 | 10.8 |66.5 | 10.4 |63.4|
| 2020-02-25 14:10:00 | 0.990861244 |3134.6 | 73.4 | 11.8 |63.5 | 11.4 |65.4|
| 2020-02-25 14:11:00 | 0.990861244 |3135.6 | 70.4 | 12.8 |61.5 | 14.4 |64.4|
| 2020-02-25 14:11:00 | 0.990861244 |3135.6 | 72.4 | 10.1 |60.1 | 12.4 |67.4|

For each unique value in quote_datetime, I need the minimum fwd_premium_abs. I tried using a loop:

for j in df['quote_datetime'].unique():
    temp = df[df['quote_datetime'] == j]['fwd_premium_abs'].min()

But this is very inefficient, and it becomes computationally expensive on a huge dataset. What is the best way to do this?

Here is a small sample of the data in dictionary format:

{'strike': {0: 3075.0, 1: 3075.0, 2: 3075.0, 3: 3075.0, 4: 3075.0},
'Date': {0: datetime.date(2020, 2, 25),
1: datetime.date(2020, 2, 25),
2: datetime.date(2020, 2, 25),
3: datetime.date(2020, 2, 25),
4: datetime.date(2020, 2, 25)},
'quote_datetime': {0: Timestamp('2020-02-25 14:10:00'),
1: Timestamp('2020-02-25 14:12:00'),
2: Timestamp('2020-02-25 14:19:00'),
3: Timestamp('2020-02-25 14:20:00'),
4: Timestamp('2020-02-25 14:22:00')},
'Moneyness': {0: 0.9808612440191388,
1: 0.9808612440191388,
2: 0.9808612440191388,
3: 0.9808612440191388,
4: 0.9808612440191388},
'underlying_bid': {0: 3134.6, 1: 3135.8, 2: 3137.29, 3: 3136.91, 4: 3136.99},
'askC': {0: 73.8, 1: 74.4, 2: 76.7, 3: 74.8, 4: 74.2},
'askP': {0: 10.8, 1: 10.9, 2: 10.5, 3: 10.7, 4: 10.7},
'bidC': {0: 66.5, 1: 69.1, 2: 70.1, 3: 71.7, 4: 71.2},
'bidP': {0: 10.4, 1: 10.3, 2: 9.4, 3: 10.2, 4: 10.2},
'fwd_premium_abs': {0: 63.4, 1: 64.10000000000001, 2: 67.3, 3: 64.6, 4: 64.0}}

If you want the DataFrame to keep its original shape, you can use .transform after grouping:

df['fwd_premium_abs'] = df.groupby('quote_datetime')['fwd_premium_abs'].transform('min')

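All rows with the same 'quote_datetime' will then have the same value in the 'fwd_premium_abs' column: the minimum of the values that group previously held in that column.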
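As a minimal sketch of the transform approach, the snippet below uses the four rows from the question's table and writes the per-timestamp minimum into a separate column so the original values are not overwritten. The column name `min_fwd_premium` is a hypothetical name chosen for illustration, not something from the question.

```python
import pandas as pd

# Four rows from the question's table (only the two relevant columns).
df = pd.DataFrame({
    "quote_datetime": pd.to_datetime([
        "2020-02-25 14:10:00", "2020-02-25 14:10:00",
        "2020-02-25 14:11:00", "2020-02-25 14:11:00",
    ]),
    "fwd_premium_abs": [63.4, 65.4, 64.4, 67.4],
})

# transform returns one value per input row, so the result lines up with df.
# 'min_fwd_premium' is a hypothetical column name; 'fwd_premium_abs' stays intact.
df["min_fwd_premium"] = (
    df.groupby("quote_datetime")["fwd_premium_abs"].transform("min")
)
print(df)
```

Keeping the minimum in its own column makes it easy to compare each quote against the minimum for its timestamp, for example with `df["fwd_premium_abs"] == df["min_fwd_premium"]`.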


Edit:

If you want to select the rows that hold the minimum for each timestamp:

df.loc[df.groupby('quote_datetime')['fwd_premium_abs'].idxmin()]
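To illustrate the row selection, here is a small sketch on the four rows from the question's table. It keeps the `askC` column alongside the premium to show that the whole row comes back, not just the minimum value. The variable names `min_idx` and `min_rows` are assumptions made for this example.

```python
import pandas as pd

# Four rows from the question's table, with one extra column kept for context.
df = pd.DataFrame({
    "quote_datetime": pd.to_datetime([
        "2020-02-25 14:10:00", "2020-02-25 14:10:00",
        "2020-02-25 14:11:00", "2020-02-25 14:11:00",
    ]),
    "askC": [73.8, 73.4, 70.4, 72.4],
    "fwd_premium_abs": [63.4, 65.4, 64.4, 67.4],
})

# idxmin gives, for each timestamp, the index label of the row with the smallest premium.
min_idx = df.groupby("quote_datetime")["fwd_premium_abs"].idxmin()

# Selecting those labels returns the complete rows.
min_rows = df.loc[min_idx]
print(min_rows)
```

If several rows in a group share the same minimum, idxmin returns the first of them, so this yields exactly one row per timestamp.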

Use groupby on the 'quote_datetime' column to find the minimum of 'fwd_premium_abs'.

df.groupby('quote_datetime')['fwd_premium_abs'].min()

Output

quote_datetime
2020-02-25 14:10:00    63.4
2020-02-25 14:12:00    64.1
2020-02-25 14:19:00    67.3
2020-02-25 14:20:00    64.6
2020-02-25 14:22:00    64.0

Is this what you want?

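Because all the datetimes in the sample you provided were unique, I changed one of them to create a duplicate timestamp (now at row indices 1 and 2) to demonstrate what the result looks like: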

>>> df.iat[2,2] = Timestamp('2020-02-25 14:12:00')
>>> df
   strike        Date      quote_datetime  Moneyness  underlying_bid  askC  askP  bidC  bidP  fwd_premium_abs
0  3075.0  2020-02-25 2020-02-25 14:10:00   0.980861         3134.60  73.8  10.8  66.5  10.4             63.4
1  3075.0  2020-02-25 2020-02-25 14:12:00   0.980861         3135.80  74.4  10.9  69.1  10.3             64.1
2  3075.0  2020-02-25 2020-02-25 14:12:00   0.980861         3137.29  76.7  10.5  70.1   9.4             67.3
3  3075.0  2020-02-25 2020-02-25 14:20:00   0.980861         3136.91  74.8  10.7  71.7  10.2             64.6
4  3075.0  2020-02-25 2020-02-25 14:22:00   0.980861         3136.99  74.2  10.7  71.2  10.2             64.0

df.groupby('quote_datetime')['fwd_premium_abs'].min()

Output

quote_datetime
2020-02-25 14:10:00    63.4
2020-02-25 14:12:00    64.1
2020-02-25 14:20:00    64.6
2020-02-25 14:22:00    64.0
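As a rough check, the sketch below rebuilds the two relevant columns of the modified sample (with the duplicated 14:12 timestamp), runs the loop from the question, and confirms that groupby gives the same minima. The names `loop_mins`, `grouped_mins`, and `result` are hypothetical and only used for this illustration.

```python
import pandas as pd

# The modified sample: rows 1 and 2 share the 14:12:00 timestamp.
df = pd.DataFrame({
    "quote_datetime": pd.to_datetime([
        "2020-02-25 14:10:00", "2020-02-25 14:12:00", "2020-02-25 14:12:00",
        "2020-02-25 14:20:00", "2020-02-25 14:22:00",
    ]),
    "fwd_premium_abs": [63.4, 64.1, 67.3, 64.6, 64.0],
})

# The loop from the question, collecting one minimum per unique timestamp.
loop_mins = {}
for ts in df["quote_datetime"].unique():
    loop_mins[ts] = df.loc[df["quote_datetime"] == ts, "fwd_premium_abs"].min()

# The grouped version does the same work in a single call.
grouped_mins = df.groupby("quote_datetime")["fwd_premium_abs"].min()

# reset_index turns the resulting Series back into a two-column DataFrame.
result = grouped_mins.reset_index()
print(result)
```

The loop filters the whole frame once per unique timestamp, while groupby handles all groups in one call, which is why it tends to scale better on large data.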
