I have time series data in the following format:
| quote_datetime | Moneyness | underlying_bid | askC | askP | bidC | bidP | fwd_premium_abs|
|-----------------|-----------|-----------------|------|-----|------|------|------------|
| 2020-02-25 14:10:00 | 0.980861244 |3134.6 | 73.8 | 10.8 |66.5 | 10.4 |63.4|
| 2020-02-25 14:10:00 | 0.990861244 |3134.6 | 73.4 | 11.8 |63.5 | 11.4 |65.4|
| 2020-02-25 14:11:00 | 0.990861244 |3135.6 | 70.4 | 12.8 |61.5 | 14.4 |64.4|
| 2020-02-25 14:11:00 | 0.990861244 |3135.6 | 72.4 | 10.1 |60.1 | 12.4 |67.4|
For each unique value of quote_datetime I need the minimum fwd_premium_abs. I tried using a loop:
mins = {}
for j in df['quote_datetime'].unique():
    mins[j] = df[df['quote_datetime'] == j]['fwd_premium_abs'].min()
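But this is very inefficient: every iteration scans the whole DataFrame, so on a huge dataset it becomes very expensive. What is the best way to do this?
Here is a small sample of the data in dict format: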
{'strike': {0: 3075.0, 1: 3075.0, 2: 3075.0, 3: 3075.0, 4: 3075.0},
'Date': {0: datetime.date(2020, 2, 25),
1: datetime.date(2020, 2, 25),
2: datetime.date(2020, 2, 25),
3: datetime.date(2020, 2, 25),
4: datetime.date(2020, 2, 25)},
'quote_datetime': {0: Timestamp('2020-02-25 14:10:00'),
1: Timestamp('2020-02-25 14:12:00'),
2: Timestamp('2020-02-25 14:19:00'),
3: Timestamp('2020-02-25 14:20:00'),
4: Timestamp('2020-02-25 14:22:00')},
'Moneyness': {0: 0.9808612440191388,
1: 0.9808612440191388,
2: 0.9808612440191388,
3: 0.9808612440191388,
4: 0.9808612440191388},
'underlying_bid': {0: 3134.6, 1: 3135.8, 2: 3137.29, 3: 3136.91, 4: 3136.99},
'askC': {0: 73.8, 1: 74.4, 2: 76.7, 3: 74.8, 4: 74.2},
'askP': {0: 10.8, 1: 10.9, 2: 10.5, 3: 10.7, 4: 10.7},
'bidC': {0: 66.5, 1: 69.1, 2: 70.1, 3: 71.7, 4: 71.2},
'bidP': {0: 10.4, 1: 10.3, 2: 9.4, 3: 10.2, 4: 10.2},
'fwd_premium_abs': {0: 63.4, 1: 64.10000000000001, 2: 67.3, 3: 64.6, 4: 64.0}}
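A minimal sketch for reproducing the sample above, assuming only pandas and the standard library: the dict refers to datetime.date and Timestamp, so it needs `import datetime` and pandas' Timestamp (written here as `pd.Timestamp`) before it can be passed to `pd.DataFrame`. The repeated Date and Moneyness entries are rebuilt with dict comprehensions, which produce the same values. The sketch also puts the question's loop next to a single groupby call so the two results can be compared; the names `loop_mins` and `grouped_mins` are illustrative.

```python
import datetime

import pandas as pd

# The sample dict from the question (Timestamp written as pd.Timestamp).
data = {
    'strike': {0: 3075.0, 1: 3075.0, 2: 3075.0, 3: 3075.0, 4: 3075.0},
    'Date': {i: datetime.date(2020, 2, 25) for i in range(5)},
    'quote_datetime': {0: pd.Timestamp('2020-02-25 14:10:00'),
                       1: pd.Timestamp('2020-02-25 14:12:00'),
                       2: pd.Timestamp('2020-02-25 14:19:00'),
                       3: pd.Timestamp('2020-02-25 14:20:00'),
                       4: pd.Timestamp('2020-02-25 14:22:00')},
    'Moneyness': {i: 0.9808612440191388 for i in range(5)},
    'underlying_bid': {0: 3134.6, 1: 3135.8, 2: 3137.29, 3: 3136.91, 4: 3136.99},
    'askC': {0: 73.8, 1: 74.4, 2: 76.7, 3: 74.8, 4: 74.2},
    'askP': {0: 10.8, 1: 10.9, 2: 10.5, 3: 10.7, 4: 10.7},
    'bidC': {0: 66.5, 1: 69.1, 2: 70.1, 3: 71.7, 4: 71.2},
    'bidP': {0: 10.4, 1: 10.3, 2: 9.4, 3: 10.2, 4: 10.2},
    'fwd_premium_abs': {0: 63.4, 1: 64.10000000000001, 2: 67.3, 3: 64.6, 4: 64.0},
}
df = pd.DataFrame(data)  # outer keys become columns, inner keys the row index

# Loop approach: one boolean filter over the full frame per unique timestamp.
loop_mins = {j: df[df['quote_datetime'] == j]['fwd_premium_abs'].min()
             for j in df['quote_datetime'].unique()}

# Vectorized approach: a single groupby computes every group's minimum.
grouped_mins = df.groupby('quote_datetime')['fwd_premium_abs'].min()
```

Both produce the same minima; the groupby version avoids rescanning the frame once per timestamp, which is what makes the loop slow on large data.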
If you want the DataFrame to keep its shape, you can use .transform after grouping:
df['fwd_premium_abs'] = df.groupby('quote_datetime')['fwd_premium_abs'].transform('min')
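All rows with the same 'quote_datetime' will then have the same value in column 'fwd_premium_abs': the minimum of that group's original values. Note that this overwrites the original column.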
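If you would rather keep the original values, a minimal sketch is to write the transformed minimum into a separate column and use it as a mask. This assumes a reduced version of the sample (only quote_datetime and fwd_premium_abs, with 14:19 changed to 14:12 so one timestamp has two rows); the column name `group_min` is hypothetical. Passing the string 'min' instead of the builtin min avoids the FutureWarning that pandas 2.1 and later emit for builtin callables.

```python
import pandas as pd

# Reduced sample: two rows share the 14:12 timestamp.
df = pd.DataFrame({
    'quote_datetime': pd.to_datetime(['2020-02-25 14:10:00', '2020-02-25 14:12:00',
                                      '2020-02-25 14:12:00', '2020-02-25 14:20:00',
                                      '2020-02-25 14:22:00']),
    'fwd_premium_abs': [63.4, 64.1, 67.3, 64.6, 64.0],
})

# Broadcast each group's minimum onto every row of that group.
df['group_min'] = df.groupby('quote_datetime')['fwd_premium_abs'].transform('min')

# Keep the rows that attain their group's minimum (all tied rows are kept).
min_rows = df[df['fwd_premium_abs'] == df['group_min']]
```

Unlike idxmin, this mask keeps every row that ties for the minimum, not just the first one.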
Edit:
If you want to select the rows that hold the minimum instead:
df.loc[df.groupby('quote_datetime')['fwd_premium_abs'].idxmin()]
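A small sketch of what the idxmin line does, on the same reduced sample with a duplicated 14:12 timestamp (an assumption for illustration). idxmin returns, for each group, the index label of the row with the smallest value (the first one if there is a tie), and .loc then pulls those full rows. This relies on the index labels being unique, which holds for the default RangeIndex; with a duplicated index, reset_index() first.

```python
import pandas as pd

# Reduced sample: rows 1 and 2 share the 14:12 timestamp.
df = pd.DataFrame({
    'quote_datetime': pd.to_datetime(['2020-02-25 14:10:00', '2020-02-25 14:12:00',
                                      '2020-02-25 14:12:00', '2020-02-25 14:20:00',
                                      '2020-02-25 14:22:00']),
    'fwd_premium_abs': [63.4, 64.1, 67.3, 64.6, 64.0],
})

# One index label per timestamp: the row holding that group's minimum.
idx = df.groupby('quote_datetime')['fwd_premium_abs'].idxmin()

# Select the complete rows for those labels.
min_rows = df.loc[idx]
```

Row 2 (67.3) is dropped because row 1 (64.1) is the minimum for 14:12.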
Use groupby on the 'quote_datetime' column to find the minimum of 'fwd_premium_abs':
df.groupby('quote_datetime')['fwd_premium_abs'].min()
Output
quote_datetime
2020-02-25 14:10:00 63.4
2020-02-25 14:12:00 64.1
2020-02-25 14:19:00 67.3
2020-02-25 14:20:00 64.6
2020-02-25 14:22:00 64.0
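Is this what you want?
Since all the datetimes in your sample are unique, I changed one of them to create a duplicate timestamp (now at row indices 1 and 2) to demonstrate what that looks like: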
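If a flat DataFrame is more convenient than a Series indexed by timestamp, a minimal sketch is to pass as_index=False to groupby, which keeps quote_datetime as an ordinary column. This assumes the five unique timestamps from the sample and only the two relevant columns; the name `mins` is illustrative.

```python
import pandas as pd

# The five sample rows, reduced to the two relevant columns.
df = pd.DataFrame({
    'quote_datetime': pd.to_datetime(['2020-02-25 14:10:00', '2020-02-25 14:12:00',
                                      '2020-02-25 14:19:00', '2020-02-25 14:20:00',
                                      '2020-02-25 14:22:00']),
    'fwd_premium_abs': [63.4, 64.1, 67.3, 64.6, 64.0],
})

# as_index=False returns a DataFrame with quote_datetime as a column.
mins = df.groupby('quote_datetime', as_index=False)['fwd_premium_abs'].min()
```

Calling .reset_index() on the Series result gives the same two-column layout.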
>>> df.iat[2,2] = Timestamp('2020-02-25 14:12:00')
>>> df
strike Date quote_datetime Moneyness underlying_bid askC askP bidC bidP fwd_premium_abs
0 3075.0 2020-02-25 2020-02-25 14:10:00 0.980861 3134.60 73.8 10.8 66.5 10.4 63.4
1 3075.0 2020-02-25 2020-02-25 14:12:00 0.980861 3135.80 74.4 10.9 69.1 10.3 64.1
2 3075.0 2020-02-25 2020-02-25 14:12:00 0.980861 3137.29 76.7 10.5 70.1 9.4 67.3
3 3075.0 2020-02-25 2020-02-25 14:20:00 0.980861 3136.91 74.8 10.7 71.7 10.2 64.6
4 3075.0 2020-02-25 2020-02-25 14:22:00 0.980861 3136.99 74.2 10.7 71.2 10.2 64.0
df.groupby('quote_datetime')['fwd_premium_abs'].min()
Output
quote_datetime
2020-02-25 14:10:00 63.4
2020-02-25 14:12:00 64.1
2020-02-25 14:20:00 64.6
2020-02-25 14:22:00 64.0
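When you need the whole row rather than just the minimum value, another common pandas idiom (not from the answers above) is to sort by fwd_premium_abs and keep the first row per timestamp. A sketch, assuming the same modified sample with the duplicated 14:12 timestamp; `min_rows` is an illustrative name. A stable sort (mergesort) is used so that, among tied values, the row that came first in the original order is kept.

```python
import pandas as pd

# Modified sample: rows 1 and 2 share the 14:12 timestamp.
df = pd.DataFrame({
    'quote_datetime': pd.to_datetime(['2020-02-25 14:10:00', '2020-02-25 14:12:00',
                                      '2020-02-25 14:12:00', '2020-02-25 14:20:00',
                                      '2020-02-25 14:22:00']),
    'fwd_premium_abs': [63.4, 64.1, 67.3, 64.6, 64.0],
})

min_rows = (
    df.sort_values('fwd_premium_abs', kind='mergesort')  # smallest first; stable for ties
      .drop_duplicates(subset='quote_datetime')          # first (smallest) row per timestamp
      .sort_index()                                      # restore original row order
)
```

This selects the same rows as the idxmin approach.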