I have a DataFrame containing daily data for various symbols across different dates:
level_0 index date symbol open ... volume_10_day is_downtrending is_downtrending_lookback consolidating_10 consolidating_10_lookback
0 3608 3608 2022-10-26 CIFR 0.8600 ... 3883.2 0 0 0 1
1 11367 11367 2022-09-12 CLVS 1.2800 ... 24749.8 0 0 0 1
2 13031 13031 2022-10-06 CGC 3.0700 ... 3807474.9 0 0 0 1
3 13044 13044 2022-10-25 CGC 2.4000 ... 4213340.1 0 0 0 1
4 13864 13864 2022-09-02 CMCM 4.9100 ... 3560.0 0 0 0 1
.. ... ... ... ... ... ... ... ... ... ... ...
353 684622 684622 2022-10-24 SOBR 3.2500 ... 65830.2 0 0 0 1
354 685045 685045 2022-08-29 SNTG 2.6500 ... 12765.3 0 1 0 1
355 685093 685093 2022-11-04 SNTG 4.6889 ... 17969582.7 0 0 0 0
356 686851 686851 2022-10-11 WNW 0.8700 ... 5172.1 0 0 0 1
357 688103 688103 2022-10-11 BHG 0.8750 ... 1489.5 0 1 0 1
[358 rows x 18 columns]
Sometimes the same date appears more than once, with different symbols. For example, two symbols appear on 2022-10-11: WNW and BHG.
356 686851 686851 2022-10-11 WNW 0.8700 ... 5172.1 0 0 0 1
357 688103 688103 2022-10-11 BHG 0.8750 ... 1489.5 0 1 0 1
When this happens, I want to keep only the first instance (all other symbols appearing on the same date should be dropped), something like:
level_0 index date symbol open ... volume_10_day is_downtrending is_downtrending_lookback consolidating_10 consolidating_10_lookback
0 3608 3608 2022-10-26 CIFR 0.8600 ... 3883.2 0 0 0 1
1 11367 11367 2022-09-12 CLVS 1.2800 ... 24749.8 0 0 0 1
2 13031 13031 2022-10-06 CGC 3.0700 ... 3807474.9 0 0 0 1
3 13044 13044 2022-10-25 CGC 2.4000 ... 4213340.1 0 0 0 1
4 13864 13864 2022-09-02 CMCM 4.9100 ... 3560.0 0 0 0 1
.. ... ... ... ... ... ... ... ... ... ... ...
353 684622 684622 2022-10-24 SOBR 3.2500 ... 65830.2 0 0 0 1
354 685045 685045 2022-08-29 SNTG 2.6500 ... 12765.3 0 1 0 1
355 685093 685093 2022-11-04 SNTG 4.6889 ... 17969582.7 0 0 0 0
356 686851 686851 2022-10-11 WNW 0.8700 ... 5172.1 0 0 0 1
[357 rows x 18 columns]
Of the duplicates WNW and BHG, only the first (WNW) is kept.
How can I do this? Something like:
df_filtered.drop_duplicates(subset=['date', 'symbol'], inplace=True)
Any help is much appreciated.
Based on the discussion in the comments, this solution works:
df_filtered.drop_duplicates(subset=['date'], keep='first', inplace=True)
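To see why the attempt in the question keeps both rows while this one does not, here is a minimal sketch on a made-up four-row frame. The rows are copied from the question's output, but only three of the 18 columns are reproduced, so treat it as an illustration rather than the real data. With `subset=['date', 'symbol']`, a row counts as a duplicate only when both columns match, and WNW and BHG differ in `symbol`. With `subset=['date']`, only the date is compared.

```python
import pandas as pd

# Hypothetical miniature of the question's frame (three of the 18 columns).
df_filtered = pd.DataFrame(
    {
        "date": ["2022-10-25", "2022-10-24", "2022-10-11", "2022-10-11"],
        "symbol": ["CGC", "SOBR", "WNW", "BHG"],
        "open": [2.4000, 3.2500, 0.8700, 0.8750],
    }
)

# Every (date, symbol) pair is unique here, so nothing is dropped.
by_pair = df_filtered.drop_duplicates(subset=["date", "symbol"])

# Only 'date' is compared, so the second 2022-10-11 row (BHG) is dropped.
by_date = df_filtered.drop_duplicates(subset=["date"], keep="first")

print(len(by_pair))                # 4
print(by_date["symbol"].tolist())  # ['CGC', 'SOBR', 'WNW']
```

"First" here means first in the frame's current row order, so if a different row should win for each date, sort the frame before dropping duplicates. `keep='first'` is also the default, so spelling it out mainly documents the intent.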