Removing duplicate rows from a pandas time series DataFrame



I am trying to remove rows with duplicate timestamps from a pandas DataFrame that is indexed by date:

import numpy as np
import pandas as pd
# original data
df = pd.DataFrame()
np.random.seed(0)
days = pd.to_datetime(['2015-01-01', '2015-01-02', '2015-01-03', '2015-01-04', '2015-01-04', '2015-01-05', '2015-01-06', '2015-01-06', '2015-01-07', '2015-01-08'])
df = pd.DataFrame({'Date': days, 'col1': np.random.randn(len(days))})
df = df.set_index('Date')
#df = df.drop_duplicates(subset='df.index')
print(df)
# remove duplicates, keep first instance
n = np.where(df.index.duplicated())[0]
print(n)
df0 = df.drop(df.iloc[n.tolist()])
print(df0)

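The drop_duplicates call (commented out above) did not work, so I tried dropping the duplicated rows via iloc instead, which raises the following error: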

KeyError: "['col1'] not found in axis"

Try filtering with a boolean mask built from the index:

print(df[~df.index.duplicated()])

This prints:

                col1
Date                
2015-01-01  1.764052
2015-01-02  0.400157
2015-01-03  0.978738
2015-01-04  2.240893
2015-01-05 -0.977278
2015-01-06  0.950088
2015-01-07 -0.103219
2015-01-08  0.410599
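
A minimal sketch of why this works and why the original attempt failed, reusing the data from the question. `df.index.duplicated()` returns a boolean array that marks every repeat of an index value, and `~` inverts it so only unmarked rows are kept. The original `df.drop(df.iloc[n.tolist()])` passed a DataFrame as the labels to drop; iterating a DataFrame yields its column names, which appears to be why pandas looked for 'col1' in the row index. The variable names `first`, `last`, `unique_only` and `dropped_by_label` are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
np.random.seed(0)
days = pd.to_datetime(['2015-01-01', '2015-01-02', '2015-01-03', '2015-01-04',
                       '2015-01-04', '2015-01-05', '2015-01-06', '2015-01-06',
                       '2015-01-07', '2015-01-08'])
df = pd.DataFrame({'Date': days, 'col1': np.random.randn(len(days))}).set_index('Date')

# True for every repeat of an index value; the first occurrence stays False.
mask = df.index.duplicated(keep='first')

first = df[~mask]                                    # keep first occurrence
last = df[~df.index.duplicated(keep='last')]         # keep last occurrence
unique_only = df[~df.index.duplicated(keep=False)]   # discard every duplicated date

# Pitfall: dropping by label removes ALL rows carrying that label,
# including the first occurrence we wanted to keep.
dropped_by_label = df.drop(df.index[mask])
```

Filtering by position with the mask avoids the label pitfall shown in the last line: because the index is not unique, a label-based `drop` cannot tell the first occurrence from the repeats.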

Alternatively, move the index into a column, drop duplicates on that column, and restore the index:

df.reset_index().drop_duplicates(subset='Date').set_index('Date')

Output:

                col1
Date
2015-01-01  1.764052
2015-01-02  0.400157
2015-01-03  0.978738
2015-01-04  2.240893
2015-01-05 -0.977278
2015-01-06  0.950088
2015-01-07 -0.103219
2015-01-08  0.410599
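
The commented-out attempt in the question, `drop_duplicates(subset='df.index')`, could not work because `subset` expects column labels and the index is not a column; that is why this approach calls `reset_index()` first. The sketch below, again on the question's data, checks that the round trip gives the same frame as the boolean-mask approach. The names `via_columns` and `via_mask` are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
np.random.seed(0)
days = pd.to_datetime(['2015-01-01', '2015-01-02', '2015-01-03', '2015-01-04',
                       '2015-01-04', '2015-01-05', '2015-01-06', '2015-01-06',
                       '2015-01-07', '2015-01-08'])
df = pd.DataFrame({'Date': days, 'col1': np.random.randn(len(days))}).set_index('Date')

# Turn 'Date' into a regular column so drop_duplicates can see it,
# keep the first row per date, then make 'Date' the index again.
via_columns = (
    df.reset_index()
      .drop_duplicates(subset='Date', keep='first')
      .set_index('Date')
)

# Same result without the round trip through a column.
via_mask = df[~df.index.duplicated(keep='first')]

pd.testing.assert_frame_equal(via_columns, via_mask)
```

Both forms keep the first row for each date; the mask version skips building the intermediate frame, which may matter on large data.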
