Removing duplicate rows from a pandas time series DataFrame

I am trying to remove duplicate time series data from a pandas DataFrame:

import numpy as np
import pandas as pd
# original data
df = pd.DataFrame()
np.random.seed(0)
days = pd.to_datetime(['2015-01-01', '2015-01-02', '2015-01-03', '2015-01-04', '2015-01-04', '2015-01-05', '2015-01-06', '2015-01-06', '2015-01-07', '2015-01-08'])
df = pd.DataFrame({'Date': days, 'col1': np.random.randn(len(days))})
df = df.set_index('Date')
#df = df.drop_duplicates(subset='df.index')
print(df)
# remove duplicates, keep first instance
n = np.where(df.index.duplicated())[0]
print(n)
df0 = df.drop(df.iloc[n.tolist()])
print(df0)

The drop_duplicates command did not work, so I tried using iloc instead, which raises the following error:

KeyError: "['col1'] not found in axis"

Try filtering with a boolean mask built from the index:

print(df[~df.index.duplicated()])

This prints:

                col1
Date                
2015-01-01  1.764052
2015-01-02  0.400157
2015-01-03  0.978738
2015-01-04  2.240893
2015-01-05 -0.977278
2015-01-06  0.950088
2015-01-07 -0.103219
2015-01-08  0.410599
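
As for why the original attempt fails: drop expects index labels, so passing it a DataFrame (the result of df.iloc[...]) appears to make pandas treat the column name col1 as the label to drop, which matches the KeyError above. Passing the labels themselves (df.index[n]) would not help either, because dropping by label removes every row that carries that label, including the first occurrence. Below is a minimal sketch of a positional alternative that reuses the data from the question; the names dup_pos, keep_pos, by_label and by_position are only illustrative.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
np.random.seed(0)
days = pd.to_datetime(['2015-01-01', '2015-01-02', '2015-01-03', '2015-01-04',
                       '2015-01-04', '2015-01-05', '2015-01-06', '2015-01-06',
                       '2015-01-07', '2015-01-08'])
df = pd.DataFrame({'Date': days, 'col1': np.random.randn(len(days))}).set_index('Date')

# Integer positions of the repeated rows (second and later occurrences).
dup_pos = np.flatnonzero(df.index.duplicated())

# Dropping by label removes ALL rows with those labels, first occurrences included.
by_label = df.drop(df.index[dup_pos])

# Selecting by position keeps exactly the rows that are not flagged as repeats.
keep_pos = np.setdiff1d(np.arange(len(df)), dup_pos)
by_position = df.iloc[keep_pos]

print(len(by_label), len(by_position))
```

The positional result should contain the same rows as df[~df.index.duplicated()], so the boolean mask is usually the shorter way to write it.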

You can also use:

df.reset_index().drop_duplicates(subset='Date').set_index('Date')

Output:

                col1
Date                
2015-01-01  1.764052
2015-01-02  0.400157
2015-01-03  0.978738
2015-01-04  2.240893
2015-01-05 -0.977278
2015-01-06  0.950088
2015-01-07 -0.103219
2015-01-08  0.410599
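
The round trip through reset_index is needed because drop_duplicates only compares columns, and the index is not a column (which is presumably why subset='df.index' did not work in the question). Both approaches take a keep argument: 'first' (the default), 'last', or False to discard every row whose date occurs more than once. A short sketch on the question's data, with via_mask, via_reset and no_repeats as illustrative names:

```python
import numpy as np
import pandas as pd

# Same data as in the question.
np.random.seed(0)
days = pd.to_datetime(['2015-01-01', '2015-01-02', '2015-01-03', '2015-01-04',
                       '2015-01-04', '2015-01-05', '2015-01-06', '2015-01-06',
                       '2015-01-07', '2015-01-08'])
df = pd.DataFrame({'Date': days, 'col1': np.random.randn(len(days))}).set_index('Date')

# Keep the LAST row for each repeated date, using the index mask.
via_mask = df[~df.index.duplicated(keep='last')]

# The same selection via drop_duplicates on a temporary 'Date' column.
via_reset = (df.reset_index()
               .drop_duplicates(subset='Date', keep='last')
               .set_index('Date'))

# keep=False flags every occurrence, so repeated dates disappear entirely.
no_repeats = df[~df.index.duplicated(keep=False)]

print(len(via_mask), len(via_reset), len(no_repeats))
```

Which value of keep is appropriate depends on whether the earlier or the later reading for a repeated timestamp should be trusted.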
