获取第一个实例多索引并保留索引值



我有一个多索引的df。IDX1和IDX2都有重复的值(分别为'C'和'Y'),我想保留两者的第一个实例并保留索引值;df_desired是我想要的结果。

我已经尝试过df_tst = df.groupby(level=[0,2]).first(),但这会从df中删除IDX2,因此我无法从IDX1中删除副本。

# starting data
import pandas as pd
data = 
[['31/01/2021','A','X',10,15],
['28/02/2021','A','X',20,30],
['31/03/2021','A','X',30,45],
['31/01/2021','B','Y',20,15],
['28/02/2021','B','Y',20,15],
['31/03/2021','B','Y',30,30],
['31/01/2021','C','Z',40,45],
['28/02/2021','C','Z',50,55],
['31/03/2021','C','Z',60,65],
['31/01/2021','C','Q',40,45],
['28/02/2021','C','Q',50,55],
['31/03/2021','C','Q',60,65],
['31/01/2021','D','Y',20,15],
['28/02/2021','D','Y',20,15],
['31/03/2021','D','Y',30,30]]
df=pd.DataFrame(data)
df.columns = ['DATE','IDX1','IDX2','VAR1','VAR2']
df['DATE'] = pd.to_datetime(df['DATE'])
df.set_index(['IDX1','IDX2','DATE'], inplace=True)
# target df:
df_desired = 
[['31/01/2021','A','X',10,15],
['28/02/2021','A','X',20,30],
['31/03/2021','A','X',30,45],
['31/01/2021','B','Y',20,15],
['28/02/2021','B','Y',20,15],
['31/03/2021','B','Y',30,30],
['31/01/2021','C','Z',40,45],
['28/02/2021','C','Z',50,55],
['31/03/2021','C','Z',60,65]]

使用说明:

#get first indices per first and second level with remove last level of MultiIndex
df1 = df.droplevel(-1)
idx1 = df1.groupby(level=0).head(1).index
idx2 = df1.groupby(level=1).head(1).index
#compare with intersecton of both levels
df = df[df1.index.isin(idx1.intersection(idx2))]
print (df)
VAR1  VAR2
IDX1 IDX2 DATE                  
A    X    2021-01-31    10    15
2021-02-28    20    30
2021-03-31    30    45
B    Y    2021-01-31    20    15
2021-02-28    20    15
2021-03-31    30    30
C    Z    2021-01-31    40    45
2021-02-28    50    55
2021-03-31    60    65

或:

#test duplicated values by first and second level with helper df1 DataFrame
df1 = df.index.to_frame()
df2 = df.droplevel(-1)
idx = df2.index[~df1.duplicated(['IDX1']) & ~df1.duplicated(['IDX2'])]
df = df[df2.index.isin(idx)]
print (df)
VAR1  VAR2
IDX1 IDX2 DATE                  
A    X    2021-01-31    10    15
2021-02-28    20    30
2021-03-31    30    45
B    Y    2021-01-31    20    15
2021-02-28    20    15
2021-03-31    30    30
C    Z    2021-01-31    40    45
2021-02-28    50    55
2021-03-31    60    65

最新更新