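I want to assign priorities across multiple columns and then select rows based on those priorities.
For each ID, I want to pick the row that takes priority in the Category column (RC first) and then in the Status column (Pending over Resolved, as the expected output shows).
Example: input DataFrame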
ID Category Status Date
1 GC Pending 01-03-2015
1 RC Resolved 05-10-2016
1 GC Resolved 06-03-2017
2 RC Pending 09-08-2016
2 RC Resolved 10-05-2014
3 GC Resolved 10-08-2018
3 RC Pending 13-05-2019
4 GC Pending 10-06-2018
4 GC Resolved 15-09-2014
Output DataFrame
ID Category Status Date
1 RC Resolved 05-10-2016
2 RC Pending 09-08-2016
3 RC Pending 13-05-2019
4 GC Pending 10-06-2018
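Convert both columns to ordered categoricals, passing a list to the categories parameter to set the priority (values later in the list rank higher). Then sort by the three columns with DataFrame.sort_values, and finally remove duplicates with DataFrame.drop_duplicates, keeping the last row for each ID: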
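import pandas as pd

# With ordered=True, values later in `categories` sort last (highest priority)
df['Category'] = pd.Categorical(df['Category'], ordered=True, categories=['GC','RC'])
df['Status'] = pd.Categorical(df['Status'], ordered=True, categories=['Resolved','Pending'])
# Sort ascending, then keep the last (highest-priority) row for each ID
df = df.sort_values(['ID','Category','Status']).drop_duplicates('ID', keep='last')
print(df)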
ID Category Status Date
1 1 RC Resolved 05-10-2016
3 2 RC Pending 09-08-2016
6 3 RC Pending 13-05-2019
7 4 GC Pending 10-06-2018
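
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. The DataFrame is typed in by hand from the question's sample data, and the name result is only an illustrative choice so that the sorted copy does not overwrite df; Date is left as plain strings because it plays no part in the selection.

```python
import pandas as pd

# Sample data copied by hand from the question.
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3, 3, 4, 4],
    "Category": ["GC", "RC", "GC", "RC", "RC", "GC", "RC", "GC", "GC"],
    "Status": ["Pending", "Resolved", "Resolved", "Pending", "Resolved",
               "Resolved", "Pending", "Pending", "Resolved"],
    "Date": ["01-03-2015", "05-10-2016", "06-03-2017", "09-08-2016",
             "10-05-2014", "10-08-2018", "13-05-2019", "10-06-2018",
             "15-09-2014"],
})

# Ordered categoricals: values later in `categories` sort last,
# so RC outranks GC and Pending outranks Resolved.
df["Category"] = pd.Categorical(df["Category"], categories=["GC", "RC"], ordered=True)
df["Status"] = pd.Categorical(df["Status"], categories=["Resolved", "Pending"], ordered=True)

# Sort ascending by ID, then by both priorities, and keep the last row per ID.
result = (
    df.sort_values(["ID", "Category", "Status"])
      .drop_duplicates("ID", keep="last")
)
print(result)
```

The row labels of result are the original positions in df; appending .reset_index(drop=True) would renumber them from 0 if that is preferred.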