1000
5007501500
我有一个数据框架,它根据数量和交易对代码进行了排序。
使用df_companies
dataframe对相同的行进行分组,并只保留每组的最佳排名。
输入数据:
>>> df
Code Volume Trade Volume Order Trade Order Max Ordered Number Final Sorted Number
0 ApplA 500 1000 2.0 2.0 2.0 4.0
1 Amazon 1000 500 1.0 4.0 4.0 2.0
2 Facebook 250 750 3.0 3.0 3.0 3.0
3 ApplE 100 1500 4.0 1.0 4.0 1.0
>>> df_companies
Codes Code Shares
0 ApplA Apple A
1 Amazon Amazon Empty
2 Facebook Facebook Empty
3 ApplE Apple E
4 AmazA Amazon A
5 AmazonB Amazon B
out = df.sort_values('Final Sorted Number')
.merge(df_companies[['Code', 'Codes']], how='left',
left_on='Code', right_on='Codes', suffixes=('', '2'))
.drop_duplicates('Code2')
.drop(columns=['Code2', 'Codes'])
输出结果:
>>> out
Code Volume Trade Volume Order Trade Order Max Ordered Number Final Sorted Number
0 ApplE 100 1500 4.0 1.0 4.0 1.0
1 Amazon 1000 500 1.0 4.0 4.0 2.0
2 Facebook 250 750 3.0 3.0 3.0 3.0
Step by Step:
# Reduce number of columns for readability
>>> df = df[['Code', 'Final Sorted Number']]
Code Final Sorted Number
0 ApplA 4.0
1 Amazon 2.0
2 Facebook 3.0
3 ApplE 1.0
# Sort rows by 'Final Sorted Number'
>>> df = df.sort_values('Final Sorted Number')
Code Final Sorted Number
3 ApplE 1.0
1 Amazon 2.0
2 Facebook 3.0
0 ApplA 4.0
# Merge dataframes on a common key: 'Code' for left, 'Codes' for right
>>> df = df.merge(df_companies[['Code', 'Codes']], how='left',
left_on='Code', right_on='Codes', suffixes=('', '2'))
Code Final Sorted Number Code2 Codes
0 ApplE 1.0 Apple ApplE
1 Amazon 2.0 Amazon Amazon
2 Facebook 3.0 Facebook Facebook
3 ApplA 4.0 Apple ApplA
现在,我们有来自df_companies
的2个新列:Code(通过后缀重命名为Code2)和Codes。让我们继续:
# Keep only the first row for each 'Code2' name
>>> df = df.drop_duplicates('Code2')
Code Final Sorted Number Code2 Codes
0 ApplE 1.0 Apple ApplE
1 Amazon 2.0 Amazon Amazon
2 Facebook 3.0 Facebook Facebook
# Remove added columns from the other dataframe
>>> df = df.drop(columns=['Code2', 'Codes'])
Code Final Sorted Number
0 ApplE 1.0
1 Amazon 2.0
2 Facebook 3.0