如何获得每家航空公司的前5个理由?



我只需要每个航空公司的前5个原因。我设法得到了所有航空公司的交叉表,但它没有排序,它显示了所有的原因。我怎样才能缩小搜索范围?

pd.crosstab(df.airline, df.negativereason).apply(lambda x: x, axis=1)
>negativereason Bad Flight  Can't Tell  Cancelled Flight    Customer Service Issue  Damaged Luggage Flight Attendant Complaints Flight Booking Problems Late Flight Lost Luggage    longlines
airline                                     
>American   87  198 246 768 12  87  130 249 149 34
>Delta  64  186 51  199 11  60  44  269 57  14
>Southwest  90  159 162 391 14  38  61  152 90  29
>US Airways 104 246 189 811 11  123 122 453 154 50
>United 216 379 181 681 22  168 144 525 269 48

想要的结果

>American
>Customer Service Issue 768
>Late Flight 249
>Cancelled Flight 246
>Can't Tell 198
>Lost Luggage 149

这是数据集

>tweet_id   airline_sentiment   airline_sentiment_confidence    negativereason  negativereason_confidence   airline airline_sentiment_gold  name    negativereason_gold retweet_count   text    tweet_coord tweet_created   tweet_location  user_timezone
>0  570306133677760513  neutral 1.0000  NaN NaN Virgin America  NaN cairdin NaN 0   @VirginAmerica What @dhepburn said. NaN 2015-02-24 11:35:52 -0800   NaN Eastern Time (US & Canada)
>1  570301130888122368  positive    0.3486  NaN 0.0000  Virgin America  NaN jnardino    NaN 0   @VirginAmerica plus you've added commercials t...   NaN 2015-02-24 11:15:59 -0800   NaN Pacific Time (US & Canada)
>2  570301083672813571  neutral 0.6837  NaN NaN Virgin America  NaN yvonnalynn  NaN 0   @VirginAmerica I didn't today... Must mean I n...   NaN 2015-02-24 11:15:48 -0800   Lets Play   Central Time (US & Canada)
>3  570301031407624196  negative    1.0000  Bad Flight  0.7033  Virgin America  NaN jnardino    NaN 0   @VirginAmerica it's really aggressive to blast...   NaN 2015-02-24 11:15:36 -0800   NaN Pacific Time (US & Canada)
>4  570300817074462722  negative    1.0000  Can't Tell  1.0000  Virgin America  NaN jnardino    NaN 0   @VirginAmerica and it's a really big bad thing...   NaN 2015-02-24 11:14:45 -0800   NaN Pacific Time (US & Canada)

这不是最好的解决方案,但它确实起到了作用。

top_n = 5
gb = df.groupby(['airline', 'negativereason']).size().reset_index(name='freq')
df_tops = gb.groupby('airline').apply(lambda x: x.nlargest(top_n, ['freq'])).reset_index(drop=True)

需要两个步骤。首先是计算每家航空公司每个负面原因的频率,其次是根据频率取top_n个原因。

我设法得到了每个航班的每个负面原因的计数。但是我仍然不能得到每个航空公司的前5个结果,从高到低排序。

count = df.groupby(['airline','negativereason']).size()
print(count) 
>airline         negativereason             
>American        Bad Flight                      87
>                Can't Tell                     198
>                Cancelled Flight               246
>                Customer Service Issue         768
>                Damaged Luggage                 12
>                Flight Attendant Complaints     87
>                Flight Booking Problems        130
>                Late Flight                    249
>                Lost Luggage                   149
>                longlines                       34
>Delta           Bad Flight                      64
>                Can't Tell                     186
>                Cancelled Flight                51
>                Customer Service Issue         199
>                Damaged Luggage                 11
>                Flight Attendant Complaints     60
>                Flight Booking Problems         44
>                Late Flight                    269
>                Lost Luggage                    57
>                longlines                       14

一种方法:

数据集

,Bad Flight,Cant Tell, Cancelled Flight,Customer Service Issue,Damaged Luggage,Flight Attendant Complaints,Flight Booking Problems,Late Flight,Lost Luggage,Longlines Airline
American,87,198,246,768,12,87,130,249,149,34
Delta,64,186,51,199,11,60,44,269,57,14
Southwest,90,159,162,391,14,38,61,152,90,29
US Airways,104,246,189,811,11,123,122,453,154,50
United,216,379,181,681,22,168,144,525,269,48

import pandas as pd
air = pd.read_csv("airlines.csv", index_col = 0)
print(air)
print(" ")
american5 = air.loc["American"].sort_values(ascending = False).get(range(5))
print(american5)

Bad Flight  Cant Tell   Cancelled Flight  Customer Service Issue  Damaged Luggage  Flight Attendant Complaints  Flight Booking Problems  Late Flight  Lost Luggage  Longlines Airline
American            87         198                246                     768               12                           87                      130          249           149                 34
Delta               64         186                 51                     199               11                           60                       44          269            57                 14
Southwest           90         159                162                     391               14                           38                       61          152            90                 29
US Airways         104         246                189                     811               11                          123                      122          453           154                 50
United             216         379                181                     681               22                          168                      144          525           269                 48
Customer Service Issue    768
Late Flight               249
Canceled Flight           246
Cant Tell                 198
Lost Luggage              149
Name: American, dtype: int64

最新更新