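I have created a DataFrame in Python and want to show the most popular dog breed for each zip code. I wrote the code below, but I can only display the total count for each breed, not the breed itself.
My code: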
import pandas as pd
df = pd.DataFrame({'zip_code': [12345, 66666, 12345, 22222, 22222, 12345, 66666, 22222, 44444],
                   'primary_breed': ['labrador', 'pug', 'poodle', 'labrador', 'labrador', 'pug', 'whippet', 'poodle', 'labrador'],
                   'animals_name': ['lucy', 'charley', 'scout', 'hank', 'sweetie', 'lucy', 'daddy', 'lucy', 'charley'],
                   'species': ['dog', 'dog', 'dog', 'dog', 'dog', 'dog', 'dog', 'dog', 'dog']})
# assign correct data types
df['species'] = df['species'].astype('category')
df['animals_name'] = df['animals_name'].astype('string')
df['primary_breed'] = df['primary_breed'].astype('category')
df['zip_code'] = df['zip_code'].astype('string')
dogs = df.species == 'dog'
# total number per breed per zip
df_total_per_breed_zip = df[dogs].groupby('zip_code')['primary_breed'].value_counts()
print('\n\ntotal number per breed: \n', df_total_per_breed_zip)
# most popular breed per zip
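# Series.max(level=...) was removed in pandas 2.0; groupby(level=...).max() is the equivalent
df_mostpop_breed_zip = df_total_per_breed_zip.groupby(level='zip_code').max()
print('\n\nmost popular breed per zip: \n', df_mostpop_breed_zip)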
So what I get is:
total number per breed:
zip_code  primary_breed
12345     labrador         1
          poodle           1
          pug              1
22222     labrador         2
          poodle           1
44444     labrador         1
66666     pug              1
          whippet          1
Name: primary_breed, dtype: int64
most popular breed per zip:
zip_code
12345 1
22222 2
44444 1
66666 1
Name: primary_breed, dtype: int64
But what I want to get is:
total number per breed:
zip_code  primary_breed
12345     labrador         1
          poodle           1
          pug              1
22222     labrador         2
          poodle           1
44444     labrador         1
66666     pug              1
          whippet          1
Name: primary_breed, dtype: int64
most popular breed per zip:
zip_code
12345 labrador
22222 labrador
44444 labrador
66666 pug
Name: primary_breed, dtype: int64
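Use mode to get the most common value in each group. mode() returns every value tied for most frequent, in sorted order, so [0] picks the first of them: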
(df.loc[df['species']=='dog']
.groupby('zip_code')['primary_breed']
.agg(lambda x: x.mode()[0])
)
Output:
zip_code
12345 labrador
22222 labrador
44444 labrador
66666 pug
Name: primary_breed, dtype: object
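
If the winning count is also needed, or the tie-breaking rule should be spelled out, one possible alternative is to count the (zip_code, breed) pairs, sort, and keep the first row per zip code. This is only a sketch: the names counts, n and top_breed are made up for illustration, the columns are left as plain strings rather than categoricals, and ties are broken alphabetically to match what mode()[0] gives above.

```python
import pandas as pd

df = pd.DataFrame({
    'zip_code': ['12345', '66666', '12345', '22222', '22222',
                 '12345', '66666', '22222', '44444'],
    'primary_breed': ['labrador', 'pug', 'poodle', 'labrador', 'labrador',
                      'pug', 'whippet', 'poodle', 'labrador'],
})

# One row per (zip_code, breed) pair, with the number of dogs in column 'n'
# ('n' is an illustrative column name, not something pandas requires).
counts = (df.groupby(['zip_code', 'primary_breed'])
            .size()
            .reset_index(name='n'))

# Within each zip code, put the largest count first; on ties, the
# alphabetically first breed comes first. drop_duplicates keeps that first row.
top_breed = (counts.sort_values(['zip_code', 'n', 'primary_breed'],
                                ascending=[True, False, True])
                   .drop_duplicates('zip_code')
                   .set_index('zip_code'))

print(top_breed)
```

Here top_breed keeps both the breed and its count per zip code; top_breed['primary_breed'] alone gives the same Series shape as the mode-based result.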