Mean in Pandas is higher than it should be

I'm trying to compare some NFL concussion statistics with the statistics of individual players.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

dfcomb.to_excel(r'C:UsersDocumentsGWGNFL ConcussionNFL_concussioncomb.xlsx', index=False)
# Create a merged df with players that are concussed on dfconc and players that are on dfcomb
dfcommon = dfcomb.merge(dfconc, on=['nameFull'])
dfcommon = pd.read_csv(r'C:Userscrae1DocumentsGWGNFL ConcussionNFL_concussioncommon.csv')
# Initialise list of pos
positions = ['C', 'RB', 'CB', 'LB', 'OG', 'OT', 'QB', 'DT', 'S', 'FB', 'WR', 'TE']
# Iterate through the list and compare height and weight
for pos in positions:
    avg = np.mean(dfcomb['heightInches'].where(dfcomb['position'] == pos))
    avgconc = np.mean(dfcommon['heightInches'].where(dfcommon['position'] == pos))
    print('mean height in the NFL for {}s is {} in mean height of concussed players {} in'.format(pos + "'", avg, avgconc))
for pos in positions:
    avg = np.mean(dfcomb['weight'].where(dfcomb['position'] == pos))
    avgconc = np.mean(dfcommon['weight'].where(dfcommon['position'] == pos))
    print('mean weight in the NFL for {}s is {} lbs mean weight of concussed players {} lbs'.format(pos + "'", avg, avgconc))
# Create summary df for concussion and NFL groups
heightavgNFL = dfcomb.groupby('position')['heightInches'].mean
heightavgdf = dfcommon.groupby('position')['heightInches'].mean
weightavgNFL = dfcomb.groupby('position')['weight'].mean
weightavgdf = dfcommon.groupby('position')['weight'].mean

# Plot height
bar_width = 0.10
ax = heightavgNFL().plot(kind='bar', align='edge', title='Mean NFL Height vs Mean Concussed Height', ylabel='Height (in)', xlabel='Position', width=bar_width, figsize=(16,8), color='r',label='NFL')
heightavgdf().plot(kind='bar', ax=ax, align='edge', title='Mean NFL Height vs Mean Concussed Height', ylabel='Height (in)', xlabel='Position', width=-bar_width, figsize=(16,8), color='b',label='Concussion Group')
plt.legend(loc='lower right')
# Plot weight
bar_width = 0.10
ax = weightavgNFL().plot(kind='bar', align='edge', title='Mean NFL Weight vs Mean Concussed Weight',  ylabel='Weight (lbs)', xlabel='Position', width=bar_width, figsize=(16,8), color='r',label='NFL')
weightavgdf().plot(kind='bar', ax=ax, align='edge', title='Mean NFL Weight vs Mean Concussed Weight',     ylabel='Weight (lbs)', xlabel='Position', width=-bar_width, figsize=(16,8), color='b',label='Concussion Group')
plt.legend(loc='lower right')

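However, when I look at the QB weight from the combined csv file, the plotted weight is much higher than expected, and the problem only occurs for the QB position. I've gone through the data and can't see where the higher value could be coming from.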

QB weight higher than expected

Here is the csv/xlsx file that dfcomb comes from: https://1drv.ms/x/s!AnmdeJC_g0dLnGzx9LplOkq8iYNJ?e=2na3wa

Thanks

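The two groupby results do not necessarily line up position for position (groupby sorts the group keys by default, but the two frames may not contain the same set of positions), so when you add the NFL and df data to the plot, the bars can end up in a different order. Try this: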

heightavgNFL = pd.DataFrame(dfcomb.groupby('position')['heightInches'].mean()).sort_values(by=['position'])
heightavgdf = pd.DataFrame(dfcommon.groupby('position')['heightInches'].mean()).sort_values(by=['position'])
weightavgNFL = pd.DataFrame(dfcomb.groupby('position')['weight'].mean()).sort_values(by=['position'])
weightavgdf = pd.DataFrame(dfcommon.groupby('position')['weight'].mean()).sort_values(by=['position'])
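One thing worth checking before plotting is whether the two grouped results actually cover the same positions. The following is a minimal sketch on invented data: the toy frames and their values are assumptions standing in for dfcomb and dfcommon, not the asker's data. As far as I can tell, pandas bar plots place bars by row order rather than by label, so a position missing from one result would shift every bar after it.

```python
import pandas as pd

# Toy stand-ins for dfcomb and dfcommon (all values invented for illustration).
dfcomb = pd.DataFrame({
    "position": ["QB", "QB", "RB", "K", "WR"],
    "weight": [220, 230, 210, 190, 200],
})
dfcommon = pd.DataFrame({
    "position": ["QB", "RB", "WR"],
    "weight": [240, 215, 205],
})

weightavgNFL = dfcomb.groupby("position")["weight"].mean()
weightavgdf = dfcommon.groupby("position")["weight"].mean()

# groupby sorts the keys by default, but the label sets differ here.
print(list(weightavgNFL.index))  # ['K', 'QB', 'RB', 'WR']
print(list(weightavgdf.index))   # ['QB', 'RB', 'WR']

# Positions that appear in the NFL result but not in the concussion result.
only_in_nfl = weightavgNFL.index.difference(weightavgdf.index)
print(list(only_in_nfl))         # ['K']

# Reindex the smaller result onto the larger one's labels; missing
# positions become NaN instead of shifting the remaining rows.
aligned = weightavgdf.reindex(weightavgNFL.index)
print(aligned)
```

In this toy data the second row of the NFL result is QB while the second row of the concussion result is RB, which is the kind of mismatch that could make one position's bar look wrong.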

This worked for me:

pd.DataFrame(weightavgNFL).join(weightavgdf, lsuffix='_NFL', rsuffix='_df').plot(kind='bar', align='edge', title='Mean NFL Weight vs Mean Concussed Weight',  ylabel='Weight (lbs)', xlabel='Position', width=0.1, figsize=(16,8))
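To see why joining helps, here is a small sketch of the same idea on invented data. The toy frames and the column names "NFL" and "Concussion Group" are assumptions chosen for illustration. Joining on the position index puts both means in the same row, so a single plot call draws each pair of bars under the correct label.

```python
import pandas as pd

# Toy stand-ins for dfcomb and dfcommon (all values invented for illustration).
dfcomb = pd.DataFrame({
    "position": ["QB", "QB", "RB", "K", "WR"],
    "weight": [220, 230, 210, 190, 200],
})
dfcommon = pd.DataFrame({
    "position": ["QB", "RB", "WR"],
    "weight": [240, 215, 205],
})

weightavgNFL = dfcomb.groupby("position")["weight"].mean()
weightavgdf = dfcommon.groupby("position")["weight"].mean()

# Join on the shared 'position' index so each row holds both means.
combined = weightavgNFL.to_frame("NFL").join(
    weightavgdf.to_frame("Concussion Group")
)
print(combined)

# With matplotlib installed, one call then draws grouped bars per position:
# combined.plot(kind="bar", title="Mean NFL Weight vs Mean Concussed Weight",
#               ylabel="Weight (lbs)", xlabel="Position", figsize=(16, 8))
```

A position with no concussed players (K in this toy data) shows up as NaN in the second column rather than shifting the other bars.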
