I am trying to compare some NFL concussion statistics with statistics for individual players.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

dfcomb.to_excel(r'C:UsersDocumentsGWGNFL ConcussionNFL_concussioncomb.xlsx', index=False)
# Create a merged df with players that are concussed on dfconc and players that are on dfcomb
dfcommon = dfcomb.merge(dfconc, on=['nameFull'])
dfcommon = pd.read_csv(r'C:Userscrae1DocumentsGWGNFL ConcussionNFL_concussioncommon.csv')
# Initialise list of pos
positions = ['C', 'RB', 'CB', 'LB', 'OG', 'OT', 'QB', 'DT', 'S', 'FB', 'WR', 'TE']
# Iterate through list and compare height and weight
for pos in positions:
    avg = np.mean(dfcomb['heightInches'].where(dfcomb['position'] == pos))
    avgconc = np.mean(dfcommon['heightInches'].where(dfcommon['position'] == pos))
    print('mean height in the NFL for {}s is {} in mean height of concussed players {} in'.format(pos + "'", avg, avgconc))
for pos in positions:
    avg = np.mean(dfcomb['weight'].where(dfcomb['position'] == pos))
    avgconc = np.mean(dfcommon['weight'].where(dfcommon['position'] == pos))
    print('mean weight in the NFL for {}s is {} lbs mean weight of concussed players {} lbs'.format(pos + "'", avg, avgconc))
# Create summary df for concussion and NFL groups
heightavgNFL = dfcomb.groupby('position')['heightInches'].mean
heightavgdf = dfcommon.groupby('position')['heightInches'].mean
weightavgNFL = dfcomb.groupby('position')['weight'].mean
weightavgdf = dfcommon.groupby('position')['weight'].mean
# Plot height
bar_width = 0.10
ax = heightavgNFL().plot(kind='bar', align='edge', title='Mean NFL Height vs Mean Concussed Height', ylabel='Height (in)', xlabel='Position', width=bar_width, figsize=(16,8), color='r',label='NFL')
heightavgdf().plot(kind='bar', ax=ax, align='edge', title='Mean NFL Height vs Mean Concussed Height', ylabel='Height (in)', xlabel='Position', width=-bar_width, figsize=(16,8), color='b',label='Concussion Group')
plt.legend(loc='lower right')
# Plot weight
bar_width = 0.10
ax = weightavgNFL().plot(kind='bar', align='edge', title='Mean NFL Weight vs Mean Concussed Weight', ylabel='Weight (lbs)', xlabel='Position', width=bar_width, figsize=(16,8), color='r',label='NFL')
weightavgdf().plot(kind='bar', ax=ax, align='edge', title='Mean NFL Weight vs Mean Concussed Weight', ylabel='Weight (lbs)', xlabel='Position', width=-bar_width, figsize=(16,8), color='b',label='Concussion Group')
plt.legend(loc='lower right')
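However, when I look at the QB weight from the combined csv file, the plotted weight is much higher than expected, and the problem only occurs for the QB position. I have gone through the data and cannot see where the higher value could be coming from.
[Image: QB weight higher than expected]
Here is the csv/xlsx file that dfcomb comes from: https://1drv.ms/x/s!AnmdeJC_g0dLnGzx9LplOkq8iYNJ?e=2na3wa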
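Thanks.
The two groupby results do not necessarily contain the same positions in the same order, so when you add the NFL and df data to the plot the bars end up in a different order. Try this: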
heightavgNFL = pd.DataFrame(dfcomb.groupby('position')['heightInches'].mean()).sort_values(by=['position'])
heightavgdf = pd.DataFrame(dfcommon.groupby('position')['heightInches'].mean()).sort_values(by=['position'])
weightavgNFL = pd.DataFrame(dfcomb.groupby('position')['weight'].mean()).sort_values(by=['position'])
weightavgdf = pd.DataFrame(dfcommon.groupby('position')['weight'].mean()).sort_values(by=['position'])
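One way to check whether the two summaries really line up is to compare their indexes directly. The sketch below uses small made-up frames in place of dfcomb and dfcommon (the values and the extra 'K' position are assumptions for illustration, not taken from the linked file). groupby sorts the group keys by default, so the likelier source of a mismatch is that one frame has positions the other lacks. As far as I understand it, a pandas bar plot draws bars at integer slots 0, 1, 2, ... and only labels them from the index, so a missing position shifts every later bar by one slot.

```python
import pandas as pd

# Hypothetical toy data standing in for dfcomb (all players)
# and dfcommon (concussed players only).
dfcomb = pd.DataFrame({
    "position": ["QB", "QB", "K", "WR", "WR", "OT"],
    "weight": [220, 230, 200, 195, 205, 315],
})
dfcommon = pd.DataFrame({
    "position": ["QB", "WR", "OT"],
    "weight": [228, 200, 310],
})

weightavgNFL = dfcomb.groupby("position")["weight"].mean()
weightavgdf = dfcommon.groupby("position")["weight"].mean()

# Positions that appear in one summary but not in the other.
only_nfl = weightavgNFL.index.difference(weightavgdf.index)
only_conc = weightavgdf.index.difference(weightavgNFL.index)
print("only in NFL summary:", list(only_nfl))
print("only in concussion summary:", list(only_conc))

# Compare which position sits in each integer slot of the two summaries.
for slot in range(len(weightavgdf)):
    print(slot, weightavgNFL.index[slot], weightavgdf.index[slot])
```

In this toy case slot 1 holds OT in the NFL summary but QB in the concussion summary, so an OT-sized NFL bar would be drawn next to the QB concussion bar. If both "only in" lists come back empty for the real data, the cause lies elsewhere.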
This worked for me:
pd.DataFrame(weightavgNFL).join(weightavgdf, lsuffix='_NFL', rsuffix='_df').plot(kind='bar', align='edge', title='Mean NFL Weight vs Mean Concussed Weight', ylabel='Weight (lbs)', xlabel='Position', width=0.1, figsize=(16,8))
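The reason the join helps can be seen on a small example: join matches rows on the index label (the position), not on row order, so each position gets its own NFL and concussion values side by side. This is a minimal sketch with made-up frames in place of dfcomb and dfcommon; the numbers and the column names "NFL" and "Concussion Group" are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical toy data; the real frames come from the files in the question.
dfcomb = pd.DataFrame({
    "position": ["QB", "QB", "K", "WR", "WR", "OT"],
    "weight": [220, 230, 200, 195, 205, 315],
})
dfcommon = pd.DataFrame({
    "position": ["QB", "WR", "OT"],
    "weight": [228, 200, 310],
})

# One mean per position for each group, named so the legend is readable.
nfl = dfcomb.groupby("position")["weight"].mean().rename("NFL")
conc = dfcommon.groupby("position")["weight"].mean().rename("Concussion Group")

# Left join on the shared 'position' index: every NFL position is kept,
# and positions with no concussed players get NaN instead of shifting rows.
summary = nfl.to_frame().join(conc)
print(summary)
```

Calling summary.plot(kind='bar') on a frame like this draws both columns from one table, so the two bars for QB are always the QB values. Positions with NaN in the concussion column simply show no bar for that group.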