Removing duplicated player names from ESPN NBA box score output



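All I'm trying to do is get an NBA game's box score into a pandas DataFrame. Unfortunately, the output looks a bit odd where the player names are concerned. I'm using the following code...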

import pandas as pd
box_score = pd.read_html('https://www.espn.com/nba/boxscore/_/gameId/401307851')
score = box_score[0]
away = box_score[1]
home = box_score[2]
score.columns = ['Team', '1', '2', '3', '4', 'T']
box_final = pd.concat([away, home])
box_final.columns = ['Player', 'MIN', 'FG', '3PT', 'FT', 'OREB', 'DREB', 'REB', 'AST', 'STL', 'BLK', 'TO', 'PF', '+/-', 'PTS']
#box_final = box_final[box_final['MIN'] != "Has not entered game"]
box_final = box_final[box_final['MIN'] != "DNP-COACH'S DECISION"]
box_final = box_final[box_final['Player'].notna()]
box_final['FG Made'] = box_final['FG'].str.split('-').str[0]
box_final['FG Att'] = box_final['FG'].str.split('-').str[1]
box_final['3PT Made'] = box_final['3PT'].str.split('-').str[0]
box_final['3PT Att'] = box_final['3PT'].str.split('-').str[1]
box_final['FT Made'] = box_final['FT'].str.split('-').str[0]
box_final['FT Att'] = box_final['FT'].str.split('-').str[1]
print(box_final)

to get the following output.

Player  MIN      FG  ... 3PT Att FT Made FT Att
0     R. HachimuraR. HachimuraPF   30    5-11  ...       2       0      0
1                  A. LenA. LenC   24     2-6  ...       0       2      2
2     R. WestbrookR. WestbrookPG   40   12-28  ...       9       6      8
3               R. NetoR. NetoPG   29    5-10  ...       2       4      4
4          G. MathewsG. MathewsG   18     1-4  ...       4       0      0
5      C. HutchisonC. HutchisonF   16     2-5  ...       1       4      5
6         D. BertansD. BertansSF   31    5-12  ...      10       0      0
7              R. LopezR. LopezC    7     2-5  ...       0       0      0
8          D. GaffordD. GaffordC   17    8-11  ...       0       0      0
9             I. SmithI. SmithPG   28     3-9  ...       1       0      0
13                          TEAM  NaN  45-101  ...      29      16     19
0         J. CollinsJ. CollinsPF   34    5-12  ...       5       6      8
1            C. CapelaC. CapelaC   28     7-7  ...       0       3      5
2             T. YoungT. YoungPG   37   12-25  ...       8       7      7
3   B. BogdanovicB. BogdanovicSG   40    8-15  ...      10       0      0
4         K. HuerterK. HuerterSG   21     2-6  ...       5       1      2
5     D. GallinariD. GallinariPF   23     2-5  ...       2       0      0
6         O. OkongwuO. OkongwuPF   11     5-7  ...       0       1      2
7               S. HillS. HillSF    9     0-2  ...       2       2      2
8             T. SnellT. SnellSF   23     2-2  ...       1       0      0
9       L. WilliamsL. WilliamsSG   13     1-5  ...       0       2      2
15                          TEAM  NaN   44-86  ...      33      22     28
[22 rows x 21 columns]

Any suggestions on how to prevent the name from being duplicated? Any help is greatly appreciated.

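We can remove the first copy of the name with a regular expression like this. The greedy match runs from the first whitespace character to the last one, which is exactly the duplicated part: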

import re
box_final.Player.apply(lambda x: re.sub(r"\s.*\s", "", x))
0      R.HachimuraPF
1             A.LenC
2      R.WestbrookPG
3           R.NetoPG
4         G.MathewsG
5       C.HutchisonF
6        D.BertansSF
7           R.LopezC
8         D.GaffordC
9          I.SmithPG
10              TEAM
11       J.CollinsPF
12         C.CapelaC
13         T.YoungPG
14    B.BogdanovicSG
15       K.HuerterSG
16     D.GallinariPF
17       O.OkongwuPF
18          S.HillSF
19         T.SnellSF
20      L.WilliamsSG
21              TEAM
Name: Player, dtype: object

Hope this works for you.

Separating the position

You can separate the name and position within a single column,

import re
a = box_final.Player.apply(lambda x: re.sub(r'\s{2,}', '', ' '.join(re.split(r'([A-Z]{0,2})([A-Z]{0,1}$)', re.sub(r"\s.*\s", "", x)))))
# Drop the TEAM totals rows, which the split above turns into 'T EA M'
a = a[a != 'T EA M']
a
0      R.Hachimura PF
1             A.Len C
2      R.Westbrook PG
3           R.Neto PG
4         G.Mathews G
5       C.Hutchison F
6        D.Bertans SF
7           R.Lopez C
8         D.Gafford C
9          I.Smith PG
11       J.Collins PF
12         C.Capela C
13         T.Young PG
14    B.Bogdanovic SG
15       K.Huerter SG
16     D.Gallinari PF
17       O.Okongwu PF
18          S.Hill SF
19         T.Snell SF
20      L.Williams SG
Name: Player, dtype: object

or create a new Position column,

pd.DataFrame(list(a.str.split(' ')), columns=['Player', 'Position'])
    Player         Position
0   R.Hachimura    PF
1   A.Len          C
2   R.Westbrook    PG
3   R.Neto         PG
4   G.Mathews      G
5   C.Hutchison    F
6   D.Bertans      SF
7   R.Lopez        C
8   D.Gafford      C
9   I.Smith        PG
10  J.Collins      PF
11  C.Capela       C
12  T.Young        PG
13  B.Bogdanovic   SG
14  K.Huerter      SG
15  D.Gallinari    PF
16  O.Okongwu      PF
17  S.Hill         SF
18  T.Snell        SF
19  L.Williams     SG

I think the second one is what you want.
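As a possible alternative to the whitespace-based pattern, here is a minimal sketch that uses a regex backreference to detect the repeated name directly, so the space in "R. Hachimura" is kept and the position comes out as a separate value. The names `DUPLICATED` and `split_player` are made up for this illustration, and it assumes every player cell has the shape "<name><same name><position>" as in the output shown in the question.

```python
import re

# Matches "<name><same name><position>", e.g. "R. HachimuraR. HachimuraPF".
# Group 1 is the name, \1 requires it to repeat immediately, and group 2 is
# whatever follows (the position abbreviation).
DUPLICATED = re.compile(r"^(.+?)\1(.*)$")

def split_player(raw):
    """Return (name, position); cells with no repeated name pass through unchanged."""
    match = DUPLICATED.match(raw)
    if match is None:
        return raw, ""
    return match.group(1), match.group(2)

samples = ["R. HachimuraR. HachimuraPF", "A. LenA. LenC", "TEAM"]
parsed = [split_player(s) for s in samples]
print(parsed)
```

On a real DataFrame this could be applied with something like `box_final['Player'].apply(split_player)`, which yields a Series of tuples. Because the first group is lazy, a cell that starts with a doubled character would be split too early, so it is worth checking the result against a few known rows.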

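Roach's solution is great, and I always like using pandas to parse tables because it takes the least code. So keep that one as the accepted solution, but I wanted to offer an alternative that uses the ESPN API.

This returns the data in JSON format, and pandas also lets you convert JSON into a table/DataFrame. Going through the API should be more robust, because a) its structure is less likely to change (whereas the HTML might), and b) you get all of the raw data and will most likely need minimal data/string manipulation.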

import requests
import pandas as pd

url = 'https://secure.espn.com/core/nba/boxscore/_/gameId/401307851'
payload = {'xhr': 1}
jsonData = requests.get(url, params=payload).json()
boxScoreData = jsonData['gamepackageJSON']['boxscore']['players']

rows = []
for each in boxScoreData:  # one entry per team
    statistics = each['statistics'][0]
    for athlete in statistics['athletes']:
        # Flatten the nested athlete info, leaving out the raw stats list
        data = pd.json_normalize(athlete).drop('stats', axis=1).to_dict('records')
        if len(athlete['stats']) > 0:
            # Pair each stat value with its column name (MIN, FG, ...)
            stats = pd.DataFrame([athlete['stats']], columns=statistics['names']).to_dict('records')
        else:
            stats = [{}]  # no stats recorded for this player

        data[0].update(stats[0])
        rows += data

df = pd.DataFrame(rows)

Output: *only the first 5 rows are shown.

print(df.head(5).to_string())
reason  starter  ejected  didNotPlay  active          athlete.uid athlete.displayName athlete.headshot.alt                                           athlete.headshot.href athlete.jersey                      athlete.guid                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             athlete.links athlete.id athlete.position.displayName athlete.position.name athlete.position.abbreviation athlete.shortName MIN     FG  3PT   FT OREB DREB REB AST STL BLK TO PF +/- PTS
0  COACH'S DECISION     True    False       False   False  s:40~l:46~a:4066648       Rui Hachimura        Rui Hachimura  https://a.espncdn.com/i/headshots/nba/players/full/4066648.png              8  40c1bcf6675bf217f97c1d628073f927                          [{'href': 'https://www.espn.com/nba/player/_/id/4066648/rui-hachimura', 'text': 'Player Card'}, {'href': 'http://www.espn.com/nba/player/stats/_/id/4066648/rui-hachimura', 'text': 'Stats'}, {'href': 'http://www.espn.com/nba/player/splits/_/id/4066648/rui-hachimura', 'text': 'Splits'}, {'href': 'http://www.espn.com/nba/player/gamelog/_/id/4066648/rui-hachimura', 'text': 'Game Log'}, {'href': 'http://www.espn.com/nba/player/news/_/id/4066648/rui-hachimura', 'text': 'News'}, {'href': 'http://www.espn.com/nba/player/bio/_/id/4066648/rui-hachimura', 'text': 'Bio'}, {'href': 'http://www.espn.com/nba/player/_/id/4066648/rui-hachimura', 'text': 'Overview'}, {'href': 'http://www.espn.com/nba/player/advancedstats/_/id/4066648/rui-hachimura', 'text': 'Advanced Stats'}]    4066648                Power Forward         Power Forward                            PF      R. Hachimura  30   5-11  1-2  0-0    2    4   6   3   0   0  0  2  -2  11
1  COACH'S DECISION     True    False       False    True  s:40~l:46~a:2596107            Alex Len             Alex Len  https://a.espncdn.com/i/headshots/nba/players/full/2596107.png             27  56df8855ab0a659aeed2ccb77a2b77f7                                                                  [{'href': 'https://www.espn.com/nba/player/_/id/2596107/alex-len', 'text': 'Player Card'}, {'href': 'http://www.espn.com/nba/player/stats/_/id/2596107/alex-len', 'text': 'Stats'}, {'href': 'http://www.espn.com/nba/player/splits/_/id/2596107/alex-len', 'text': 'Splits'}, {'href': 'http://www.espn.com/nba/player/gamelog/_/id/2596107/alex-len', 'text': 'Game Log'}, {'href': 'http://www.espn.com/nba/player/news/_/id/2596107/alex-len', 'text': 'News'}, {'href': 'http://www.espn.com/nba/player/bio/_/id/2596107/alex-len', 'text': 'Bio'}, {'href': 'http://www.espn.com/nba/player/_/id/2596107/alex-len', 'text': 'Overview'}, {'href': 'http://www.espn.com/nba/player/advancedstats/_/id/2596107/alex-len', 'text': 'Advanced Stats'}]    2596107                       Center                Center                             C            A. Len  24    2-6  0-0  2-2    3    7  10   2   2   1  0  2  +0   6
2  COACH'S DECISION     True    False       False    True     s:40~l:46~a:3468   Russell Westbrook    Russell Westbrook     https://a.espncdn.com/i/headshots/nba/players/full/3468.png              4  e849e50fb1b742561de2ca49862e218d                  [{'href': 'https://www.espn.com/nba/player/_/id/3468/russell-westbrook', 'text': 'Player Card'}, {'href': 'http://www.espn.com/nba/player/stats/_/id/3468/russell-westbrook', 'text': 'Stats'}, {'href': 'http://www.espn.com/nba/player/splits/_/id/3468/russell-westbrook', 'text': 'Splits'}, {'href': 'http://www.espn.com/nba/player/gamelog/_/id/3468/russell-westbrook', 'text': 'Game Log'}, {'href': 'http://www.espn.com/nba/player/news/_/id/3468/russell-westbrook', 'text': 'News'}, {'href': 'http://www.espn.com/nba/player/bio/_/id/3468/russell-westbrook', 'text': 'Bio'}, {'href': 'http://www.espn.com/nba/player/_/id/3468/russell-westbrook', 'text': 'Overview'}, {'href': 'http://www.espn.com/nba/player/advancedstats/_/id/3468/russell-westbrook', 'text': 'Advanced Stats'}]       3468                  Point Guard           Point Guard                            PG      R. Westbrook  40  12-28  4-9  6-8    1    4   5  15   3   0  4  3  -4  34
3  COACH'S DECISION     True    False       False    True  s:40~l:46~a:2968361           Raul Neto            Raul Neto  https://a.espncdn.com/i/headshots/nba/players/full/2968361.png             19  a8463c665dc5f3c3f84682a998120b9f                                                          [{'href': 'https://www.espn.com/nba/player/_/id/2968361/raul-neto', 'text': 'Player Card'}, {'href': 'http://www.espn.com/nba/player/stats/_/id/2968361/raul-neto', 'text': 'Stats'}, {'href': 'http://www.espn.com/nba/player/splits/_/id/2968361/raul-neto', 'text': 'Splits'}, {'href': 'http://www.espn.com/nba/player/gamelog/_/id/2968361/raul-neto', 'text': 'Game Log'}, {'href': 'http://www.espn.com/nba/player/news/_/id/2968361/raul-neto', 'text': 'News'}, {'href': 'http://www.espn.com/nba/player/bio/_/id/2968361/raul-neto', 'text': 'Bio'}, {'href': 'http://www.espn.com/nba/player/_/id/2968361/raul-neto', 'text': 'Overview'}, {'href': 'http://www.espn.com/nba/player/advancedstats/_/id/2968361/raul-neto', 'text': 'Advanced Stats'}]    2968361                  Point Guard           Point Guard                            PG           R. Neto  29   5-10  0-2  4-4    1    2   3   2   0   0  0  5  -3  14
4  COACH'S DECISION     True    False       False   False  s:40~l:46~a:3913180    Garrison Mathews     Garrison Mathews  https://a.espncdn.com/i/headshots/nba/players/full/3913180.png             24  fdeab1798e3a6206a336412ef2916015  [{'href': 'https://www.espn.com/nba/player/_/id/3913180/garrison-mathews', 'text': 'Player Card'}, {'href': 'http://www.espn.com/nba/player/stats/_/id/3913180/garrison-mathews', 'text': 'Stats'}, {'href': 'http://www.espn.com/nba/player/splits/_/id/3913180/garrison-mathews', 'text': 'Splits'}, {'href': 'http://www.espn.com/nba/player/gamelog/_/id/3913180/garrison-mathews', 'text': 'Game Log'}, {'href': 'http://www.espn.com/nba/player/news/_/id/3913180/garrison-mathews', 'text': 'News'}, {'href': 'http://www.espn.com/nba/player/bio/_/id/3913180/garrison-mathews', 'text': 'Bio'}, {'href': 'http://www.espn.com/nba/player/_/id/3913180/garrison-mathews', 'text': 'Overview'}, {'href': 'http://www.espn.com/nba/player/advancedstats/_/id/3913180/garrison-mathews', 'text': 'Advanced Stats'}]    3913180                        Guard                 Guard                             G        G. Mathews  18    1-4  1-4  0-0    0    0   0   0   1   0  0  2  +4   3
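To make the pairing step in the loop above easier to follow without calling the API, here is a minimal offline sketch. The `team` dictionary is a hand-made miniature, not a real response: only the key names (`statistics`, `names`, `athletes`, `stats`, `athlete`, `shortName`) come from the code and output above, the second athlete is invented to show the empty-stats case, and `flatten_team` is a hypothetical helper name. It also assumes the stat values arrive as strings.

```python
# Hand-made miniature of one team's entry (an assumption, not real API output).
team = {
    "statistics": [{
        "names": ["MIN", "FG", "PTS"],
        "athletes": [
            {"athlete": {"shortName": "R. Hachimura"}, "stats": ["30", "5-11", "11"]},
            {"athlete": {"shortName": "Bench Player"}, "stats": []},  # made-up row, no stats
        ],
    }]
}

def flatten_team(team):
    """Build one flat dict per athlete by zipping stat names with stat values."""
    block = team["statistics"][0]
    rows = []
    for entry in block["athletes"]:
        row = {"Player": entry["athlete"]["shortName"]}
        # zip() yields nothing for an empty stats list, so such rows keep only the name
        row.update(dict(zip(block["names"], entry["stats"])))
        rows.append(row)
    return rows

rows = flatten_team(team)
print(rows)
```

A list of dicts like this can be passed straight to `pd.DataFrame(rows)`; players without stats simply end up with NaN in the stat columns.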
