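All I'm trying to do is convert the box score of an NBA game into a pandas DataFrame. Unfortunately, my output looks a bit odd as far as the player names are concerned. I'm using the following code: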
import pandas as pd
box_score = pd.read_html('https://www.espn.com/nba/boxscore/_/gameId/401307851')
score = box_score[0]
away = box_score[1]
home = box_score[2]
score.columns = ['Team', '1', '2', '3', '4', 'T']
box_final = pd.concat([away, home])
box_final.columns = ['Player', 'MIN', 'FG', '3PT', 'FT', 'OREB', 'DREB', 'REB', 'AST', 'STL', 'BLK', 'TO', 'PF', '+/-', 'PTS']
#box_final = box_final[box_final['MIN'] != "Has not entered game"]
box_final = box_final[box_final['MIN'] != "DNP-COACH'S DECISION"]
box_final = box_final[box_final['Player'].notna()]
box_final['FG Made'] = box_final['FG'].str.split('-').str[0]
box_final['FG Att'] = box_final['FG'].str.split('-').str[1]
box_final['3PT Made'] = box_final['3PT'].str.split('-').str[0]
box_final['3PT Att'] = box_final['3PT'].str.split('-').str[1]
box_final['FT Made'] = box_final['FT'].str.split('-').str[0]
box_final['FT Att'] = box_final['FT'].str.split('-').str[1]
print(box_final)
which gives the following output:
Player MIN FG ... 3PT Att FT Made FT Att
0 R. HachimuraR. HachimuraPF 30 5-11 ... 2 0 0
1 A. LenA. LenC 24 2-6 ... 0 2 2
2 R. WestbrookR. WestbrookPG 40 12-28 ... 9 6 8
3 R. NetoR. NetoPG 29 5-10 ... 2 4 4
4 G. MathewsG. MathewsG 18 1-4 ... 4 0 0
5 C. HutchisonC. HutchisonF 16 2-5 ... 1 4 5
6 D. BertansD. BertansSF 31 5-12 ... 10 0 0
7 R. LopezR. LopezC 7 2-5 ... 0 0 0
8 D. GaffordD. GaffordC 17 8-11 ... 0 0 0
9 I. SmithI. SmithPG 28 3-9 ... 1 0 0
13 TEAM NaN 45-101 ... 29 16 19
0 J. CollinsJ. CollinsPF 34 5-12 ... 5 6 8
1 C. CapelaC. CapelaC 28 7-7 ... 0 3 5
2 T. YoungT. YoungPG 37 12-25 ... 8 7 7
3 B. BogdanovicB. BogdanovicSG 40 8-15 ... 10 0 0
4 K. HuerterK. HuerterSG 21 2-6 ... 5 1 2
5 D. GallinariD. GallinariPF 23 2-5 ... 2 0 0
6 O. OkongwuO. OkongwuPF 11 5-7 ... 0 1 2
7 S. HillS. HillSF 9 0-2 ... 2 2 2
8 T. SnellT. SnellSF 23 2-2 ... 1 0 0
9 L. WilliamsL. WilliamsSG 13 1-5 ... 0 2 2
15 TEAM NaN 44-86 ... 33 22 28
[22 rows x 21 columns]
Any suggestions on how to keep the names from being duplicated? Any help is greatly appreciated.
We can strip the duplicated copy of the name with a regular expression like this:
import re
box_final.Player.apply(lambda x: re.sub(r"\s.*\s", "", x))
0 R.HachimuraPF
1 A.LenC
2 R.WestbrookPG
3 R.NetoPG
4 G.MathewsG
5 C.HutchisonF
6 D.BertansSF
7 R.LopezC
8 D.GaffordC
9 I.SmithPG
10 TEAM
11 J.CollinsPF
12 C.CapelaC
13 T.YoungPG
14 B.BogdanovicSG
15 K.HuerterSG
16 D.GallinariPF
17 O.OkongwuPF
18 S.HillSF
19 T.SnellSF
20 L.WilliamsSG
21 TEAM
Name: Player, dtype: object
Hope this works for you.
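To see why the substitution above works, here is a minimal check on three raw values copied from the question's output. The pattern `\s.*\s` is greedy, so it deletes everything from the first whitespace character to the last one. That removes the duplicated copy of the name and, as a side effect, the space after the initial.

```python
import re

# Raw Player cells copied from the question's output:
# short name, the same short name again, then the position.
samples = ['R. HachimuraR. HachimuraPF', 'A. LenA. LenC', 'TEAM']

# \s.*\s is greedy: it matches from the FIRST whitespace to the LAST whitespace,
# so " HachimuraR. " is removed and "R." + "HachimuraPF" is left.
cleaned = [re.sub(r'\s.*\s', '', s) for s in samples]
print(cleaned)
```

Because the match runs from the first to the last whitespace, a short name that itself contained more than one space would lose more than just the duplicate, so it is worth spot-checking the result.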
Separating the position
You can separate the name and the position within a single column:
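import re
# remove the duplicated name, split off the trailing position letters, then collapse extra spaces
a = box_final.Player.apply(lambda x: re.sub(r'(\s{2,})', '', ' '.join(re.split(r'([A-Z]{0,2})([A-Z]{0,1}$)', re.sub(r"\s.*\s", "", x)))))
a = a[a != 'T EA M']  # drop the "TEAM" totals row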
a
0 R.Hachimura PF
1 A.Len C
2 R.Westbrook PG
3 R.Neto PG
4 G.Mathews G
5 C.Hutchison F
6 D.Bertans SF
7 R.Lopez C
8 D.Gafford C
9 I.Smith PG
11 J.Collins PF
12 C.Capela C
13 T.Young PG
14 B.Bogdanovic SG
15 K.Huerter SG
16 D.Gallinari PF
17 O.Okongwu PF
18 S.Hill SF
19 T.Snell SF
20 L.Williams SG
Name: Player, dtype: object
Or create a new Position column:
pd.DataFrame(list(a.str.split(' ')), columns=['Player', 'Position'])
Player Position
0 R.Hachimura PF
1 A.Len C
2 R.Westbrook PG
3 R.Neto PG
4 G.Mathews G
5 C.Hutchison F
6 D.Bertans SF
7 R.Lopez C
8 D.Gafford C
9 I.Smith PG
10 J.Collins PF
11 C.Capela C
12 T.Young PG
13 B.Bogdanovic SG
14 K.Huerter SG
15 D.Gallinari PF
16 O.Okongwu PF
17 S.Hill SF
18 T.Snell SF
19 L.Williams SG
I think the second one is what you want.
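As an alternative sketch (not part of the answer above), pandas' `Series.str.extract` can pull the name and the position apart in one step using a regex backreference. This assumes every player cell has the form `<short name><same short name><position>`, as in the question's output. The sample values are copied from that output; the column names `Name` and `Position` are illustrative.

```python
import pandas as pd

# A few Player values copied from the question's output.
box_final = pd.DataFrame({'Player': [
    'R. HachimuraR. HachimuraPF',
    'A. LenA. LenC',
    'G. MathewsG. MathewsG',
    'TEAM',
]})

# (.+) captures the name, \1 requires that exact text to repeat immediately,
# and ([A-Z]{1,2}) captures the 1-2 letter position at the end of the string.
parts = box_final['Player'].str.extract(r'^(.+)\1([A-Z]{1,2})$')
parts.columns = ['Name', 'Position']

# Rows that do not follow the pattern (e.g. the "TEAM" totals row) get NaN and are dropped.
cleaned = pd.concat([box_final, parts], axis=1).dropna(subset=['Name'])
print(cleaned[['Name', 'Position']])
```

Because `\1` must repeat the captured text exactly, the name keeps its original space (`R. Hachimura`), and rows such as `TEAM` simply fail to match instead of needing a special-case filter.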
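Roach's solution is great; I always like using pandas to parse tables because it takes the least code. So keep that as the accepted solution, but I wanted to offer an alternative that uses the ESPN API.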
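This returns the data in JSON format. Pandas can also convert JSON into a table/DataFrame. The API is more robust, though: (a) its structure is less likely to change (whereas the HTML may change), and (b) you get all the raw data, so you will most likely need only minimal data/string manipulation.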
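As a small illustration of how pandas turns nested JSON into a table, the sketch below runs `pd.json_normalize` on two hand-made records. Their shape (an `athlete` dict, a `stats` list, and a separate list of stat names) is an assumption based on the keys the code below reads, and the values are copied from the output further down; the real response contains many more fields.

```python
import pandas as pd

# Hypothetical, trimmed-down records shaped like the fields the API code reads.
athletes = [
    {'starter': True,
     'athlete': {'displayName': 'Rui Hachimura', 'shortName': 'R. Hachimura',
                 'position': {'abbreviation': 'PF'}},
     'stats': ['30', '5-11', '1-2']},
    {'starter': True,
     'athlete': {'displayName': 'Alex Len', 'shortName': 'A. Len',
                 'position': {'abbreviation': 'C'}},
     'stats': ['24', '2-6', '0-0']},
]
names = ['MIN', 'FG', '3PT']  # hypothetical subset of the stat labels

# json_normalize flattens nested dicts into dotted column names such as
# "athlete.position.abbreviation"; list values like 'stats' are kept as-is,
# so that column is dropped here.
df = pd.json_normalize(athletes).drop(columns='stats')

# Pair each stat value with its label, one row per athlete, and attach the columns.
stats = pd.DataFrame([dict(zip(names, a['stats'])) for a in athletes])
df = pd.concat([df, stats], axis=1)
print(df[['athlete.shortName', 'athlete.position.abbreviation', 'MIN', 'FG']])
```

Pairing values with labels via `zip` is a lighter-weight alternative to building a one-row DataFrame per athlete; both produce the same label-to-value mapping.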
import requests
import pandas as pd
url = 'https://secure.espn.com/core/nba/boxscore/_/gameId/401307851'
payload = {'xhr':1}
jsonData = requests.get(url, params=payload).json()
boxScoreData = jsonData['gamepackageJSON']['boxscore']['players']
rows = []
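for each in boxScoreData:
    statistics = each['statistics'][0]
    for athlete in statistics['athletes']:
        # flatten the nested athlete record, leaving out the raw stats list
        data = pd.json_normalize(athlete).drop('stats', axis=1).to_dict('records')
        if len(athlete['stats']) > 0:
            # label each stat value with its column name from statistics['names']
            stats = pd.DataFrame([athlete['stats']], columns=statistics['names']).to_dict('records')
        else:
            # no stats for this player: add no stat columns
            stats = [{}]
        data[0].update(stats[0])
        rows += data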
df = pd.DataFrame(rows)
Output (only the first 5 rows are shown):
print(df.head(5).to_string())
reason starter ejected didNotPlay active athlete.uid athlete.displayName athlete.headshot.alt athlete.headshot.href athlete.jersey athlete.guid athlete.links athlete.id athlete.position.displayName athlete.position.name athlete.position.abbreviation athlete.shortName MIN FG 3PT FT OREB DREB REB AST STL BLK TO PF +/- PTS
0 COACH'S DECISION True False False False s:40~l:46~a:4066648 Rui Hachimura Rui Hachimura https://a.espncdn.com/i/headshots/nba/players/full/4066648.png 8 40c1bcf6675bf217f97c1d628073f927 [{'href': 'https://www.espn.com/nba/player/_/id/4066648/rui-hachimura', 'text': 'Player Card'}, {'href': 'http://www.espn.com/nba/player/stats/_/id/4066648/rui-hachimura', 'text': 'Stats'}, {'href': 'http://www.espn.com/nba/player/splits/_/id/4066648/rui-hachimura', 'text': 'Splits'}, {'href': 'http://www.espn.com/nba/player/gamelog/_/id/4066648/rui-hachimura', 'text': 'Game Log'}, {'href': 'http://www.espn.com/nba/player/news/_/id/4066648/rui-hachimura', 'text': 'News'}, {'href': 'http://www.espn.com/nba/player/bio/_/id/4066648/rui-hachimura', 'text': 'Bio'}, {'href': 'http://www.espn.com/nba/player/_/id/4066648/rui-hachimura', 'text': 'Overview'}, {'href': 'http://www.espn.com/nba/player/advancedstats/_/id/4066648/rui-hachimura', 'text': 'Advanced Stats'}] 4066648 Power Forward Power Forward PF R. Hachimura 30 5-11 1-2 0-0 2 4 6 3 0 0 0 2 -2 11
1 COACH'S DECISION True False False True s:40~l:46~a:2596107 Alex Len Alex Len https://a.espncdn.com/i/headshots/nba/players/full/2596107.png 27 56df8855ab0a659aeed2ccb77a2b77f7 [{'href': 'https://www.espn.com/nba/player/_/id/2596107/alex-len', 'text': 'Player Card'}, {'href': 'http://www.espn.com/nba/player/stats/_/id/2596107/alex-len', 'text': 'Stats'}, {'href': 'http://www.espn.com/nba/player/splits/_/id/2596107/alex-len', 'text': 'Splits'}, {'href': 'http://www.espn.com/nba/player/gamelog/_/id/2596107/alex-len', 'text': 'Game Log'}, {'href': 'http://www.espn.com/nba/player/news/_/id/2596107/alex-len', 'text': 'News'}, {'href': 'http://www.espn.com/nba/player/bio/_/id/2596107/alex-len', 'text': 'Bio'}, {'href': 'http://www.espn.com/nba/player/_/id/2596107/alex-len', 'text': 'Overview'}, {'href': 'http://www.espn.com/nba/player/advancedstats/_/id/2596107/alex-len', 'text': 'Advanced Stats'}] 2596107 Center Center C A. Len 24 2-6 0-0 2-2 3 7 10 2 2 1 0 2 +0 6
2 COACH'S DECISION True False False True s:40~l:46~a:3468 Russell Westbrook Russell Westbrook https://a.espncdn.com/i/headshots/nba/players/full/3468.png 4 e849e50fb1b742561de2ca49862e218d [{'href': 'https://www.espn.com/nba/player/_/id/3468/russell-westbrook', 'text': 'Player Card'}, {'href': 'http://www.espn.com/nba/player/stats/_/id/3468/russell-westbrook', 'text': 'Stats'}, {'href': 'http://www.espn.com/nba/player/splits/_/id/3468/russell-westbrook', 'text': 'Splits'}, {'href': 'http://www.espn.com/nba/player/gamelog/_/id/3468/russell-westbrook', 'text': 'Game Log'}, {'href': 'http://www.espn.com/nba/player/news/_/id/3468/russell-westbrook', 'text': 'News'}, {'href': 'http://www.espn.com/nba/player/bio/_/id/3468/russell-westbrook', 'text': 'Bio'}, {'href': 'http://www.espn.com/nba/player/_/id/3468/russell-westbrook', 'text': 'Overview'}, {'href': 'http://www.espn.com/nba/player/advancedstats/_/id/3468/russell-westbrook', 'text': 'Advanced Stats'}] 3468 Point Guard Point Guard PG R. Westbrook 40 12-28 4-9 6-8 1 4 5 15 3 0 4 3 -4 34
3 COACH'S DECISION True False False True s:40~l:46~a:2968361 Raul Neto Raul Neto https://a.espncdn.com/i/headshots/nba/players/full/2968361.png 19 a8463c665dc5f3c3f84682a998120b9f [{'href': 'https://www.espn.com/nba/player/_/id/2968361/raul-neto', 'text': 'Player Card'}, {'href': 'http://www.espn.com/nba/player/stats/_/id/2968361/raul-neto', 'text': 'Stats'}, {'href': 'http://www.espn.com/nba/player/splits/_/id/2968361/raul-neto', 'text': 'Splits'}, {'href': 'http://www.espn.com/nba/player/gamelog/_/id/2968361/raul-neto', 'text': 'Game Log'}, {'href': 'http://www.espn.com/nba/player/news/_/id/2968361/raul-neto', 'text': 'News'}, {'href': 'http://www.espn.com/nba/player/bio/_/id/2968361/raul-neto', 'text': 'Bio'}, {'href': 'http://www.espn.com/nba/player/_/id/2968361/raul-neto', 'text': 'Overview'}, {'href': 'http://www.espn.com/nba/player/advancedstats/_/id/2968361/raul-neto', 'text': 'Advanced Stats'}] 2968361 Point Guard Point Guard PG R. Neto 29 5-10 0-2 4-4 1 2 3 2 0 0 0 5 -3 14
4 COACH'S DECISION True False False False s:40~l:46~a:3913180 Garrison Mathews Garrison Mathews https://a.espncdn.com/i/headshots/nba/players/full/3913180.png 24 fdeab1798e3a6206a336412ef2916015 [{'href': 'https://www.espn.com/nba/player/_/id/3913180/garrison-mathews', 'text': 'Player Card'}, {'href': 'http://www.espn.com/nba/player/stats/_/id/3913180/garrison-mathews', 'text': 'Stats'}, {'href': 'http://www.espn.com/nba/player/splits/_/id/3913180/garrison-mathews', 'text': 'Splits'}, {'href': 'http://www.espn.com/nba/player/gamelog/_/id/3913180/garrison-mathews', 'text': 'Game Log'}, {'href': 'http://www.espn.com/nba/player/news/_/id/3913180/garrison-mathews', 'text': 'News'}, {'href': 'http://www.espn.com/nba/player/bio/_/id/3913180/garrison-mathews', 'text': 'Bio'}, {'href': 'http://www.espn.com/nba/player/_/id/3913180/garrison-mathews', 'text': 'Overview'}, {'href': 'http://www.espn.com/nba/player/advancedstats/_/id/3913180/garrison-mathews', 'text': 'Advanced Stats'}] 3913180 Guard Guard G G. Mathews 18 1-4 1-4 0-0 0 0 0 0 1 0 0 2 +4 3
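The `FG`, `3PT` and `FT` columns still hold strings such as `5-11`. Below is a minimal sketch of splitting them into made/attempted numbers with one `str.split(..., expand=True)` per column, instead of the six separate splits in the question. It runs on a two-row stand-in whose values are copied from the output above. `pd.to_numeric(..., errors='coerce')` turns missing values (for example rows of players with no stats) into NaN instead of raising.

```python
import pandas as pd

# Small stand-in for the API DataFrame above; values copied from its first two rows.
df = pd.DataFrame({
    'athlete.shortName': ['R. Hachimura', 'A. Len'],
    'FG': ['5-11', '2-6'],
    '3PT': ['1-2', '0-0'],
    'FT': ['0-0', '2-2'],
})

# Split each "made-attempted" string once; expand=True returns the two parts
# as columns 0 and 1, which are then converted to numbers.
for col in ['FG', '3PT', 'FT']:
    made_att = df[col].str.split('-', expand=True).apply(pd.to_numeric, errors='coerce')
    df[f'{col} Made'] = made_att[0]
    df[f'{col} Att'] = made_att[1]

print(df)
```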