playerstats_url = 'https://www.pro-football-reference.com/boxscores/202110100tam.htm'
for week in weeks:
    url1 = playerstats_url.format(week)
    data1 = requests.get(url1)
    with open('player/{}.html'.format(week), 'w+') as f:
        f.write(data1.text)
soup = BeautifulSoup(page, 'html.parser')
week1_stats = soup.find('div', {'id': 'team_stats'})
tam2021 = pd.read_html(str(week1_stats))[0]
I am trying to extract the "Team Stats" table from Pro Football Reference, but I keep getting "ValueError: No tables found".
This works for me:
import requests
from bs4 import BeautifulSoup
import pandas as pd
html = requests.get('https://www.pro-football-reference.com/boxscores/202110100tam.htm')
soup = BeautifulSoup(html.text, 'html.parser')
stats = soup.find('div', {'id':'all_player_offense'})
pd.read_html(str(stats))
This returns:
[ Unnamed: 0_level_0 Unnamed: 1_level_0 Passing Rushing Receiving Fumbles
Player Tm Cmp Att Yds TD Int Sk Yds.1 Lng Rate Att Yds TD Lng Tgt Rec Yds TD Lng Fmb FL
0 Jacoby Brissett MIA 27 39 275 2 1 3 13 34 95.6 0 0 0 0 0 0 0 0 0 1 1
1 Myles Gaskin MIA 0 0 0 0 0 0 0 0 NaN 5 25 0 13 10 10 74 2 24 0 0
2 Preston Williams MIA 0 0 0 0 0 0 0 0 NaN 1 7 0 7 5 3 60 0 34 0 0
3 Salvon Ahmed MIA 0 0 0 0 0 0 0 0 NaN 2 5 0 4 3 2 16 0 11 0 0
4 Jaylen Waddle MIA 0 0 0 0 0 0 0 0 NaN 1 2 0 2 6 2 31 0 21 0 0
5 Mike Gesicki MIA 0 0 0 0 0 0 0 0 NaN 0 0 0 0 7 4 43 0 23 0 0
6 Durham Smythe MIA 0 0 0 0 0 0 0 0 NaN 0 0 0 0 3 2 23 0 21 0 0
7 Adam Shaheen MIA 0 0 0 0 0 0 0 0 NaN 0 0 0 0 2 2 15 0 10 0 0
8 Mack Hollins MIA 0 0 0 0 0 0 0 0 NaN 0 0 0 0 2 1 10 0 10 0 0
9 Isaiah Ford MIA 0 0 0 0 0 0 0 0 NaN 0 0 0 0 1 1 3 0 3 0 0
10 NaN NaN Passing Passing Passing Passing Passing Passing Passing Passing Passing Rushing Rushing Rushing Rushing Receiving Receiving Receiving Receiving Receiving Fumbles Fumbles
11 Player Tm Cmp Att Yds TD Int Sk Yds Lng Rate Att Yds TD Lng Tgt Rec Yds TD Lng Fmb FL
12 Tom Brady TAM 30 41 411 5 0 2 15 62 144.4 1 13 0 13 0 0 0 0 0 0 0
13 Blaine Gabbert TAM 3 3 41 0 0 0 0 23 118.7 3 -1 0 0 0 0 0 0 0 0 0
14 Leonard Fournette TAM 0 0 0 0 0 0 0 0 NaN 12 67 1 17 5 4 43 0 16 0 0
15 Ronald Jones II TAM 0 0 0 0 0 0 0 0 NaN 5 21 0 5 1 1 15 0 15 0 0
16 Giovani Bernard TAM 0 0 0 0 0 0 0 0 NaN 4 21 0 17 2 2 14 1 10 0 0
17 Antonio Brown TAM 0 0 0 0 0 0 0 0 NaN 0 0 0 0 8 7 124 2 62 0 0
18 Mike Evans TAM 0 0 0 0 0 0 0 0 NaN 0 0 0 0 8 6 113 2 34 0 0
19 Chris Godwin TAM 0 0 0 0 0 0 0 0 NaN 0 0 0 0 11 7 70 0 18 0 0
20 Tyler Johnson TAM 0 0 0 0 0 0 0 0 NaN 0 0 0 0 3 3 42 0 19 0 0
21 O.J. Howard TAM 0 0 0 0 0 0 0 0 NaN 0 0 0 0 3 2 19 0 10 0 0
22 Cameron Brate TAM 0 0 0 0 0 0 0 0 NaN 0 0 0 0 1 1 12 0 12 0 0]
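Edit for the updated question
It turns out that the table is commented out in the HTML that requests and bs4 receive. I assume the version shown on the website is loaded dynamically, and the requests library cannot handle pages that use JavaScript to request information.
The solution below works fine for this table. If you need dynamically loaded information, you can try this library: https://pypi.org/project/requests-html/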
import requests
from bs4 import BeautifulSoup
import pandas as pd
html = requests.get('https://www.pro-football-reference.com/boxscores/202110100tam.htm')
data = html.text.replace('<!--','').replace('-->','')
soup = BeautifulSoup(data, 'html.parser')
stats = soup.find('div', {'id':'div_team_stats'})
pd.read_html(str(stats))
This returns:
[ Unnamed: 0 MIA TAM
0 First Downs 17 33
1 Rush-Yds-TDs 9-39-0 25-121-1
2 Cmp-Att-Yd-TD-INT 27-39-275-2-1 33-44-452-5-0
3 Sacked-Yards 3-13 2-15
4 Net Pass Yards 262 437
5 Total Yards 301 558
6 Fumbles-Lost 1-1 0-0
7 Turnovers 2 0
8 Penalties-Yards 5-37 6-47
9 Third Down Conv. 2-7 8-11
10 Fourth Down Conv. 0-0 0-0
11 Time of Possession 22:53 37:07]
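As an alternative to stripping every `<!--` and `-->` from the page, the commented-out markup can be parsed on its own. The following is only a minimal sketch, assuming the table sits inside an HTML comment as described above. The helper name `find_commented_element` and the `sample` string are hypothetical; the sample is a hand-written miniature, not the real page.

```python
from bs4 import BeautifulSoup, Comment


def find_commented_element(html, element_id):
    """Return the element with the given id, even if it is inside an HTML comment."""
    soup = BeautifulSoup(html, 'html.parser')
    # First look in the regular (uncommented) markup.
    element = soup.find(id=element_id)
    if element is not None:
        return element
    # Otherwise parse the text of each comment as HTML and look again.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        inner = BeautifulSoup(comment, 'html.parser')
        element = inner.find(id=element_id)
        if element is not None:
            return element
    return None


# Hypothetical miniature of the page layout, for illustration only.
sample = '''
<div id="all_team_stats">
<!--
<div id="div_team_stats"><table>
<tr><th></th><th>MIA</th><th>TAM</th></tr>
<tr><th>First Downs</th><td>17</td><td>33</td></tr>
</table></div>
-->
</div>
'''

element = find_commented_element(sample, 'div_team_stats')
cells = [td.get_text() for td in element.find_all('td')]
print(cells)  # ['17', '33']
```

The returned element can be passed to `pd.read_html(str(element))` in the same way as `stats` in the answer. The difference from the `replace` approach is that the rest of the page is left untouched, which may matter if other comments contain markup you do not want parsed.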