Scrape NBA.com individual player head-to-head stats pages (spanning multiple pages)



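I'm trying to scrape the following page with Python (currently using Requests & BeautifulSoup), but I'm struggling to (a) get meaningful results in a table format and (b) scrape every page, since most players' data spans several pages (for example, the following player's data spans 7 pages: https://www.nba.com/stats/player/203081/head-to-head/).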

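So far I can run the GET request and build the soup successfully, but I'm not sure of the best way to proceed from there. Any help or advice is much appreciated.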

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://www.nba.com/stats/player/203081/head-to-head/'
r = requests.get(url)
if r.status_code == 200:
    soup = BeautifulSoup(r.content, 'html.parser')
    print(soup)
    table = soup.find('table')
    if table:
        df = pd.read_html(str(table))[0]
        print(df)

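I visited the page in my browser and logged my network traffic, and saw that the browser makes several HTTP GET requests to a REST API. One of them goes to the endpoint stats/leagueseasonmatchups, which you can query for a specific player, league and season. The response is JSON and contains all the table information you're trying to scrape. Typically, the page uses this API to populate the DOM asynchronously with JavaScript, which is why the table isn't in the HTML that Requests receives. Since we know the endpoint, the query-string parameters and the request headers, we can imitate the HTTP GET request, parse the response, and write it to CSV: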

def get_matchups():
    import requests

    url = "https://stats.nba.com/stats/leagueseasonmatchups"

    # Query-string parameters as seen in the browser's network traffic
    params = {
        "DateFrom": "",
        "DateTo": "",
        "DefPlayerID": "203081",
        "LeagueID": "00",
        "Outcome": "",
        "PORound": "0",
        "PerMode": "Totals",
        "Season": "2020-21",
        "SeasonType": "Regular Season"
    }

    # Request headers copied from the browser's request
    headers = {
        "Accept": "application/json",
        "Accept-Encoding": "gzip, deflate",
        "Referer": "https://www.nba.com/",
        "User-Agent": "Mozilla/5.0",
        "x-nba-stats-origin": "stats",
        "x-nba-stats-token": "true"
    }

    print("Getting matchups for player ID# {}...".format(params["DefPlayerID"]))
    response = requests.get(url, params=params, headers=headers)
    response.raise_for_status()
    data = response.json()

    # Pair each row of values with the column names
    fieldnames = data["resultSets"][0]["headers"]
    for row in data["resultSets"][0]["rowSet"]:
        yield dict(zip(fieldnames, row))


def main():
    from csv import DictWriter

    all_matchups = list(get_matchups())

    print("Writing to CSV file...")
    with open("output.csv", "w", newline="") as file:
        fieldnames = list(all_matchups[0])  # a bit lame
        writer = DictWriter(file, fieldnames=fieldnames)
        writer.writeheader()
        for matchup in all_matchups:
            writer.writerow(matchup)

    print("Done.")
    return 0


if __name__ == "__main__":
    import sys
    sys.exit(main())

Output (terminal):

Getting matchups for player ID# 203081...
Writing to CSV file...
Done.
>>> 

Output (CSV):

SEASON_ID,OFF_PLAYER_ID,OFF_PLAYER_NAME,DEF_PLAYER_ID,DEF_PLAYER_NAME,GP,MATCHUP_MIN,PARTIAL_POSS,PLAYER_PTS,TEAM_PTS,MATCHUP_AST,MATCHUP_TOV,MATCHUP_BLK,MATCHUP_FGM,MATCHUP_FGA,MATCHUP_FG_PCT,MATCHUP_FG3M,MATCHUP_FG3A,MATCHUP_FG3_PCT,HELP_BLK,HELP_FGM,HELP_FGA,HELP_FG_PERC,MATCHUP_FTM,MATCHUP_FTA,SFL
22020,202709,Cory Joseph,203081,Damian Lillard,5,17:34,68.6,4,82,1,1,0,2,10,0.2,0,3,0.0,0,0,0,0.0,0,0,0
22020,1628969,Mikal Bridges,203081,Damian Lillard,3,17:28,68.36,18,98,4,1,0,7,8,0.875,3,4,0.75,0,0,0,0.0,1,1,1
22020,1628366,Lonzo Ball,203081,Damian Lillard,3,16:34,65.98,17,77,6,2,1,6,13,0.462,5,11,0.455,0,0,0,0.0,0,0,0
22020,1626220,Royce O'Neale,203081,Damian Lillard,3,14:17,51.4,2,77,0,1,0,1,6,0.167,0,4,0.0,0,0,0,0.0,0,0,0
22020,1626196,Josh Richardson,203081,Damian Lillard,3,11:39,47.9,6,80,2,1,0,2,4,0.5,1,1,1.0,0,0,0,0.0,1,1,1
...
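
If you want the same thing for other players or seasons, one option is to turn the hard-coded player ID and season into arguments and to keep the JSON-to-rows step separate from the network call, so it can be checked without hitting the API. This is only a sketch: the names build_params, rows_from_payload and fetch_matchups are mine, the timeout is my addition, and I'm assuming (not having verified) that the endpoint accepts other DefPlayerID and Season values in the same way. The endpoint, parameters and headers are copied from the code above.

```python
URL = "https://stats.nba.com/stats/leagueseasonmatchups"

HEADERS = {
    "Accept": "application/json",
    "Accept-Encoding": "gzip, deflate",
    "Referer": "https://www.nba.com/",
    "User-Agent": "Mozilla/5.0",
    "x-nba-stats-origin": "stats",
    "x-nba-stats-token": "true",
}


def build_params(def_player_id, season="2020-21"):
    """Same query parameters as above, with player and season made variable."""
    return {
        "DateFrom": "",
        "DateTo": "",
        "DefPlayerID": str(def_player_id),
        "LeagueID": "00",
        "Outcome": "",
        "PORound": "0",
        "PerMode": "Totals",
        "Season": season,
        "SeasonType": "Regular Season",
    }


def rows_from_payload(payload):
    """Pair the first result set's column names with each row of values."""
    result_set = payload["resultSets"][0]
    fieldnames = result_set["headers"]
    return [dict(zip(fieldnames, row)) for row in result_set["rowSet"]]


def fetch_matchups(def_player_id, season="2020-21"):
    """Request one player's matchups and return them as a list of dicts."""
    import requests  # imported here so the helpers above work without it

    response = requests.get(
        URL,
        params=build_params(def_player_id, season),
        headers=HEADERS,
        timeout=30,  # assumption: fail rather than hang if the API stalls
    )
    response.raise_for_status()
    return rows_from_payload(response.json())
```

Calling fetch_matchups(203081) should then give the same rows as get_matchups() above, and a list of dicts like this can be passed straight to pandas.DataFrame if you'd rather have a DataFrame than a CSV file.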
