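So I am scraping data from https://www.pbpstats.com/totals/nba/player. I am using Selenium with the Chrome webdriver. I can't quite figure out how to click "Get Stats". I can do this manually, but I would like to do it through the HTML with Selenium.
I tried this: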
browser = webdriver.Chrome()
browser.get('https://www.pbpstats.com/totals/nba/player')
element = browser.find_elements_by_tag_name('button')
element.click()
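But nothing happens. I don't know how to interpret the output of find_elements_by_tag_name. I get something like <selenium.webdriver.remote.webelement.WebElement (session="14bacd9bab4b484952ba872ea0373663", element="4ef4e9da-b193-46a8-8209-265b8bef3f05")> (the values after the equals signs vary).
Try this, it should help: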
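import time
from selenium import webdriver
from selenium.webdriver.common.by import By
browser = webdriver.Chrome()
browser.get('https://www.pbpstats.com/totals/nba/player')
# Crude fixed pause so the page can finish rendering the button
time.sleep(4)
# find_element(By.XPATH, ...) replaces find_element_by_xpath, which was removed in Selenium 4.3
element = browser.find_element(By.XPATH, "//button[text()='Get Stats']")
element.click()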
Or you can use an explicit wait, like this:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
timeout = 30
browser = webdriver.Chrome()
browser.get('https://www.pbpstats.com/totals/nba/player')
myElem = WebDriverWait(browser, timeout).until(EC.element_to_be_clickable((By.XPATH, "//button[text()='Get Stats']")))
myElem.click()
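As for why the original attempt did nothing: find_elements_by_tag_name (plural) returns a list of WebElement objects, and a Python list has no click method, so you have to pick one element out of the list first. A minimal sketch of that idea follows. The helper name click_button_by_text is made up for illustration, and it assumes the button's visible text is exactly "Get Stats".

```python
def click_button_by_text(driver, label):
    """Click the first <button> whose visible text equals `label`.

    Returns True if a button was clicked, False if none matched.
    (Hypothetical helper, not part of Selenium.)
    """
    # "tag name" is the string value of Selenium's By.TAG_NAME constant.
    buttons = driver.find_elements("tag name", "button")  # always a list
    for button in buttons:
        # WebElement.text is the element's rendered text.
        if button.text.strip() == label:
            button.click()
            return True
    return False
```

You would call it as click_button_by_text(browser, "Get Stats") once the page has loaded. The explicit wait above is still the more robust option, because this loop only sees buttons that already exist at the moment it runs.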
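Selenium is a bit overkill here, since the data is returned from an API. Just get the data from there. You also get all of the data (all 248 columns) without having to go through each dropdown for scoring, assists, rebounds, etc.
If you want per-game and/or per-100-possessions numbers, then once you have the dataframe you can divide the numeric columns by the 'GP' column, or divide them by the 'Possessions' column and multiply by 100.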
import requests
import pandas as pd
url = 'https://api.pbpstats.com/get-totals/nba'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36'}
payload = {
'Season': '2020-21',
'SeasonType': 'Regular+Season',
'Type': 'Player'}
jsonData = requests.get(url, headers=headers, params=payload).json()
df = pd.DataFrame(jsonData['multi_row_table_data'])
df.to_csv('pbpstats_export.csv', index=False)
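For the per-game and per-100-possessions conversion mentioned above, here is a rough sketch on a toy dataframe. The numbers are made up, and the column names 'GP', 'Possessions', 'Points' and 'Assists' are assumptions for illustration; check df.columns on the real data to see what the API actually calls them.

```python
import pandas as pd

# Toy stand-in for the real dataframe; all values are invented.
toy_df = pd.DataFrame({
    "Name": ["Player A", "Player B"],
    "GP": [10, 20],
    "Possessions": [400, 800],
    "Points": [100, 600],
    "Assists": [50, 100],
})

# Counting stats to convert (assumed column names).
stat_cols = ["Points", "Assists"]

# Per game: divide each stat by games played, row by row.
per_game = toy_df[stat_cols].div(toy_df["GP"], axis=0)

# Per 100 possessions: divide by possessions, then scale by 100.
per_100 = toy_df[stat_cols].div(toy_df["Possessions"], axis=0) * 100

print(per_game)
print(per_100)
```

Using div with axis=0 lines the divisor up with the rows, so each player's stats are divided by that player's own games or possessions.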
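The data comes back with the columns in alphabetical order, so if you want to move some of them before writing to file, you can put the columns you want at the front first. I only picked name and team, since those are usually the first two columns in sports tables: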
# If you want to reorder the first few columns. Otherwise columns are in alpha order
reorder = ['Name','TeamAbbreviation']
for col in reversed(reorder):
    col = df.pop(col)
    df.insert(0, col.name, col)
Output:
print(df)
                  Name  TeamAbbreviation  2pt And 1 Free Throw Trips
0        Julius Randle               NYK                        35.0
1         Nikola Jokic               DEN                        24.0
2          Buddy Hield               SAC                         3.0
3     Domantas Sabonis               IND                        32.0
4           RJ Barrett               NYK                        21.0
..                 ...               ...                         ...
495      Jontay Porter               MEM                         NaN
496  Ty-Shon Alexander               PHX                         NaN
497      Rayjon Tucker               PHI                         NaN
498     Brian Bowen II               IND                         1.0
499       Jared Harper               NYK                         NaN

     Arc3Accuracy  Arc3Assists  ...  BlockedCorner3  Period3Fouls5Minutes
0        0.408284         90.0  ...             NaN                   NaN
1        0.422360         87.0  ...             NaN                   NaN
2        0.365796         22.0  ...             NaN                   NaN
3        0.281818        101.0  ...             NaN                   NaN
4        0.318681         32.0  ...             NaN                   NaN
..            ...          ...  ...             ...                   ...
495      0.500000          NaN  ...             NaN                   NaN
496           NaN          NaN  ...             NaN                   NaN
497      1.000000          NaN  ...             NaN                   NaN
498           NaN          NaN  ...             NaN                   NaN
499           NaN          NaN  ...             NaN                   NaN

     HeaveMakes  Period1Fouls3Minutes  Period2Fouls4Minutes
0           NaN                   NaN                   NaN
1           NaN                   NaN                   NaN
2           NaN                   NaN                   NaN
3           NaN                   NaN                   NaN
4           NaN                   NaN                   NaN
..          ...                   ...                   ...
495         NaN                   NaN                   NaN
496         NaN                   NaN                   NaN
497         NaN                   NaN                   NaN
498         NaN                   NaN                   NaN
499         NaN                   NaN                   NaN

[500 rows x 248 columns]
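If you would rather not modify the dataframe in place with pop and insert, the same reordering can be sketched by building the column list explicitly and selecting with it. The helper name move_to_front and the toy data are made up for illustration.

```python
import pandas as pd

def move_to_front(frame, first_cols):
    """Return a new dataframe with first_cols leading.

    The remaining columns keep their original order.
    (Hypothetical helper, not a pandas function.)
    """
    rest = [c for c in frame.columns if c not in first_cols]
    return frame[list(first_cols) + rest]

# Toy frame with alphabetically ordered columns, like the API result.
toy = pd.DataFrame({
    "Assists": [5],
    "Name": ["Player A"],
    "Points": [20],
    "TeamAbbreviation": ["AAA"],
})

reordered = move_to_front(toy, ["Name", "TeamAbbreviation"])
print(list(reordered.columns))
# ['Name', 'TeamAbbreviation', 'Assists', 'Points']
```

This leaves the original dataframe untouched, which can be handy if you still need the alphabetical layout elsewhere.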
You can perform the click like this:
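import time
from selenium import webdriver
from selenium.webdriver.common.by import By
browser = webdriver.Chrome()
browser.get('https://www.pbpstats.com/totals/nba/player')
time.sleep(5)  # give the page time to load
# find_elements returns a list, so take the first match
element = browser.find_elements(By.XPATH, '//*[@id="totals"]/main/div[3]/div/button[1]')[0]
element.click()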