How to use Selenium with multiple URLs on one chromedriver



I am building a scraper that extracts data from a website. However, I want to scrape multiple links, not just one. When I try to use two URLs on a single chromedriver, I get an error. I also tried putting them in separate files, but that did not work either. Is there a way to do this?

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import os

chrome_options = webdriver.ChromeOptions()
chrome_options.binary_location = os.environ.get("GOOGLE_CHROME_BIN")
chrome_options.add_argument("--headless")
chrome_options.add_argument("--disable-dev-shm-usage")
chrome_options.add_argument("--no-sandbox")
wd = webdriver.Chrome(executable_path=os.environ.get("CHROMEDRIVER_PATH"), chrome_options=chrome_options)

url = "https://www.google.com"
wd.get(url)
table = WebDriverWait(wd, 5).until(EC.presence_of_element_located((By.ID, 'live-table')))
soup = BeautifulSoup(table.get_attribute('innerHTML'), 'lxml')

url2 = "https://www.flashscores.com"
wd.get(url2)
table2 = WebDriverWait(wd, 5).until(EC.presence_of_element_located((By.ID, 'live-table')))
soup2 = BeautifulSoup(table2.get_attribute('innerHTML'), 'lxml')

Perhaps you could try using a for loop to iterate over a list of URLs:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import os

def main(url):
    chrome_options = webdriver.ChromeOptions()
    chrome_options.binary_location = os.environ.get("GOOGLE_CHROME_BIN")
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument("--no-sandbox")
    wd = webdriver.Chrome(executable_path=os.environ.get("CHROMEDRIVER_PATH"), chrome_options=chrome_options)
    wd.get(url)
    table = WebDriverWait(wd, 5).until(EC.presence_of_element_located((By.ID, 'live-table')))
    soup = BeautifulSoup(table.get_attribute('innerHTML'), 'lxml')

# Chrome needs fully qualified URLs, so include the scheme
urls = ["https://google.com", "https://facebook.com", "https://instagram.com"]
for i in urls:
    main(i)
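
If starting a new Chrome instance for every URL is too slow, the same driver can usually be reused for all the URLs and closed once at the end. This is only a minimal sketch under that assumption; the scrape_tables function name and the example urls list are illustrative, and it assumes every page has an element with id 'live-table' as in the original code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import os

def scrape_tables(urls):
    chrome_options = webdriver.ChromeOptions()
    chrome_options.binary_location = os.environ.get("GOOGLE_CHROME_BIN")
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument("--no-sandbox")
    wd = webdriver.Chrome(executable_path=os.environ.get("CHROMEDRIVER_PATH"), chrome_options=chrome_options)
    soups = []
    try:
        for url in urls:
            wd.get(url)  # navigate the same driver to each URL in turn
            table = WebDriverWait(wd, 5).until(
                EC.presence_of_element_located((By.ID, 'live-table')))
            soups.append(BeautifulSoup(table.get_attribute('innerHTML'), 'lxml'))
    finally:
        wd.quit()  # close the browser once, after all URLs have been processed
    return soups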
