Iterating over a list of links and scraping them with Selenium

I'm trying to iterate over a list of links and visit each one with Selenium. My code is:

# create link list
urlList = []
with open('my.txt','r') as f: 
    for i in f:
        urlList.append(i)

# navigate to URL 
for i in (urlList):
    getUrl = driver.get(i)
    driver.implicitly_wait(3)

I get this error:

selenium.common.exceptions.WebDriverException: Message: unknown error: unhandled inspector error: {"code":-32603,"message":"Cannot navigate to invalid URL"} (Session info: chrome=51.0.2704.106) (Driver info: chromedriver=2.9.248304,platform=Linux 4.2.0-16-generic x86_64)

Apparently the for loop is producing newlines from the list and feeding them into the driver.get method. How can I get it to supply the URLs instead?

If the URLs read from the file have newline characters mixed in, try:

with open('my.txt','r') as f: 
    for i in f:
        urlList.append(i.strip())

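This strips leading and trailing whitespace from each i. Also, the \n characters are not generated by the loop; they are already in the file, which presumably has one URL per line, each line ending in '\n'.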
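
As a rough sketch of the same idea (my own illustration, not the asker's code): the hypothetical helper read_urls below strips each line and also skips blank lines, which is an extra assumption beyond the fix above. It writes a throwaway temp file so it can be run without Selenium or a real my.txt.

```python
import os
import tempfile


def read_urls(path):
    """Return the non-empty lines of `path` with surrounding whitespace removed."""
    urls = []
    with open(path, 'r') as f:
        for line in f:
            url = line.strip()  # drops the trailing '\n' and any stray spaces
            if url:             # assumption: blank lines should be ignored
                urls.append(url)
    return urls


# Demo input: two URLs separated by a blank line (made-up file contents).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('https://www.baidu.com/\n\nhttps://www.sogou.com/\n')
    demo_path = tmp.name

urls = read_urls(demo_path)
os.remove(demo_path)
print(urls)
```

Each entry of the returned list could then be passed to driver.get in the navigation loop.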

I ran your program on my machine and did not get any error.

Here is my my.txt file, which contains the URLs of two Chinese websites:

https://www.baidu.com/
https://www.sogou.com/

Here is test.py, which calls get on each site listed in my.txt:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import time
from selenium import webdriver
driver = webdriver.Chrome()  # Optional argument, if not specified will search path.
urlList = []
with open('my.txt', 'r') as f:
    for i in f:
        urlList.append(i)

for i in (urlList):
    print(i)
    getUrl = driver.get(i)
    time.sleep(3)
    driver.implicitly_wait(3)

This is the output of my program:

➜ /tmp/selenium $ python3 test.py
https://www.baidu.com/
https://www.sogou.com/

So I think there may be some other error in your program. Could you show the contents of my.txt and your complete code?
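
One quick way to check is to print the repr() of each line read from the file, which makes trailing newlines, carriage returns, or stray spaces visible. A minimal sketch, where describe_lines is a made-up helper name and the sample lines are invented for illustration:

```python
def describe_lines(lines):
    """Return the repr() of each line so hidden whitespace becomes visible."""
    return [repr(line) for line in lines]


# Invented sample: a normal line, a blank line, and a line with
# extra spaces plus a Windows-style line ending.
sample = ['https://www.baidu.com/\n', '\n', ' https://www.sogou.com/ \r\n']

for text in describe_lines(sample):
    print(text)
```

In practice you would pass the lines read from my.txt instead of the sample list; a line that shows up as only '\n' or with unexpected characters is a likely candidate for the invalid URL.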
