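I am extracting group IDs from an internal site.
The script takes the URLs from a CSV file on my desktop.
The code below currently extracts the group ID without any problem as long as every URL is valid.
But I want the code to run to the end even if the CSV file contains invalid URLs, and it should write "invalid url" for those rows in the output xls file on my desktop.
Here is my code: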
from selenium import webdriver
import pandas as pd
import time
import os

c = 1
user = os.getlogin()
path = "C:/Users/" + user + "/Desktop/groupid.csv"
path1 = "C:/Users/" + user + "/Desktop/groupid.xlsx"
print(path)
reader = pd.read_csv(path)
driver = webdriver.Chrome('C:/chromedriver.exe')
driver.maximize_window()
reader['groupid'] = ''
for line in reader['URL']:
    print(line)
    driver.get(line)
    if c == 1:
        time.sleep(60)
    time.sleep(5)
    groupid = driver.find_element_by_xpath('//*[@id="Xpath"]').text
    print(groupid)
    reader['groupid'][reader['URL'] == line] = groupid
    c = c + 1
reader.to_excel(path1)
print("extraction Complete")
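Since you haven't said which error is raised at which point, it is hard to tell you exactly what to do.
But I assume you are running into the NoSuchElementException raised by Selenium: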
from selenium.common.exceptions import NoSuchElementException

for line in reader['URL']:
    print(line)
    driver.get(line)
    # ...
    try:
        groupid = driver.find_element_by_xpath(
            '//*[@id="Xpath"]'
        ).text
    except NoSuchElementException:
        print("Could not find element by xpath. Maybe a bad URL?")
        c += 1
        # Tell Python to go to the next element in the loop
        continue
    print(groupid)
    # ...
Edit: I'm not very familiar with pandas. If you want an "invalid URL" column, couldn't you use the same approach as for "groupid"?
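reader['invalid_url'] = 'No'
reader['groupid'] = ''
for line in reader['URL']:
    try:
        driver.get(line)
    except WhateverExceptionYouNeedToHandle:  # placeholder: use the exception actually raised
        # .loc sets the value in place and avoids chained assignment
        reader.loc[reader['URL'] == line, 'invalid_url'] = 'Yes'
        c += 1
        continue
    ...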
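
Putting the two ideas together, here is a minimal sketch of a loop that keeps going and records "invalid url" for any URL that fails. It is not tied to Selenium: the names collect_group_ids, fake_fetch and INVALID_MARKER are made up for illustration, and fake_fetch only stands in for the real driver.get plus element lookup so the example runs on its own.

```python
from typing import Callable, Iterable, List, Tuple, Type

# Hypothetical marker text written to the output for URLs that fail.
INVALID_MARKER = "invalid url"


def collect_group_ids(
    urls: Iterable[str],
    fetch_group_id: Callable[[str], str],
    errors: Tuple[Type[BaseException], ...] = (Exception,),
) -> List[str]:
    """Return one result per URL, using INVALID_MARKER where fetching fails."""
    results = []
    for url in urls:
        try:
            results.append(fetch_group_id(url))
        except errors:
            # Record the failure and move on instead of aborting the run.
            results.append(INVALID_MARKER)
    return results


# Stand-in fetcher for demonstration only (an assumption, not Selenium code).
# A real one would call driver.get(url) and return the element's text.
def fake_fetch(url: str) -> str:
    if not url.startswith("http"):
        raise ValueError("bad URL: " + url)
    return url.rsplit("/", 1)[-1]


demo = collect_group_ids(["http://site/groups/42", "not-a-url"], fake_fetch)
print(demo)  # ['42', 'invalid url']
```

In your script, something like reader['groupid'] = collect_group_ids(reader['URL'], fetch, errors=(WebDriverException,)) should work, where fetch is your own function wrapping driver.get and the XPath lookup. WebDriverException lives in selenium.common.exceptions and is the base class of NoSuchElementException, so it should cover both a failed page load and a missing element. Building the column as one list also means rows with duplicate URLs are handled without any boolean filtering. Call reader.to_excel(path1) once afterwards, as before.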