Scraping a web page in Python with BeautifulSoup and Selenium


# install chromium, its driver, and selenium
!apt update
!apt install -y chromium-chromedriver
!pip install selenium
# replace the snap-based chromium-browser with the deb build from the saiarcot895 PPA
!sudo add-apt-repository -y ppa:saiarcot895/chromium-beta
!sudo apt remove -y chromium-browser
!sudo snap remove chromium
!sudo apt install -y chromium-browser
!pip3 install selenium
!apt-get update
!apt install -y chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
import sys
sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')
import time
from time import sleep
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.remote.webelement import WebElement
from selenium.webdriver.support.select import Select
from selenium.webdriver.support.ui import WebDriverWait  # needed for the explicit waits used later
from selenium.webdriver.support import expected_conditions as EC  # provides EC.element_to_be_clickable etc.
# set options to be headless, ..
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
wd = webdriver.Chrome(options=options)
wd.get("https://www.rusprofile.ru/?loggedout")

The code works without errors until I check whether I am logged in to my personal account (Selenium, Python, Google Colab)

I'm trying to log in to the website with the code below. Everything works until I try to check whether the login succeeded.

I click the button that opens the pop-up login form:

l = wd.find_element(By.XPATH, '//*[@type="button"]') 
l.click()
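Selenium's `find_element` returns the first element in document order that matches the locator, so `//*[@type="button"]` clicks whichever button happens to come first on the page. The sketch below shows the same first-match behaviour with the standard library's `xml.etree.ElementTree`, which supports only a small XPath subset and merely stands in for Selenium here; the markup is hypothetical and is not the real rusprofile.ru page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: NOT the real rusprofile.ru page.
PAGE = """
<html><body>
  <header>
    <button type="button" class="menu-toggle">Menu</button>
    <button type="button" class="btn-login">Log in</button>
  </header>
</body></html>
"""

root = ET.fromstring(PAGE.strip())

# Like find_element, Element.find() returns only the FIRST match in document order.
first = root.find(".//*[@type='button']")
print(first.get("class"))  # prints: menu-toggle

# A more specific predicate selects the intended button.
login = root.find(".//button[@class='btn-login']")
print(login.text)  # prints: Log in
```

On a real page, a locator tied to something unique to the login button (a class, an id, or its text) is less likely to hit a different button when the layout changes.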

Then I find the email and password fields and fill them in:

email=wd.find_element(By.XPATH, '//*[@type="email"]') # get the username field 
password=wd.find_element(By.XPATH, '//*[@type="password"]') # get the password field
email.send_keys("my-email") 
password.send_keys("my-password")

Then I try to submit the form:

wait = WebDriverWait(wd, 10) 
element_button = wait.until(EC.element_to_be_clickable((By.XPATH, '//*[@class="btn btn-blue"]'))) 
element_button.click()

and then check whether the login succeeded by looking for my first name and surname in the HTML:

my_element = wd.find_element(By.XPATH, "//span[text()='Myfirstname Mysurname']") 
my_element

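It only works every other time. I don't know whether something goes wrong before the last step or whether it's the site's protection. Please help me find a solution so that I can log in without errors every time.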
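The name check in the question, `//span[text()='Myfirstname Mysurname']`, only matches when one of the span's text nodes equals that string exactly, so spaces or line breaks around the name make it fail; this is one possible source of mismatches, not a confirmed diagnosis. XPath 1.0 also offers `//span[normalize-space()='Myfirstname Mysurname']`, which trims and collapses whitespace before comparing. The same idea in plain Python is sketched below; `normalize_space` and `name_matches` are hypothetical helpers, and the padded text is made up for illustration.

```python
def normalize_space(text):
    """Trim the ends and collapse inner whitespace runs to one space, similar to XPath normalize-space()."""
    return " ".join(text.split())


def name_matches(element_text, expected_name):
    """True if the element text equals the expected name once whitespace is normalized."""
    return normalize_space(element_text) == normalize_space(expected_name)


# Made-up header text with the kind of padding an HTML template can leave around a name.
raw = "\n    Myfirstname   Mysurname\n  "

exact = raw == "Myfirstname Mysurname"                # what an exact text() comparison sees
relaxed = name_matches(raw, "Myfirstname Mysurname")  # whitespace-tolerant comparison
print(exact, relaxed)  # prints: False True
```

In the notebook, `element_text` would be the `.text` of the element found for the user's name.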

I suggest using the same kind of wait you use here:

wait = WebDriverWait(wd, 10) 
element_button = wait.until(EC.element_to_be_clickable((By.XPATH, '//*[@class="btn btn-blue"]')))

for every other action you perform as well, for example:

my_element = wait.until(EC.visibility_of_element_located((By.XPATH, "//span[text()='Myfirstname Mysurname']")))

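If you are not looking for a button, remember to change the condition from element_to_be_clickable to visibility_of_element_located. You should use a wait at every step of your script/test to make sure the element you want to use actually exists.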
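An explicit wait helps because `WebDriverWait(wd, 10).until(...)` keeps re-checking its condition (every 0.5 s by default) until the condition returns something truthy or the 10 s run out, instead of looking only once. The function below is a simplified, Selenium-free model of that loop for illustration, not Selenium's source code: `wait_until` and `WaitTimeout` are hypothetical names, and `LookupError` stands in for the `NoSuchElementException` that `WebDriverWait` ignores by default.

```python
import time


class WaitTimeout(Exception):
    """Stand-in for Selenium's TimeoutException."""


def wait_until(condition, timeout=10.0, poll=0.5, ignored=(LookupError,)):
    """Call condition() every `poll` seconds until it returns a truthy value.

    Raises WaitTimeout if that does not happen within `timeout` seconds.
    Exceptions listed in `ignored` (standing in for NoSuchElementException)
    are treated as "not ready yet" rather than as failures.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value
        except ignored:
            pass  # element not present yet: keep polling
        if time.monotonic() > deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll)


# Simulated page: the name span only "appears" on the third check.
calls = {"n": 0}


def find_name_span():
    calls["n"] += 1
    if calls["n"] < 3:
        raise LookupError("span not rendered yet")
    return "Myfirstname Mysurname"


result = wait_until(find_name_span, timeout=2, poll=0.01)
print(result, calls["n"])  # prints: Myfirstname Mysurname 3

# A condition that never holds ends in a timeout instead of hanging forever.
try:
    wait_until(lambda: False, timeout=0.05, poll=0.01)
    timed_out = False
except WaitTimeout:
    timed_out = True
```

A single `find_element` call corresponds to calling the condition once with no retries, so it fails whenever the page after login has not finished rendering; timing differences like that could fit the "works every other time" symptom.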

I tried your code and it works fine up to the check of the user's name. Try the full (absolute) XPath instead; that works for me:

my_element = wd.find_element(By.XPATH, "/html/body/div[2]/header/div/div[2]/div[2]/a[1]/span") 
my_element.text

This gives you the user's name as plain text.

Edit:

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wd = webdriver.Chrome(options=options)  # `options` as in the question's setup; the answer originally passed a local driver PATH
wd.maximize_window()
wd.get("https://www.rusprofile.ru/?loggedout")
l = wd.find_element(By.XPATH, '//*[@type="button"]')
l.click()
email = wd.find_element(By.XPATH, '//*[@type="email"]')  # get the username field
password = wd.find_element(By.XPATH, '//*[@type="password"]')  # get the password field
email.send_keys("pep@yopmail.com")
password.send_keys("123456Aa!")
wait = WebDriverWait(wd, 10)
element_button = wait.until(EC.element_to_be_clickable((By.XPATH, '//*[@class="btn btn-blue"]')))
element_button.click()
time.sleep(2)
try:
    my_element = wd.find_element(By.XPATH, "/html/body/div[2]/header/div/div[2]/div[2]/a[1]/span")
    print(my_element.text)
except Exception:
    # the name could not be read (e.g. a pop-up appeared): close the pop-up and try again
    l = wd.find_element(By.XPATH, '/html/body/div[6]/div/div/div/div/div[6]')
    l.click()
    time.sleep(2)
    my_element = wd.find_element(By.XPATH, "/html/body/div[2]/header/div/div[2]/div[2]/a[1]/span")
    print(my_element.text)

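This version gets past the pop-up: if a pop-up appears, the code closes it before reading the user name; if not, it reads the name directly. Hope this helps.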
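The try/except in the edited answer can be factored into a small helper so the fallback logic can be exercised without a browser. A sketch, assuming hypothetical callables: in the notebook, `read_name` would wrap reading `.text` from the header span and `close_popup` would click the element at `/html/body/div[6]/div/div/div/div/div[6]`; the fakes below only simulate that page.

```python
import time


def read_name_with_popup_fallback(read_name, close_popup, pause=2.0):
    """Return read_name(); if it raises, call close_popup(), wait `pause` s and retry once.

    A second failure is not caught, so it propagates to the caller.
    """
    try:
        return read_name()
    except Exception:  # broad like the answer's bare except, but lets KeyboardInterrupt through
        close_popup()
        time.sleep(pause)
        return read_name()


# Fakes standing in for the browser: the name is readable only once the pop-up is closed.
state = {"popup_open": True, "closed": 0}


def fake_read_name():
    if state["popup_open"]:
        raise RuntimeError("pop-up covers the header")
    return "Myfirstname Mysurname"


def fake_close_popup():
    state["popup_open"] = False
    state["closed"] += 1


name = read_name_with_popup_fallback(fake_read_name, fake_close_popup, pause=0)
print(name, state["closed"])  # prints: Myfirstname Mysurname 1

# Without a pop-up the fallback is never used.
name_again = read_name_with_popup_fallback(fake_read_name, fake_close_popup, pause=0)
print(name_again, state["closed"])  # prints: Myfirstname Mysurname 1
```

The fixed `pause` mirrors the answer's `time.sleep(2)`; replacing it with an explicit wait on the name span would avoid guessing how long the page needs.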
