R - HTTP 404 warning returned when trying to log in to a web page



I am trying to log in to a website that requires a username and password, using rvest.

I am using this as a resource because I found it very useful: https://awesomeopensource.com/project/yusuzech/r-web-scraping-cheat-sheet#rvest7.5

When I submit the login form, I get an HTTP 404 warning message and cannot go on to read any HTML from the page.

Submitting with 'NULL'
Warning message:
In request_POST(session, url = url, body = request$values, encode = request$encode,  :
Not Found (HTTP 404).

Could someone who understands HTML help me work out whether I am passing the correct fields in my form submission?

My code is as follows:

install.packages("pacman")
# LOAD LIBRARIES
pacman::p_load(rvest,purrr,xml2,dplyr,stringr)
# TARGET URL
url <- "https://www.mywebsite.com/"
# SPOOF THE USER AGENT TO LOOK LIKE A BROWSER
ua <- httr::user_agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36")
# CREATE A PERSISTENT SESSION
my_session <- rvest::html_session(url,ua)
# FIND ALL FORMS IN THE WEB PAGE
unfilled_forms <- rvest::html_form(my_session)
# SELECT THE FORM THAT YOU NEED TO FILL IN
login_form <- unfilled_forms[[1]]
# FILL IN THE FORM
filled_form <- set_values(login_form, username = "myUsername", password = "myPassword")
# SUBMIT THE FORM TO LOGIN
login_session <- submit_form(my_session, filled_form)
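
One way to check whether the right fields are being passed is to look at what the form actually declares before filling it in. A 404 on submit usually means the POST went to a URL that does not exist. That can happen if `unfilled_forms[[1]]` is not the login form (for example a search box), or if the form's `action` attribute does not resolve to a real endpoint. The sketch below is only an illustration: the HTML, the `/account/login` action and the `csrf_token` field are made up so that it runs without network access, and none of them come from the real site.

```r
library(rvest)

# Hypothetical login page, built inline so the sketch runs offline.
# Replace `page` with the HTML of the real site when debugging.
page <- xml2::read_html('<html><body>
  <form action="/account/login" method="post">
    <input type="text" name="username">
    <input type="password" name="password">
    <input type="hidden" name="csrf_token" value="abc123">
    <input type="submit" name="login" value="Log in">
  </form>
</body></html>')

# List every form with the URL and method it submits to
forms <- html_nodes(page, "form")
form_actions <- html_attr(forms, "action")
form_methods <- html_attr(forms, "method")

# The name attributes of the inputs are what set_values() must match exactly
field_names <- html_attr(html_nodes(forms[[1]], "input"), "name")

# Resolve the (possibly relative) action against the page URL;
# this is the address the POST request will be sent to
action_url <- xml2::url_absolute(form_actions[[1]], "https://www.mywebsite.com/")

print(form_methods)
print(field_names)
print(action_url)
```

If the printed field names differ from `username` and `password`, or the resolved action URL returns 404 when opened directly, the form being submitted is probably not the one the browser uses. Many sites also build their login request with JavaScript, in which case no static form will work.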

I decided to change direction and use RSelenium. It took a few hours to get the hang of it, but I got there.

RSelenium is very useful when a login is required; I wish I had known about it a few months ago on another project.

library(RSelenium)
# The chromedriver version must match the installed Chrome version; see
# https://stackoverflow.com/questions/55201226/session-not-created-this-version-of-chromedriver-only-supports-chrome-version-7/56173984
rd <- rsDriver(browser = "chrome",
               chromever = "88.0.4324.27",
               port = netstat::free_port())
remdr <- rd[["client"]]
url <- "https://www.mywebsite.com/"  # url of the site's login page
remdr$navigate(url)  # Navigating to the page
Sys.sleep(10)  # Give the page time to finish loading
# Click the button that opens the login form
loginbutton <- remdr$findElement(using = 'css selector', '.plain')
loginbutton$clickElement()
# Locate the username and password fields and the submit button
username <- remdr$findElement(using = 'css selector', '#username')
password <- remdr$findElement(using = 'css selector', '#password')
login <- remdr$findElement(using = 'css selector', '#btnLoginSubmit1')
# Type the credentials and submit the form
username$sendKeysToElement(list("myUserName"))
password$sendKeysToElement(list("myPassword"))
login$clickElement()
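
The fixed `Sys.sleep(10)` above either wastes time or is too short on a slow connection. A possible alternative is to poll until the element can be found. The helper below is a minimal sketch: `wait_for` is a hypothetical name, not part of RSelenium, and it relies only on the fact that `findElement()` raises an error when nothing matches.

```r
# Call `find` repeatedly until it succeeds or `timeout` seconds have passed.
# `find` is any zero-argument function that errors while the element is missing.
wait_for <- function(find, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    result <- tryCatch(find(), error = function(e) NULL)
    if (!is.null(result)) {
      return(result)
    }
    if (Sys.time() >= deadline) {
      stop("Timed out waiting for element")
    }
    Sys.sleep(interval)
  }
}

# Usage with the session above (needs a running browser, so it is commented out):
# loginbutton <- wait_for(function() {
#   remdr$findElement(using = "css selector", ".plain")
# })
#
# When finished, close the browser and stop the Selenium server:
# remdr$close()
# rd[["server"]]$stop()
```

The same wrapper can be used before each `findElement()` call whose target only appears after a click, such as the `#username` field here.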
