Web scraping with R - drop-down menus



I am trying to scrape data from this address:

http://extranet.artesp.sp.gov.br/TransporteColetivo/OrigemDestino?fbclid=IwAR3_hZwajHk_iyU085S1LDTqLCOYLHIZ5K825XgPGcB4tMI0EuCJpQNrJHM

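There are two drop-down menus ("Origem" and "Destino"). I need to build a database with every possible combination of "Origem" and "Destino".

Below is part of my R code. I cannot select an option in the drop-down menus, so I am unable to write a loop that extracts the data I need.

Any suggestions?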

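library(RSelenium)
library(rJava)
# rs_driver_object was not defined in the original snippet; creating it with
# rsDriver() and Firefox is an assumption about the omitted setup.
rs_driver_object <- rsDriver(browser = "firefox")
# rsDriver() returns a client whose browser session is already open.
remDr <- rs_driver_object$client
remDr$navigate("http://extranet.artesp.sp.gov.br/TransporteColetivo/OrigemDestino?fbclid=IwAR3_hZwajHk_iyU085S1LDTqLCOYLHIZ5K825XgPGcB4tMI0EuCJpQNrJHM#")
# Locate the two drop-down menus and the search button by their element ids.
Origem <- remDr$findElement(using = 'id', 'Origem')
Destino <- remDr$findElement(using = 'id', 'Destino')
botão_pesquisar <- remDr$findElement(using = 'id', 'btnPesquisar')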

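Get the values (i.e. the location ids) from each combo box, which gives you two arrays (from and to); make sure you keep the labels as well. The page calls an endpoint that POSTs those ids as parameters. The call looks like this: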
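
As a minimal sketch of that first step, the option values and labels can be read straight from the page's HTML with rvest. The select ids `Origem` and `Destino` come from the question's code; the HTML fragment and the municipality labels below are invented placeholders standing in for the real page source, and `get_options` is a hypothetical helper name.

```r
library(rvest)

# Placeholder for the real page source (assumption: the real page would be
# loaded with read_html() on the page URL). Labels are invented.
page <- read_html('
<select id="Origem">
  <option value="">Selecione</option>
  <option value="387">Municipio A</option>
  <option value="388">Municipio B</option>
</select>
<select id="Destino">
  <option value="">Selecione</option>
  <option value="387">Municipio A</option>
  <option value="388">Municipio B</option>
</select>')

# Hypothetical helper: one row (id, label) per non-empty <option> of a <select>.
get_options <- function(doc, select_id) {
  opts <- html_elements(doc, paste0("select#", select_id, " option"))
  out <- data.frame(
    id    = html_attr(opts, "value"),
    label = html_text(opts, trim = TRUE),
    stringsAsFactors = FALSE
  )
  # Drop the "please select" entry, which has an empty value.
  out[!is.na(out$id) & nzchar(out$id), ]
}

origens  <- get_options(page, "Origem")
destinos <- get_options(page, "Destino")

# Every origin/destination combination, as ids.
pairs <- expand.grid(
  origem  = origens$id,
  destino = destinos$id,
  stringsAsFactors = FALSE
)
```

Keeping `origens` and `destinos` as data frames preserves the labels, so they can be joined back onto the results by id later.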

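library(RCurl)
# Headers copied from the browser's request. The Cookie below and the
# __RequestVerificationToken in params belong to one browser session.
headers = c(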
"Accept" = "application/json, text/javascript, */*; q=0.01",
"Accept-Language" = "en-US,en;q=0.9",
"Connection" = "keep-alive",
"Content-Type" = "application/x-www-form-urlencoded; charset=UTF-8",
"Cookie" = "__RequestVerificationToken_L1RyYW5zcG9ydGVDb2xldGl2bw2=tY-yKlWmbZvAJzMHmITkohPiIos5XkjDBwf1ZBfP_bYWdXJMBF2Qw3z_B-LRVo0kXjdnHqDqsbZ04Zij_PM-wAf4DWVKfnQskOhqo4ANSRc1",
"Origin" = "http://extranet.artesp.sp.gov.br",
"Referer" = "http://extranet.artesp.sp.gov.br/TransporteColetivo/OrigemDestino?fbclid=IwAR3_hZwajHk_iyU085S1LDTqLCOYLHIZ5K825XgPGcB4tMI0EuCJpQNrJHM",
"User-Agent" = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36",
"X-Requested-With" = "XMLHttpRequest"
)
params = "origem=387&destino=388&__RequestVerificationToken=Z-wXmGOb9pnQbmkfcQXmChT-6uc3YfGjftHwK4HnC9SDCaKmzIafo7AI3lChBY6YDBHdpT_X98mSHGAr_YrTNgKiepKxKraGu7p6PI7dV4g1"
res <- postForm("http://extranet.artesp.sp.gov.br/TransporteColetivo/OrigemDestino/GetGrid", .opts=list(postfields = params, httpheader = headers, followlocation = TRUE), style = "httppost")
cat(res)
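
To run this call for many pairs, the request body can be built from the ids instead of being hard-coded. The sketch below is one possible way to do that, not a tested client for this site: `build_params` and `fetch_pair` are hypothetical helper names, and it assumes a valid token and matching cookie have already been obtained from a browser session, as in the call above.

```r
# Endpoint taken from the call above.
grid_url <- "http://extranet.artesp.sp.gov.br/TransporteColetivo/OrigemDestino/GetGrid"

# Hypothetical helper: form-encoded body for one origin/destination pair.
# `token` is the __RequestVerificationToken value for the current session.
build_params <- function(origem, destino, token) {
  paste0(
    "origem=", utils::URLencode(as.character(origem), reserved = TRUE),
    "&destino=", utils::URLencode(as.character(destino), reserved = TRUE),
    "&__RequestVerificationToken=", utils::URLencode(token, reserved = TRUE)
  )
}

# Hypothetical wrapper: the same postForm() call as above, parameterized.
# `headers` is the named character vector defined above.
fetch_pair <- function(origem, destino, token, headers) {
  RCurl::postForm(
    grid_url,
    .opts = list(
      postfields = build_params(origem, destino, token),
      httpheader = headers,
      followlocation = TRUE
    ),
    style = "httppost"
  )
}

# Body for the pair used in the example call ("TOKEN" is a placeholder).
example_body <- build_params(387, 388, "TOKEN")
```

Looping `fetch_pair` over a table of id pairs, with a short pause between requests, would then cover every combination.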

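See the origem= and destino= parameters? Those are the values from the static combo box fields, so all of this is easy to do with plain web requests. The response to each call looks like this: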

[
{
"Codigo": 0,
"Empresa": {
"Codigo": 447,
"Descricao": "VIAÇÃO VALE DO TIETE LTDA",
"FlagCNPJ": false,
"CNPJ": null,
"CPF": null,
"Fretamento": null,
"Escolar": null,
"Municipio": null,
"UF": null,
"Endereco": null,
"Bairro": null,
"CEP": null,
"Telefone": null,
"Email": null
},
"CodigoMunicipioOrigem": 387,
"CodigoMunicipioDestino": 388
}
]

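So when a trip is found you get an array. I'm not sure exactly what it represents, but I assume it is tickets. When there is no service scheduled between the origin and the destination, the array comes back with 0 elements (an empty array).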
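
A small sketch of how the two cases could be told apart in R, assuming the jsonlite package and a response body shaped like the example above. `parse_grid` is a hypothetical helper name, and the sample JSON is a trimmed copy of the response shown earlier.

```r
library(jsonlite)

# Hypothetical helper: data frame for a populated response, NULL for "[]".
parse_grid <- function(json_text) {
  # flatten = TRUE turns the nested "Empresa" object into columns
  # such as Empresa.Codigo and Empresa.Descricao.
  parsed <- fromJSON(json_text, flatten = TRUE)
  if (length(parsed) == 0) {
    return(NULL)  # empty array: nothing scheduled for this pair
  }
  parsed
}

# Trimmed copy of the example response above.
sample_hit <- '[
  {
    "Codigo": 0,
    "Empresa": { "Codigo": 447, "FlagCNPJ": false },
    "CodigoMunicipioOrigem": 387,
    "CodigoMunicipioDestino": 388
  }
]'

hit  <- parse_grid(sample_hit)
miss <- parse_grid("[]")
```

Results for pairs that return a data frame can be stacked with `rbind`, and `NULL` results simply skipped.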
