If a link fails, retry it or skip to the next link



I've run into a problem and need help.

I have a list of links (about 9000) that I loop over, running some processing on each one.

The links look like this:

link1, link2, link3, link4, ….. link9000

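The trouble is that sometimes link2 fails (timeout), and other times link2 works but link400 or some other random link times out instead. Is there a way to retry a link when it fails? I added:

status_c <- httr::GET(Links, config = httr::config(connecttimeout = 150))

but I still get timeouts. Please help! Or do you have any suggestions? final_links_bind holds the full list of links. Some example links: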

https://vdp.cuzk.cz/vdp/ruian/stavebniobjekty/2146711
https://vdp.cuzk.cz/vdp/ruian/stavebniobjekty/2146703
https://vdp.cuzk.cz/vdp/ruian/stavebniobjekty/2146789
library(httr)   # GET(), timeout(), status_code()
library(rvest)  # read_html(), html_nodes(), html_text(), %>%

for (i in 1:nrow(final_links_bind)) {
  Links <- final_links_bind[i, ]
  BP_ID <- final_bp_bind[i, ]
  # print(Links)
  status_c <- GET(Links, timeout(120))
  status <- status_code(status_c)
  if (status == "200") {
    # Parse the page and collect the text of every table row
    url_parse <- read_html(Links)
    col_name <- url_parse %>%
      html_nodes("tr") %>%
      html_text()
    col_name <- stringr::str_remove_all(col_name, "\t|\n|\r")
    # Keep only the row(s) mentioning "využití"
    pattern_col_no <- grep("využití", col_name)
    col_name <- as.data.frame(col_name)
    method_selected <- col_name[pattern_col_no, ]
    WRITE_CSV_DATA <- rbind(WRITE_CSV_DATA, data.frame(BP_ID = c(BP_ID), method_selected = c(method_selected), Links = c(Links)))
    # METHOD_OF_USE <- rbind(method_selected, METHOD_OF_USE)
    print(WRITE_CSV_DATA)
  } else {
    # Record links that did not return status 200
    print("LINK NOT WORKING")
    no_Links <- sorted_link[i, ]
    not_working_link <- rbind(not_working_link, no_Links)
  }
}

It's not clear what final output you want, but here is how to scrape the links and skip the ones that don't work.

library(rvest)
library(httr2)
library(tidyverse)

Given a data frame of links (note that the third link does not work):

df <- tibble(
  links = c(
    "https://vdp.cuzk.cz/vdp/ruian/stavebniobjekty/2146711",
    "https://vdp.cuzk.cz/vdp/ruian/stavebniobjekty/2146703",
    "https://vdp.cuzk.cz/vdp/ruian/stavebniobjekty/9999999",
    "https://vdp.cuzk.cz/vdp/ruian/stavebniobjekty/2146789"
  )
)
# A tibble: 4 × 1
links                                                
<chr>                                                
1 https://vdp.cuzk.cz/vdp/ruian/stavebniobjekty/2146711
2 https://vdp.cuzk.cz/vdp/ruian/stavebniobjekty/2146703
3 https://vdp.cuzk.cz/vdp/ruian/stavebniobjekty/9999999
4 https://vdp.cuzk.cz/vdp/ruian/stavebniobjekty/2146789

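Create a function that scrapes the tables on a page and extracts the value in the third row (second column of the second table):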

get_info <- function(link) {
  cat("Scraping", link, "\n")
  link %>%
    read_html() %>%
    html_table() %>%  # list of all tables on the page
    pluck(2) %>%      # second table
    slice(3) %>%      # third row
    pull(2)           # second column
}

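mutate() adds a new column holding the scraped info, or NA if the link does not work. If a link fails, possibly() returns NA (NA_character_) instead of throwing an error and stopping the code.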

df %>%
  mutate(
    info = map_chr(links, possibly(get_info, otherwise = NA_character_))
  )
# A tibble: 4 × 2
links                                                 info       
<chr>                                                 <chr>      
1 https://vdp.cuzk.cz/vdp/ruian/stavebniobjekty/2146711 rodinný dům
2 https://vdp.cuzk.cz/vdp/ruian/stavebniobjekty/2146703 rodinný dům
3 https://vdp.cuzk.cz/vdp/ruian/stavebniobjekty/9999999 NA         
4 https://vdp.cuzk.cz/vdp/ruian/stavebniobjekty/2146789 rodinný dům
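
The approach above skips failed links but does not retry them, which the question also asks about. One way to add retries is to wrap the scraping function so that it is called again a few times before giving up. The following is only a minimal sketch in base R: `with_retry()`, its arguments, and the `flaky()` stand-in are hypothetical names made up for this illustration, not part of purrr, httr, or httr2. The demo uses a fake function that fails twice so that it runs without network access.

```r
# Hypothetical helper (an assumption for illustration, not a library function):
# returns a wrapped version of `f` that is tried up to `max_tries` times,
# sleeping `pause` seconds between attempts, and yields `otherwise`
# if every attempt raises an error.
with_retry <- function(f, max_tries = 3, pause = 1, otherwise = NA_character_) {
  function(x) {
    for (attempt in seq_len(max_tries)) {
      # Wrap the value in a list so success and failure are easy to tell apart
      result <- tryCatch(list(value = f(x)), error = function(e) e)
      if (!inherits(result, "error")) {
        return(result$value)
      }
      message("Attempt ", attempt, " of ", max_tries, " failed for ", x,
              ": ", conditionMessage(result))
      if (attempt < max_tries) Sys.sleep(pause)
    }
    otherwise
  }
}

# Offline stand-in for a link that times out twice and then works
calls <- 0
flaky <- function(x) {
  calls <<- calls + 1
  if (calls < 3) stop("timeout")
  paste("ok:", x)
}

out <- with_retry(flaky, max_tries = 3, pause = 0)("link2")

# A link that never works falls back to NA, like possibly() does
always_fails <- function(x) stop("not found")
out_na <- with_retry(always_fails, max_tries = 2, pause = 0)("link3")
```

In the pipeline above, this wrapper could take the place of possibly(), for example `info = map_chr(links, with_retry(get_info))`, so that a link is retried a few times and only becomes NA once all attempts fail. If only the HTTP request itself needs retrying, httr's `RETRY("GET", url, times = 3)` is a built-in alternative worth checking.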
