I have a .csv file of URLs that I need to validate.
I want to apply httr's GET to every row of a data frame.
> websites
website
1 www.msn.com
2 www.wazl.com
3 www.amazon.com
4 www.rifapro.com
I did find similar questions and tried to apply the answers given there, but they did not work.
> apply(websites, 1, transform, result=GET(websites$website))
Error: length(url) == 1 is not TRUE
> apply(websites, websites[,1], GET())
Error in handle_url(handle, url, ...) :
Must specify at least one of url or handle
I'm not sure what I'm doing wrong.
You could do something like this:
websites <- read.table(header=T, text="website
1 www.msn.com
2 www.wazl.com
3 www.amazon.com
4 www.rifapro.com")
library(httr)
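# Prepend "http://" where no scheme is present
urls <- paste0(ifelse(grepl("^https?://", websites$website, ignore.case = TRUE), "", "http://"),
websites$website)
# De-duplicate first so that the names assigned below line up with the responses
urls <- unique(tolower(urls))
# try() keeps one unreachable host from aborting the whole loop
lst <- lapply(urls, function(url) try(HEAD(url), silent = TRUE))
names(lst) <- urls
sapply(lst, function(x) if (inherits(x, "try-error")) -999 else status_code(x))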
# http://www.msn.com http://www.wazl.com http://www.amazon.com http://www.rifapro.com
# 200 -999 405 -999
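IMHO, there is no need for a GET request: HEAD returns only the headers, which is enough to read the status code. Here -999 marks the URLs for which the request itself failed.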
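As for the two errors in the question: GET accepts a single URL per call, so passing the whole column (GET(websites$website)) fails the length(url) == 1 check, and GET() with no arguments has no URL at all. A minimal sketch of a per-URL wrapper follows. The names check_url and check_urls and the fetch parameter are hypothetical, introduced only for illustration, and failures are reported as NA here rather than -999.

```r
# Hypothetical helper: HTTP status for ONE url, or NA if the request fails.
# `fetch` defaults to httr::HEAD; it is a parameter only so that the
# function can be tried out with a stand-in that needs no network.
check_url <- function(url, fetch = httr::HEAD, ...) {
  stopifnot(is.character(url), length(url) == 1)
  tryCatch(
    as.integer(httr::status_code(fetch(url, ...))),
    error = function(e) NA_integer_
  )
}

# Apply the helper to each URL in turn; the result is an integer
# vector named by URL.
check_urls <- function(urls, ...) {
  vapply(urls, check_url, integer(1), ...)
}
```

For the data frame in the question, this could be called as check_urls(urls, httr::timeout(10)), where urls is the character vector built above; the extra argument is passed through to HEAD.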
@LukeA gave me the answer; I only changed it as follows to produce a data frame instead of a list. Thanks, Luke.
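urls <- paste0(ifelse(grepl("^https?://", websitm$WEBSITE, ignore.case = TRUE), "", "http://"),
websitm$WEBSITE)
# De-duplicate first so that URLs and responses have the same length
urls <- unique(tolower(urls))
lst <- lapply(urls, function(url) try(HEAD(url), silent = TRUE))
# One row per URL; the raw responses are kept in a list column
b <- data.frame(url = urls, stringsAsFactors = FALSE)
b$response <- lst
b$outcome <- sapply(b$response, function(x) if (inherits(x, "try-error")) -999 else status_code(x))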
After refining the code above:
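library(httr)
website <- read.csv(file = "path")
# Drop rows whose URL has already been seen
website <- website[!duplicated(website$Website), ]
websitm <- website
# Website is assumed to be column 2, as in the original indexing.
# Prepend "http://www." to entries that do not start with "www." (with or without a scheme)
websitm$Website <- paste0(ifelse(grepl("^(https?://)?www\\.", websitm$Website, ignore.case = TRUE), "", "http://www."), websitm$Website)
# Prepend "http://" to entries that still have no scheme
websitm$Website <- paste0(ifelse(grepl("^https?://", websitm$Website, ignore.case = TRUE), "", "http://"), websitm$Website)
# silent = TRUE belongs to try(), not to HEAD(); give up on a host after 20 seconds
Httpcode <- function(x) try(HEAD(x, timeout(seconds = 20)), silent = TRUE)
websitm$error <- lapply(websitm$Website, Httpcode)
websitm$outcome <- sapply(websitm$error, function(x) if (inherits(x, "try-error")) -999 else status_code(x))
websitm <- data.frame(lapply(websitm, as.character), stringsAsFactors = FALSE)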
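One thing to watch in the two prefixing steps above: an entry such as "http://rifapro.com" (scheme but no "www.") would come out as "http://www.http://rifapro.com". A possible way around that, sketched in base R, is to split off the scheme first and put it back at the end. The function name normalize_url and its add_www argument are hypothetical, not part of httr.

```r
# Hypothetical helper: make sure every URL has a scheme and, optionally,
# a leading "www.", without doubling up anything that is already there.
normalize_url <- function(x, add_www = FALSE) {
  x <- trimws(x)
  has_scheme <- grepl("^https?://", x, ignore.case = TRUE)
  # Keep an existing scheme (lower-cased); default to http:// otherwise
  scheme <- ifelse(has_scheme,
                   tolower(sub("^(https?://).*$", "\\1", x, ignore.case = TRUE)),
                   "http://")
  # Everything after the scheme
  rest <- sub("^https?://", "", x, ignore.case = TRUE)
  if (add_www) {
    needs_www <- !grepl("^www\\.", rest, ignore.case = TRUE)
    rest <- ifelse(needs_www, paste0("www.", rest), rest)
  }
  paste0(scheme, rest)
}
```

With this, the two paste0 lines could be replaced by websitm$Website <- normalize_url(websitm$Website, add_www = TRUE). Whether forcing "www." is right for every site is a separate question; some hosts only answer without it.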