Unable to use a Netnut.io proxy with the Apify Cheerio scraper



I have developed a web scraper and want to integrate Netnut's proxy into it.

The Netnut integration details are:

Proxy URL: gw.ntnt.io
Proxy Port: 5959
Proxy User: igorsavinkin-cc-any
Proxy Password: xxxxx

Example rotating IP format (IP:PORT:USERNAME-CC-COUNTRY:PASSWORD): gw.ntnt.io:5959:igorsavinkin-cc-any:xxxxx

To change the country, replace "any" with the country you want (US, UK, IT, DE, etc.). Available countries: https://l.netnut.io/countries

Our IPs rotate automatically. If you want them to be static residential, add a session ID to the username parameter, as in the example below:

username-cc-any-sid-any_number
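
The username format described above can be sketched as a small helper. This is only an illustration: `buildNetnutUsername` is a hypothetical name, not part of Netnut's or Apify's API, and it assumes the `-cc-` and `-sid-` parts are combined exactly as in the quoted examples.

```javascript
// Hypothetical helper: builds a Netnut-style proxy username from the
// pattern quoted above (username-cc-COUNTRY, optionally followed by -sid-NUMBER).
function buildNetnutUsername(user, country = 'any', sessionId) {
    let username = `${user}-cc-${country}`;
    // Append a session ID only when one is given (static residential IP).
    if (sessionId !== undefined) {
        username += `-sid-${sessionId}`;
    }
    return username;
}

const rotatingUser = buildNetnutUsername('igorsavinkin', 'de');
const stickyUser = buildNetnutUsername('igorsavinkin', 'any', 12345);
console.log(rotatingUser);
console.log(stickyUser);
```

Omitting the session ID keeps the rotating behaviour; passing the same number on each call should, per the description above, keep the same IP.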

Code:

    const Apify = require('apify');
    const { log } = Apify.utils;

    Apify.main(async () => {
        const proxyConfiguration = await Apify.createProxyConfiguration({
            proxyUrls: [
                'gw.ntnt.io:5959:igorsavinkin-DE:xxxxx',
            ],
        });
        // Add URLs to a RequestQueue (queue_name is defined elsewhere).
        const requestQueue = await Apify.openRequestQueue(queue_name);
        await requestQueue.addRequest({ url: 'https://ip.nf/me.txt' });

        // Create an instance of the CheerioCrawler class - a crawler
        // that automatically loads the URLs and parses their HTML using the cheerio library.
        const crawler = new Apify.CheerioCrawler({
            // Let the crawler fetch URLs from our queue.
            requestQueue,
            // Route requests through the custom proxy.
            proxyConfiguration,
            // Run between 10 and 50 requests in parallel.
            minConcurrency: 10,
            maxConcurrency: 50,
            // On error, retry each page at most twice.
            maxRequestRetries: 2,
            // Increase the timeout for processing of each page.
            handlePageTimeoutSecs: 50,
            // Limit to 1000 requests per crawl.
            maxRequestsPerCrawl: 1000,
            handlePageFunction: async ({ request, $/*, session*/ }) => {
                const text = $('body').text();
                log.info(text);
                // ...
            },
        });
        await crawler.run();
    });

Error: RequestError: getaddrinfo ENOTFOUND 5959 5959:80

It looks like the crawler is mixing up the proxy port 5959 with port 80...

    ERROR CheerioCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://ip.nf/me.txt","retryCount":3,"id":"F32s4Txz0fBUmwd"}
      RequestError: getaddrinfo ENOTFOUND 5959 5959:80
        at ClientRequest.request.once (C:\Users\User\Documents\RnD\Node.js\mercateo-scraper\node_modules\got\dist\source\core\index.js:953:111)
        at Object.onceWrapper (events.js:285:13)
        at ClientRequest.emit (events.js:202:15)
        at ClientRequest.origin.emit.args (C:\Users\User\Documents\RnD\Node.js\mercateo-scraper\node_modules\@szmarczak\http-timer\dist\source\index.js:39:20)
        at onerror (C:\Users\User\Documents\RnD\Node.js\mercateo-scraper\node_modules\agent-base\dist\src\index.js:115:21)
        at callbackError (C:\Users\User\Documents\RnD\Node.js\mercateo-scraper\node_modules\agent-base\dist\src\index.js:134:17)
        at processTicksAndRejections (internal/process/next_tick.js:81:5)

Is there a way to fix this?

Try using the following format:

http://username:password@host:port
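
As a minimal sketch of that conversion, the helper below turns a Netnut-style `host:port:username:password` string into the `http://username:password@host:port` form. `toProxyUrl` is a hypothetical name introduced for illustration, and the username `igorsavinkin-cc-de` is an assumption based on the `-cc-` pattern in Netnut's instructions.

```javascript
// Hypothetical helper: converts "host:port:username:password" into
// "http://username:password@host:port".
function toProxyUrl(netnutString) {
    const [host, port, username, ...rest] = netnutString.split(':');
    // Re-join the tail in case the password itself contains a colon.
    const password = rest.join(':');
    if (!host || !port || !username || !password) {
        throw new Error(`Expected host:port:username:password, got "${netnutString}"`);
    }
    // Percent-encode credentials so special characters cannot break the URL.
    const auth = `${encodeURIComponent(username)}:${encodeURIComponent(password)}`;
    return `http://${auth}@${host}:${port}`;
}

const proxyUrl = toProxyUrl('gw.ntnt.io:5959:igorsavinkin-cc-de:xxxxx');
console.log(proxyUrl);

// The result would then replace the original entry in the question's code:
// const proxyConfiguration = await Apify.createProxyConfiguration({
//     proxyUrls: [proxyUrl],
// });
```

With the scheme and the `@` separator in place, the host and port can be parsed unambiguously, which should avoid `5959` being treated as a hostname as in the ENOTFOUND error above.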
