PhantomJS loads a different page than Chrome

I am trying to download the HTML from the following link:

http://matchhistory.na.leagueoflegends.com/en/#match-details/TRLH3/100043019?gameHash=a5e39c76a8e91ba9&tab=stats

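When I open it in Chrome, all of the match data is loaded into the HTML. I want to open these pages in PhantomJS, but it does not load the same content. I use the following code to take a screenshot of what PhantomJS loads. The screenshot shows only the main match history page: http://matchhistory.na.leagueoflegends.com/en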

var page = require('webpage').create();
var url = "http://matchhistory.na.leagueoflegends.com/en/#match-details/TRLH3/1000430019?gameHash=a5e39c76a8e91ba9&tab=stats";

console.log('The default user agent is ' + page.settings.userAgent);
page.settings.userAgent = 'SpecialAgent';

page.open(url, function (status) {
    if (status !== 'success') {
        console.log('Unable to access network');
    }
    // Wait a second for the page's scripts to run, then capture and exit.
    setTimeout(function () { page.render('mh.png'); }, 1000);
    setTimeout(function () { phantom.exit(); }, 1200);
});

I don't know why the two render different things. How can I get PhantomJS to render the same page as Chrome?

Thanks in advance.

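As @andrew-lohr pointed out, this happens because PhantomJS drops the fragment when it follows a redirect. An issue was filed (https://github.com/ariya/phantomjs/issues/12192) and a pull request was created to fix it (https://github.com/ariya/phantomjs/pull/14941), but because PhantomJS development has been suspended (https://github.com/ariya/phantomjs/issues/15344), the fix has not been released.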

An alternative is to use Puppeteer (https://github.com/GoogleChrome/puppeteer), which includes usage examples showing how to capture a screenshot.

In your case, this can be as simple as installing Puppeteer:

npm install puppeteer

Then update your code to:

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('http://matchhistory.na.leagueoflegends.com/en/#match-details/TRLH3/1000430019?gameHash=a5e39c76a8e91ba9&tab=stats');
    await page.screenshot({path: 'mh.png'});
    await browser.close();
})();

Then run the code with node instead of phantomjs:

node <filename>.js

The Puppeteer site has more information about what can be configured (viewport, etc.).

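Your http link is probably being redirected to https. My guess is that PhantomJS does not preserve the fragment identifier (#match-details) or anything after it when it is redirected, which is why you end up on the main page, http://matchhistory.na.leagueoflegends.com/en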
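To see how much of the link belongs to the fragment, here is a small sketch that splits the URL with the standard URL class. It is meant to be run with Node for illustration only (PhantomJS itself may not provide this class), and the variable names matchUrl and parts are just placeholders.

```javascript
// Illustration only: run with Node, not PhantomJS.
var matchUrl = new URL('http://matchhistory.na.leagueoflegends.com/en/#match-details/TRLH3/1000430019?gameHash=a5e39c76a8e91ba9&tab=stats');

// Everything after '#' is the fragment, including the '?gameHash=...' part,
// so the URL has no real query string. If the fragment is dropped during a
// redirect, only the path to the main page is left.
var parts = {
    pathname: matchUrl.pathname,
    search: matchUrl.search,
    hash: matchUrl.hash
};

console.log(parts);
```

Because the "?gameHash=..." part comes after the "#", it is treated as part of the fragment rather than as a query string, so losing the fragment loses the match ID and the tab as well.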

To solve your problem, use the https version of the link. That works because you will not be redirected.

var url="https://matchhistory.na.leagueoflegends.com/en/#match-details/TRLH3/1000430019?gameHash=a5e39c76a8e91ba9&tab=stats";
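If you have several such links, a tiny helper can do the same swap for you. This is only a sketch: toHttps is a hypothetical name, not a PhantomJS function, and it assumes the site serves the same page over https. It uses plain string replacement, so it should work in a PhantomJS script as well as in Node.

```javascript
// Hypothetical helper (not part of PhantomJS): switch an http:// URL to
// https:// so the request is not redirected and the fragment is kept.
function toHttps(rawUrl) {
    return rawUrl.replace(/^http:\/\//i, 'https://');
}

var url = toHttps('http://matchhistory.na.leagueoflegends.com/en/#match-details/TRLH3/1000430019?gameHash=a5e39c76a8e91ba9&tab=stats');
console.log(url);
```

URLs that already start with https:// are returned unchanged, so the helper is safe to apply to every link before passing it to page.open.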
