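How to download a website, including the results of JavaScript lookups

How can I download a copy of a website on Linux?

I tried wget --recursive --level=inf https://example.com, but it also downloaded links from other domains.

Is it also possible to download a copy of a site after its JavaScript has run and produced output on the page? For example, a weather site might use JavaScript to look up the current temperature in a database and then render the result. How can I capture the temperature, i.e. the final output?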

PhantomJS?

http://phantomjs.org/quick-start.html

I think this will do what you want!

The best thing to do is install it from here:

http://phantomjs.org/

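Basically, you write a JavaScript script and pass it to PhantomJS as a command-line argument (on Linux the binary is named phantomjs rather than phantomjs.exe), for example: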

phantomjs.exe someScript.js

There are plenty of examples. You can render a website as an image, for instance by running:

phantomjs.exe github.js

where github.js looks like this:

var page = require('webpage').create();
page.open('http://github.com/', function() {
  page.render('github.png');
  phantom.exit();
});

This demo is at http://phantomjs.org/screen-capture.html

You can also output the content of a web page as text.

For example, take a simple web page, demo_page.html:

<html>
    <head>
        <script>
        function setParagraphText() {
            document.getElementById("1").innerHTML = "42 is the answer.";
        }
        </script> 
    </head>
    <body onload="setParagraphText();">
        <p id="1">Static content</p>
    </body>
</html>

Then create a test script, test.js:

var page = require('webpage').create();
page.open("demo_page.html", function(status) {
    console.log("Status: " + status);
    if(status === "success") {
        console.log('Page text: ' + page.plainText);
        console.log('All done');
    }
    phantom.exit();
});

Then run it from the console:

> phantomjs.exe test.js
Status: success
Page text: 42 is the answer.
All done
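
To get back to the original question of keeping a copy of the page after its scripts have run, the same approach can write the rendered DOM to a file instead of printing the text. What follows is only a minimal sketch, not a definitive recipe: the file name save_rendered.js, the helper outputNameFor, and the one-second wait are assumptions made for illustration. It relies on page.content to read the HTML as it stands after the page's JavaScript has executed.

```javascript
// save_rendered.js (hypothetical file name)
// Usage: phantomjs save_rendered.js <url> [output.html]

// Illustrative helper (not part of PhantomJS): derive a file name from a URL's host.
function outputNameFor(url) {
    var host = url.replace(/^[a-z]+:\/\//i, '').split('/')[0];
    return (host.replace(/[^A-Za-z0-9.-]/g, '_') || 'page') + '.html';
}

// Only run the capture under PhantomJS, where the global `phantom` exists.
if (typeof phantom !== 'undefined') {
    var system = require('system');
    var fs = require('fs');
    var url = system.args[1];              // args[0] is the script name

    if (!url) {
        console.log('Usage: phantomjs save_rendered.js <url> [output.html]');
        phantom.exit(1);
    } else {
        var out = system.args[2] || outputNameFor(url);
        var page = require('webpage').create();

        page.open(url, function (status) {
            if (status !== 'success') {
                console.log('Failed to load ' + url);
                phantom.exit(1);
                return;
            }
            // Assumption: one second is enough for the page's scripts to finish.
            window.setTimeout(function () {
                fs.write(out, page.content, 'w');   // HTML after JavaScript has run
                console.log('Saved ' + out);
                phantom.exit();
            }, 1000);
        });
    }
}
```

Run it as phantomjs save_rendered.js https://example.com. Pages that load their data slowly may need a longer delay than the one assumed here.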

You can also inspect the page's DOM, or even update it:

var page = require('webpage').create();
page.open("demo_page.html", function(status) {
    console.log("Status: " + status);
    if(status === "success") {
        page.evaluate(function(){
            document.getElementById("1").innerHTML = "I updated the value myself";
        });
        console.log('Page text: ' + page.plainText);
        console.log('All done');
    }
    phantom.exit();
});
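
If only a single value is needed, such as the temperature in the question, page.evaluate can also return data from the page back to the script. A minimal sketch under stated assumptions: the file name read_value.js and the helper readElementText are hypothetical, and the element id "1" comes from demo_page.html above, so a real site would need its own element id or selector.

```javascript
// read_value.js (hypothetical file name)
// Prints the text of one element after the page's scripts have run.

// Executed inside the page by page.evaluate, so `document` is the page's DOM.
// It must not rely on variables from the outer script.
function readElementText(id) {
    var el = document.getElementById(id);
    return el ? el.textContent : null;
}

// Only run under PhantomJS, where the global `phantom` exists.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    page.open('demo_page.html', function (status) {
        if (status === 'success') {
            // Extra arguments to page.evaluate are passed on to the function.
            console.log('Value: ' + page.evaluate(readElementText, '1'));
        } else {
            console.log('Failed to load page');
        }
        phantom.exit();
    });
}
```

The value returned from page.evaluate has to be something simple, such as a string or number, since it is passed from the page back to the script.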
