Colly scraper in headless mode?

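Hello,

I'm new to Golang, and I have to build a scraper for my school in France.

The site I want to scrape is www.allrecipes.com. On this site, I chose this page: https://www.allrecipes.com/recipes/17562/dinner/

From this site I need to collect some recipes, more precisely: title, URL, ingredients, steps, description.

I saw that www.allrecipes.com is built with vue.js, and when I try to get the URLs, I can't.

In my code I use colly. Can colly be combined with something like "chromedp"?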

package main

import (
	"encoding/json"
	"fmt"
	"os"

	"github.com/gocolly/colly"
)

type products struct {
	Name string `json:"name"`
	URL  string `json:"url"`
}

var allProducts []products

func main() {
	c := colly.NewCollector(
		colly.AllowedDomains("www.allrecipes.com"),
	)

	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Scraping:", r.URL)
	})

	c.OnResponse(func(r *colly.Response) {
		fmt.Println("Status:", r.StatusCode)
	})

	// OnHTML registers a callback. It runs for every HTML element that matches the selector.
	c.OnHTML("a.mntl-card", func(h *colly.HTMLElement) {
		products := products{
			URL:  h.ChildAttr("a.mntl-card-list-items", "href"),
			Name: h.ChildText(".card__title-text"),
		}
		fmt.Println(products)
		allProducts = append(allProducts, products)
	})

	c.OnError(func(r *colly.Response, err error) {
		fmt.Println("Request URL:", r.Request.URL, "failed with response:", r, "\nError:", err)
	})

	c.Visit("https://www.allrecipes.com/recipes/17562/dinner/")

	content, err := json.Marshal(allProducts)
	if err != nil {
		fmt.Println(err.Error())
	}
	os.WriteFile("data.json", content, 0644)
	fmt.Println("Total products:", len(allProducts))
}

It seems this is the only change needed to make it work:

 products := products{
-    URL:  h.ChildAttr("a.mntl-card-list-items", "href"),
+    URL:  h.Attr("href"),
     Name: h.ChildText(".card__title-text"),
 }

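Note that on this page a.mntl-card-list-items and a.mntl-card are the same element. ChildAttr only searches the descendants of the matched element, so it finds nothing when the selector refers to the element itself; Attr reads the attribute from the matched element directly.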

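Notes:

Colly does not involve a browser, so "headless" mode does not apply to it.
The page does not appear to rely on vue.js, and the HTML response already contains everything you need. In that case, Colly is a perfect fit.
chromedp drives a real browser, which makes it heavier than Colly. If Colly can do the job, you don't need it.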
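One way to check for yourself whether a page needs a real browser is to fetch it with a plain HTTP client, which runs no JavaScript, and look for the markup you care about in the raw response. The sketch below uses only the standard library; `markupInRawHTML` and `newStubServer` are hypothetical helper names, and the stub server stands in for the real site so the example can run offline.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// markupInRawHTML fetches url with a plain HTTP client (no JavaScript is
// executed) and reports whether marker occurs in the response body.
func markupInRawHTML(url, marker string) (bool, error) {
	resp, err := http.Get(url)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return false, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return false, err
	}
	return strings.Contains(string(body), marker), nil
}

// newStubServer serves a fixed HTML string; it replaces the real site here.
func newStubServer(html string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/html; charset=utf-8")
		fmt.Fprint(w, html)
	}))
}

func main() {
	srv := newStubServer(`<a class="mntl-card" href="/recipe/1/">Soup</a>`)
	defer srv.Close()

	found, err := markupInRawHTML(srv.URL, "mntl-card")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println("marker present in raw HTML:", found)
}
```

If the marker shows up in the raw response, the content is server-rendered and a collector like Colly can read it. If it is missing, the page is probably filled in by JavaScript, and a browser-driving tool such as chromedp becomes worth considering. A substring check is a rough heuristic, not a parser, so treat a positive result as a hint rather than proof.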
