I am trying to scrape some details about a few listings on Zillow using Go and Colly. Here is the script I am using:
package main

import (
	"encoding/csv"
	"log"
	"os"
	"time"

	"github.com/gocolly/colly"
	"github.com/gocolly/colly/proxy"
)

func main() {
	// filename for data
	fName := "data.csv"
	// create a file
	file, err := os.Create(fName)
	// check for errors
	if err != nil {
		log.Fatalf("Could not create file, error : %q", err)
		return
	}
	// close file afterwards
	defer file.Close()
	// instantiate a csv writer
	writer := csv.NewWriter(file)
	// flush contents afterwards
	defer writer.Flush()
	// instantiate a collector
	c := colly.NewCollector(
		colly.AllowedDomains("https://www.zillow.com/austerlitz-ny/sold/"),
	)
	// point to the webpage structure you need to fetch
	c.OnHTML(".list-card-info", func(e *colly.HTMLElement) {
		// write the desired data into csv
		writer.Write([]string{
			e.ChildText("h1"),
			e.ChildText("a"),
		})
	})
	// show completion
	log.Printf("Scraping Finished\n")
	log.Println(c)
}
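The script appears to run without errors, but it does not collect any data. The terminal logs "Requests made: 0 (0 responses) | callback: OnRequest: 0, OnHTML: 1, OnResponse: 0, OnError: 0", and data.csv is empty.
Any idea why this happens and how to fix it?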
You should read through an example first; one is shown below. Colly only sends a request and fetches the data to parse when c.Visit is called.
package main

import (
	"fmt"

	"github.com/gocolly/colly"
)

func main() {
	c := colly.NewCollector()
	// Find and visit all links
	c.OnHTML("a", func(e *colly.HTMLElement) {
		e.Request.Visit(e.Attr("href"))
	})
	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting", r.URL)
	})
	// Visit sends the first request; OnHTML then parses the response and follows each href
	c.Visit("http://go-colly.org/")
}