我是新手,正在尝试利用Go中的并发构建基本刮板以从URL中摘取提取标题,元描述和元关键字。
我能够用并发将结果打印到终端,但无法弄清楚如何将输出写入CSV。我尝试了许多变体,我可以通过有限的GO知识来想到这些变体,许多人最终打破了并发性 - 因此失去了一点点。
我的代码和URL输入文件如下 - 预先感谢您的任何提示!
// file name: metascraper.go
package main
import (
// import standard libraries
"encoding/csv"
"fmt"
"io"
"log"
"os"
"time"
// import third party libraries
"github.com/PuerkitoBio/goquery"
)
func csvParsing() {
file, err := os.Open("data/sample.csv")
checkError("Cannot open file ", err)
if err != nil {
// err is printable
// elements passed are separated by space automatically
fmt.Println("Error:", err)
return
}
// automatically call Close() at the end of current method
defer file.Close()
//
reader := csv.NewReader(file)
// options are available at:
// http://golang.org/src/pkg/encoding/csv/reader.go?s=3213:3671#L94
reader.Comma = ';'
lineCount := 0
fileWrite, err := os.Create("data/result.csv")
checkError("Cannot create file", err)
defer fileWrite.Close()
writer := csv.NewWriter(fileWrite)
defer writer.Flush()
for {
// read just one record
record, err := reader.Read()
// end-of-file is fitted into err
if err == io.EOF {
break
} else if err != nil {
fmt.Println("Error:", err)
return
}
go func(url string) {
// fmt.Println(msg)
doc, err := goquery.NewDocument(url)
if err != nil {
checkError("No URL", err)
}
metaDescription := make(chan string, 1)
pageTitle := make(chan string, 1)
go func() {
// time.Sleep(time.Second * 2)
// use CSS selector found with the browser inspector
// for each, use index and item
pageTitle <- doc.Find("title").Contents().Text()
doc.Find("meta").Each(func(index int, item *goquery.Selection) {
if item.AttrOr("name", "") == "description" {
metaDescription <- item.AttrOr("content", "")
}
})
}()
select {
case res := <-metaDescription:
resTitle := <-pageTitle
fmt.Println(res)
fmt.Println(resTitle)
// Have been trying to output to CSV here but it's not working
// writer.Write([]string{url, resTitle, res})
// err := writer.WriteString(`res`)
// checkError("Cannot write to file", err)
case <-time.After(time.Second * 2):
fmt.Println("timeout 2")
}
}(record[0])
fmt.Println()
lineCount++
}
}
func main() {
csvParsing()
//Code is to make sure there is a pause before program finishes so we can see output
var input string
fmt.Scanln(&input)
}
func checkError(message string, err error) {
if err != nil {
log.Fatal(message, err)
}
}
带有URL的数据/sample.csv输入文件:
http://jonathanmh.com
http://keshavmalani.com
http://google.com
http://bing.com
http://facebook.com
在您提供的代码中,您已经评论了以下代码:
// Have been trying to output to CSV here but it's not working
err = writer.Write([]string{url, resTitle, res})
checkError("Cannot write to file", err)
此代码是正确的,除了您有一个问题。在功能的早期,您有以下代码:
fileWrite, err := os.Create("data/result.csv")
checkError("Cannot create file", err)
defer fileWrite.Close()
此代码使文件作者一旦您的csvParsing()
func退出。因为您已经关闭了延期文件作者,所以您无法在并发函数中写入它。
解决方案:您需要在并发的func 或类似的内容内使用defer fileWrite.Close()
,因此您在写信之前就不会关闭文件作者。