Getting values from elements that share the same class name when scraping with colly



I am developing a small web scraping application in Go with the colly web scraping framework.

Here is the HTML of the website:

<div clas="cc">  
<div class="list">
<span class="countrybg" style="background-image: url(countryimage);"></span>
<span class="continet">Asia</span>
<span class="country">india</span>
</div>
<div class="list">
<span class="countrybg" style="background-image: url(countryimage);"></span>
<span class="continet">Africa</span>
<span class="country">Brazil</span>
</div>
</div>   

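Now I want to get all three span elements one by one and append their values to arrays.

I tried the code below, but it does not work: it returns AsiaAfrica as a single string. I want the values separately, and I also want to get the image URL from the countrybg class.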

c := make([]string, 10)
element.ForEach(".list span", func(_ int, elem *colly.HTMLElement) {
	result := element.ChildText("span:nth-child(2)")
	c = append(c, result)
})
The expected output should look like this:
countrybg = ['image1url' ,'image2url']
continet = ['Asia' ,'Africa']
country = ['india' ,'Brazil']

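Can anyone help me figure this out?

I ran a local server on port 8081 and tried to get the values you are looking for. There are many ways to do what you need; this is just one of them: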

package main

import (
	"fmt"
	"log"
	"regexp"

	"github.com/gocolly/colly"
)

func main() {
	c := colly.NewCollector()

	countrybgs := []string{}
	continents := []string{}
	countries := []string{}

	// Capture the text between the literal parentheses of url(...).
	// The parentheses are escaped so they are not read as a regex group.
	r := regexp.MustCompile(`background-image: url\((.*)\);`)

	/*
		Page served by the local server:

		<div clas="cc">
		<div class="list">
		<span class="countrybg" style="background-image: url(image1url);"></span>
		<span class="continet">Asia</span>
		<span class="country">india</span>
		</div>
		<div class="list">
		<span class="countrybg" style="background-image: url(image2url);"></span>
		<span class="continet">Africa</span>
		<span class="country">Brazil</span>
		</div>
		</div>
	*/

	// Visit every span on the page and sort it into a slice by class name.
	c.OnHTML("span", func(e *colly.HTMLElement) {
		switch class := e.Attr("class"); class {
		case "countrybg":
			// Skip spans whose style has no background-image URL
			// instead of panicking on a nil match.
			if m := r.FindStringSubmatch(e.Attr("style")); m != nil {
				countrybgs = append(countrybgs, m[1])
			}
		case "continet":
			continents = append(continents, e.Text)
		case "country":
			countries = append(countries, e.Text)
		}
	})

	if err := c.Visit("http://localhost:8081"); err != nil {
		log.Fatal(err)
	}

	fmt.Println(countrybgs)
	fmt.Println(continents)
	fmt.Println(countries)
}

Output:

> go run .
[image1url image2url]
[Asia Africa]
[india Brazil]

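The image URL lives inside the style attribute of the countrybg span, so it has to be cut out of a string such as background-image: url(image1url);. Below is a small standard-library sketch of one defensive way to do that. The helper name backgroundURL and the pattern are my own assumptions, not part of the answer's code: the pattern escapes the parentheses so they match literally, tolerates optional quotes around the URL, and reports a missing match instead of indexing into a nil slice.

```go
package main

import (
	"fmt"
	"regexp"
)

// bgURL matches url(...) in a CSS declaration. The parentheses are escaped
// so they match literally; optional single or double quotes are skipped.
var bgURL = regexp.MustCompile(`url\(['"]?([^'")]+)['"]?\)`)

// backgroundURL returns the URL inside the first url(...) found in style.
// ok is false when style contains no url(...) at all.
func backgroundURL(style string) (u string, ok bool) {
	m := bgURL.FindStringSubmatch(style)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	styles := []string{
		"background-image: url(image1url);",
		`background-image: url("image2url");`,
		"color: red;", // no url(...): reported as not found
	}
	for _, style := range styles {
		u, ok := backgroundURL(style)
		fmt.Printf("%q %v\n", u, ok)
	}
}
```

Inside a colly callback, the helper would be called with e.Attr("style") and the result appended only when ok is true.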