Python 2.7: How to get a large number of href links from a very large web page with BeautifulSoup

Tags: python, python-2.7, beautifulsoup, urllib
Updated: 2023-08-22

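I am parsing a large HTML page that contains more than 1000 href links. I use BeautifulSoup to get all of the links, but when I run the program a second time, BeautifulSoup cannot handle it (when finding all of the specific "td" tags). How can I overcome this problem? I can load the HTML page with urllib, but I cannot print all of the links. It works when I look up only a single "td" tag.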

Tag = self.__Page.find('table', {'class':'RSLTS'}).findAll('td')
print Tag
for a in Tag.find('a', href= True):
    print "found", a['href']

It works in this form:

Tag = self.__Page.find('table', {'class':'RSLTS'}).find('td')
print Tag
for a in Tag.find('a', href= True):
    print "found", a['href']

findAll returns a list of "td" tags rather than a single tag, so you need to iterate over them:

tds = self.__Page.find('table', class_='RSLTS').find_all('td')
for td in tds:
    a = td.find('a', href=True)
    if a:
        print "found", a['href']
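
To make the loop above easier to try in isolation, here is a minimal self-contained sketch. The markup in SAMPLE_HTML and the helper name extract_links are hypothetical stand-ins (the real page and the self.__Page object are not shown in the question), and the built-in "html.parser" backend is an assumption chosen so that no extra parser is required. It also guards against the table being absent, which the original snippets do not.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page.
SAMPLE_HTML = """
<table class="RSLTS">
  <tr><td><a href="/item/1">one</a></td><td>no link here</td></tr>
  <tr><td><a href="/item/2">two</a></td><td><a name="anchor-only">x</a></td></tr>
</table>
"""


def extract_links(markup):
    """Return the href of the first link in each td of the RSLTS table."""
    soup = BeautifulSoup(markup, "html.parser")
    table = soup.find("table", class_="RSLTS")
    if table is None:
        # No matching table: return an empty list instead of raising.
        return []
    links = []
    # find_all returns a list-like ResultSet, so each td is visited in turn.
    for td in table.find_all("td"):
        a = td.find("a", href=True)  # None when the cell has no link with an href
        if a is not None:
            links.append(a["href"])
    return links


links = extract_links(SAMPLE_HTML)
for href in links:
    print("found " + href)
```

Collecting the links into a list instead of printing them inside the loop keeps the function reusable, and the single-argument print call runs unchanged on both Python 2.7 and Python 3.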

If you have a large amount of data to process, I would use lxml instead:

root.xpath('//table[contains(@class, "RSLTS")]//td//a/@href')
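
The answer does not show how root is built, so the following is only a sketch under assumptions: it parses a hypothetical sample document with lxml.html.document_fromstring and uses an XPath that starts with // so that it matches the table, its cells, and their links at any nesting depth (for example with tr rows between table and td). The names SAMPLE_HTML and extract_links_xpath are illustrative and not part of the original answer.

```python
from lxml import html as lxml_html

# Hypothetical sample document standing in for the real page.
SAMPLE_HTML = """
<html><body>
<table class="RSLTS">
  <tr><td><a href="/item/1">one</a></td><td>no link here</td></tr>
  <tr><td><a href="/item/2">two</a></td><td><a name="anchor-only">x</a></td></tr>
</table>
</body></html>
"""


def extract_links_xpath(markup):
    """Return every href found inside td cells of tables whose class contains RSLTS."""
    root = lxml_html.document_fromstring(markup)
    # '//' searches at any depth; '/@href' selects only links that have an href.
    hrefs = root.xpath('//table[contains(@class, "RSLTS")]//td//a/@href')
    return [str(h) for h in hrefs]


xpath_links = extract_links_xpath(SAMPLE_HTML)
for href in xpath_links:
    print("found " + href)
```

Unlike the BeautifulSoup loop, which takes the first link per cell, this expression returns every link inside each cell, so the two approaches can differ on cells that contain several links.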
