Python: loop through URLs, find text, write to a new list

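I am trying to go through a list of URLs, find some content in the HTML text, and write it to a new list. The problem is that although I have a for loop, it only outputs the last URL (there are 500 in the list "urls"). I don't know how to make it write on each iteration and then move on to the next one, rather than iterating through everything and writing only the last item in the list. Any ideas on how to make this work?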

for url in urls:
    try:
        page = urlopen(url)
    except:
        print("Error opening the URL")
    soup = BeautifulSoup(page, 'html.parser')
    content = soup.find('div', {"class": "sp-m-box-section"})
    article = []

for url in urls:
    article = article.append(content)   #here I am completely unsure how to handle it
    print(article)

Thanks for any ideas.

Does this solve your problem?

article = []
for url in urls:
    try:
        page = urlopen(url)
    except:
        print("Error opening the URL")
    soup = BeautifulSoup(page, 'html.parser')
    content = soup.find('div', {"class": "sp-m-box-section"})
    article.append(content)
print(article)

There are a few problems here.

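You overwrite the article list on every iteration by declaring article = [], so it is reset to an empty list each time, even after you append to it. Only after the last iteration is it not reset, which leaves just the last item that was appended.
There is no need to iterate over urls twice.
I changed the try/except so that it handles a failure differently.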
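
The first point can be reproduced without any network access. The sketch below uses a made-up list of strings (`items`, standing in for the parsed content) to show why re-creating the list inside the loop keeps only the last element. It also shows a related detail in the question's `article = article.append(content)` line: `list.append` changes the list in place and returns `None`, so its result should not be assigned back. All variable names here are illustrative only.

```python
# Hypothetical stand-in data; in the question these would be parsed <div> elements.
items = ["first", "second", "third"]

# Buggy pattern: the list is re-created on every pass, so earlier items are lost.
for item in items:
    reset_each_time = []
    reset_each_time.append(item)

# Working pattern: create the list once, before the loop.
collected = []
for item in items:
    collected.append(item)

# list.append works in place and returns None, so never assign its result.
result_of_append = [].append("x")

print(reset_each_time)   # ['third']
print(collected)         # ['first', 'second', 'third']
print(result_of_append)  # None
```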

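Basically, try to read the page. If that fails, the error is reported and the loop continues with the next URL (if the HTML can't be read, there is no point in processing it, and you would get an error at that step too).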

Try this:

from urllib.request import urlopen
from bs4 import BeautifulSoup

article = []  # created once, before the loop
for url in urls:
    try:
        page = urlopen(url)
    except Exception:
        print("Error opening the URL")
        continue  # skip this url, there is nothing to parse
    soup = BeautifulSoup(page, 'html.parser')
    content = soup.find('div', {"class": "sp-m-box-section"})
    article.append(content.text) # <- here I'm assuming you want the actual text/content, not the html
print(article)
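
The skip-on-failure step can also be isolated into a small helper, which makes it easier to see which URLs were dropped. This is only a sketch under a few assumptions: `fetch_pages` is a hypothetical name (not part of urllib or BeautifulSoup), the `timeout` default is arbitrary, and catching `OSError` and `ValueError` is a guess at which failures are worth tolerating, in place of the catch-all handler above.

```python
from urllib.request import urlopen


def fetch_pages(urls, timeout=10):
    """Return (pages, failed): raw bytes of each page that opened, and the URLs that did not."""
    pages = []
    failed = []
    for url in urls:
        try:
            with urlopen(url, timeout=timeout) as response:
                body = response.read()
        except (OSError, ValueError) as exc:
            # URLError (and its subclass HTTPError) derives from OSError;
            # ValueError is raised for malformed URLs such as a missing scheme.
            print(f"Error opening {url}: {exc}")
            failed.append(url)
            continue  # nothing to parse, move on to the next URL
        pages.append(body)
    return pages, failed
```

Each entry in `pages` could then be passed to `BeautifulSoup(body, 'html.parser')` as in the answer, and `failed` shows which of the URLs were skipped.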
