I have this code:
```python
from urllib.request import urlopen
from bs4 import BeautifulSoup

url = 'http://www.brothersoft.com/synthfont-159403.html'
pageHtml = urlopen(url).read()
soup = BeautifulSoup(pageHtml, 'html.parser')
for a in soup.select('div.Updated.coLeft ul a[href]'):
    print(a.string)
```
But it only gives me this output:

```
Kenneth Rundt
```

What I need is all of the information inside the `div.Updated.coLeft` element. How can I get it?
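Your selector `ul a[href]` matches only the links inside the list, not the list items themselves, so you only see the link text. Select the `li` elements instead and join all of their text: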
```python
>>> for li in soup.select('div.Updated.coLeft li'):
...     print(' '.join(li.stripped_strings))
...
Last Updated: Dec 27, 2012
License: Freeware Free
OS: Windows 7/Vista/XP
Requirements: No special requirements
Publisher: Kenneth Rundt (4 more Applications)
```
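If you want the values keyed by label rather than printed as lines, the same loop can be wrapped in a small function. This is a sketch that has not been run against the live page: `SAMPLE_HTML` is a hypothetical reconstruction of the markup based on the output above, and `updated_info` is an illustrative helper name. Splitting on the first colon assumes every item is formatted as `Label: value`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the output above; the real page's HTML may differ.
SAMPLE_HTML = """
<div class="Updated coLeft">
  <ul>
    <li><span>Last Updated:</span> Dec 27, 2012</li>
    <li><span>License:</span> Freeware <em>Free</em></li>
    <li><span>OS:</span> Windows 7/Vista/XP</li>
    <li><span>Requirements:</span> No special requirements</li>
    <li><span>Publisher:</span> <a href="/publisher">Kenneth Rundt</a> (4 more Applications)</li>
  </ul>
</div>
"""


def updated_info(html):
    """Return the label/value pairs from the div.Updated.coLeft list as a dict."""
    soup = BeautifulSoup(html, "html.parser")
    info = {}
    for li in soup.select("div.Updated.coLeft li"):
        # stripped_strings skips whitespace-only text nodes and strips the rest.
        text = " ".join(li.stripped_strings)
        # Split on the first colon only, so values that contain ':' stay intact.
        label, sep, value = text.partition(":")
        if sep:
            info[label.strip()] = value.strip()
    return info


if __name__ == "__main__":
    soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
    # The question's selector only reaches the link text inside the list.
    print([a.string for a in soup.select("div.Updated.coLeft ul a[href]")])
    for label, value in updated_info(SAMPLE_HTML).items():
        print(f"{label}: {value}")
```

On this sample markup the first line printed is `['Kenneth Rundt']`, which mirrors the question's result, while the dict holds all five items. For the real page, pass the downloaded `pageHtml` to `updated_info` instead of `SAMPLE_HTML`.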