How do I get <li> tag information (Beautiful Soup web scraping)?



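I am scraping information from this page:
https://lawyers.justia.com/lawyer/michael-paul-ehline-85006. I'm trying to get everything listed under the Fees section. Specifically, I want the following: Free Consultation: Yes; Credit Cards Accepted: Visa, Mastercard, American Express; Contingent Fees: In personal injury cases only; Rates, Retainers and Additional Information: Rates vary on a case by case basis.

This is what I tried: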

for thing in soup.findAll('ul', attrs={"class": "has-no-list-styles"}):
    ul = thing.find('li')
    print(ul)

But the output is:

<li>Intellectual Property</li>
<li>Copyright Law</li>
<li><strong>English</strong></li>

Thanks in advance.

Update: I found a solution, but it gives me an infinite loop. Any suggestions?

for o in soup.findAll('div', attrs={"class": "block-wrapper"}):
    for tag in soup.findAll('div', attrs={"class": "block-wrapper"}):
        if tag.string:
            tag.string.replace_with("")
    for de in o.findAll("li"):
        if de != []:
            de = remove_tags(str(de))
            print(de)

Try this.

from simplified_scrapy import SimplifiedDoc,req
html = req.get('https://lawyers.justia.com/lawyer/michael-paul-ehline-85006')
doc = SimplifiedDoc(html)
ul = doc.getElement('ul',attr='class',value='has-no-list-styles',start='class="jicon -large jicon-fee"') # Use class="jicon -large jicon-fee" to locate
print (ul.text)

Result:

Free ConsultationYesCredit Cards AcceptedVisa, Mastercard, American ExpressContingent FeesIn personal injury cases only.Rates, Retainers and Additional InformationRates vary on a case by case basis.

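Try this with Beautiful Soup. It was inspired by dabingsou's answer. All it does is look for the icon he described, move to that icon's parent's next sibling, and grab the text of that sibling.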

import requests
from bs4 import BeautifulSoup

URL = "https://lawyers.justia.com/lawyer/michael-paul-ehline-85006"
r = requests.get(URL)
soup = BeautifulSoup(r.content, 'html.parser')

# Locate the fee icon, then read the text of its parent's next sibling.
icon = soup.find('span', attrs={"class": "jicon -large jicon-fee"})
print(icon.parent.next_sibling.text)

Adapt your scraper along these lines and see if that helps!
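If you want one entry per <li> rather than a single run-together string, here is a minimal offline sketch of the same idea. The markup in SAMPLE_HTML is hypothetical, only loosely modelled on the class names mentioned in this thread, so the real page may be structured differently. The helper name fee_items and the "jicon-language" class are also made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption, not the real page source).
SAMPLE_HTML = """
<div class="block-wrapper">
  <div><span class="jicon -large jicon-fee"></span><strong>Fees</strong></div>
  <ul class="has-no-list-styles">
    <li><strong>Free Consultation</strong> Yes</li>
    <li><strong>Credit Cards Accepted</strong> Visa, Mastercard, American Express</li>
  </ul>
</div>
<div class="block-wrapper">
  <div><span class="jicon -large jicon-language"></span><strong>Languages</strong></div>
  <ul class="has-no-list-styles">
    <li><strong>English</strong></li>
  </ul>
</div>
"""


def fee_items(html):
    """Return the text of each <li> in the list that follows the fee icon."""
    soup = BeautifulSoup(html, "html.parser")
    # Match on the single class "jicon-fee" instead of the full class string.
    icon = soup.select_one("span.jicon-fee")
    if icon is None:
        return []
    # find_next walks forward in document order, so whitespace nodes
    # between the icon's parent and the <ul> do not matter.
    ul = icon.find_next("ul")
    if ul is None:
        return []
    # find_all returns every <li>; find would return only the first one.
    return [li.get_text(" ", strip=True) for li in ul.find_all("li")]


items = fee_items(SAMPLE_HTML)
for item in items:
    print(item)
```

On the sample markup this prints only the two fee lines and skips the "English" item, because the search starts from the fee icon instead of looping over every ul with the has-no-list-styles class. To try it on the live page, pass r.content from the requests call above instead of SAMPLE_HTML.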
