I'm trying to build a web scraper, but I'm not sure which syntax to use here.
serviceList = []
content = soup.find_all('td', class_='serviceListing')
for property in content:
    name = property.find('a').text
    details = property.find('span').text
    pageList = {
        'Name': name,
        'Details': details
    }
    serviceList.append(pageList)
print(serviceList)
Source Code:
<tr>
<td id = 'xx' class='serviceListing' style='yy: zz;' bgcolor="FFFFFF">
<a id = "content" href="displayService,aspx?id=147674">Location Name</a>
<a href="servicesprovidedinFrench.aspx" class... </a>
<br>
<span id = "content" class="regtext">801 location</span>
</td>
<td valign="top" class="serviceListingCity" ...>
<span id="contentplaceholderPhone" class="regtext">905 905 9055 </span>
</td>
<td> ... </td>
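I know my code only grabs the first match inside each cell. What is the correct syntax to return each of the inner matches as well?
At the moment I can only get the "<a>Location Name</a>" and the "<span>801 location</span>".
How do I get the second one, the td with class serviceListingCity and its "<span>905 905 9055</span>"?
Thanks!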
I hope I've understood your question: you want to find the information that sits next to <td class="serviceListing">? If so, you can use the .find_next() method:
from bs4 import BeautifulSoup
html = """
<tr>
<td id="xx" class="serviceListing">
<a id="content" href="displayService,aspx?id=147674">Location Name</a>
<a href="servicesprovidedinFrench.aspx">...</a>
<br>
<span id="content" class="regtext">801 location</span>
</td>
<td valign="top" class="serviceListingCity">
<span id="contentplaceholderPhone" class="regtext">905 905 9055</span>
</td>
<td> ... </td>
</tr>
"""
soup = BeautifulSoup(html, 'html.parser')
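for prop in soup.select('td.serviceListing'):
    # first <a> and first <span> inside the serviceListing cell
    name = prop.find('a').text
    details = prop.find('span').text
    # find_next() searches forward in the document, past the end of this cell
    phone = prop.find_next('span', id='contentplaceholderPhone').text
    print(name)
    print(details)
    print(phone)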
Prints:
Location Name
801 location
905 905 9055
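One caveat: .find_next() keeps searching forward through the rest of the document, so if a row had no phone cell it would return the phone from a later row. If you also want the results in a list of dicts as in your original code, here is a minimal sketch that limits the lookup to the same <tr>. The two-row sample HTML and the names collect_services and service_list are my own assumptions for illustration, not taken from your page:

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the second row deliberately has no phone cell.
html = """
<table>
<tr>
  <td class="serviceListing">
    <a href="a.aspx">Location Name</a>
    <br>
    <span class="regtext">801 location</span>
  </td>
  <td class="serviceListingCity">
    <span id="contentplaceholderPhone" class="regtext">905 905 9055</span>
  </td>
</tr>
<tr>
  <td class="serviceListing">
    <a href="b.aspx">Second Name</a>
    <br>
    <span class="regtext">12 other street</span>
  </td>
</tr>
</table>
"""


def collect_services(markup):
    """Return one dict per td.serviceListing cell, with the phone from the same row."""
    soup = BeautifulSoup(markup, "html.parser")
    services = []
    for cell in soup.select("td.serviceListing"):
        # Restrict the phone lookup to the enclosing row instead of the whole document.
        row = cell.find_parent("tr")
        phone_tag = row.select_one("td.serviceListingCity span") if row else None
        services.append({
            "Name": cell.find("a").get_text(strip=True),
            "Details": cell.find("span").get_text(strip=True),
            # None when the row has no serviceListingCity cell
            "Phone": phone_tag.get_text(strip=True) if phone_tag else None,
        })
    return services


service_list = collect_services(html)
print(service_list)
```

With this sample, the second entry gets Phone set to None rather than borrowing a number from another row. If every row on your page is guaranteed to have the phone cell, the shorter .find_next() version above should be enough.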