Beautiful Soup 4: how do I grab a <td> that sits inside a <tr>?



I am trying to build a web scraper, but I am not sure which syntax to use here.

content = soup.find_all('td', class_='serviceListing')

for prop in content:
    # find() returns only the first matching tag inside this cell
    name = prop.find('a').text
    details = prop.find('span').text

    pageList = {
        'Name': name,
        'Details': details
    }
    serviceList.append(pageList)

print(serviceList)

Source Code:
<tr>
<td id="xx" class="serviceListing" style="yy: zz;" bgcolor="FFFFFF">
<a id="content" href="displayService,aspx?id=147674">Location Name</a>
&nbsp;
<a href="servicesprovidedinFrench.aspx" class... </a>
<br>
<span id="content" class="regtext">801 location</span>
</td>
<td valign="top" class="serviceListingCity" ...>
<span id="contentplaceholderPhone" class="regtext">905 905 9055 </span>
</td>
<td> ... </td>

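I know my code only picks up the first match in each row. What is the correct syntax for returning each of the following inner elements as well?

At the moment I can only get <a>Location Name</a> and <span>801 location</span>.

How do I get the second <td>, the one with class serviceListingCity, and its <span>905 905 9055</span>?

Thanks!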

I hope I understand your question: you want to find the information that sits next to <td class="serviceListing">? If so, you can use the .find_next() method:

from bs4 import BeautifulSoup
html = """
<tr>
<td id="xx" class="serviceListing">
<a id="content" href="displayService,aspx?id=147674">Location Name</a>
&nbsp;
<a href="servicesprovidedinFrench.aspx">...</a>
<br>
<span id="content" class="regtext">801 location</span>
</td>
<td valign="top" class="serviceListingCity">
<span id="contentplaceholderPhone" class="regtext">905 905 9055</span>
</td>
<td> ... </td>
</tr>
"""
soup = BeautifulSoup(html, 'html.parser')
for prop in soup.select('td.serviceListing'):
    name = prop.find('a').text
    details = prop.find('span').text
    # find_next() searches forward in the document, past the end of this <td>
    phone = prop.find_next('span', id='contentplaceholderPhone').text
    print(name)
    print(details)
    print(phone)

Prints:

Location Name
801 location
905 905 9055
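
One thing to keep in mind: .find_next() keeps searching forward through the rest of the document, so if a row has no phone <span>, it may return the phone from a later row. The following is only a sketch of one possible variation, assuming each serviceListing cell is followed by a serviceListingCity sibling cell in the same row (as in the snippet above). It uses .find_next_sibling() to stay inside the row and collects the results into a list of dicts like the serviceList from the question. The second row and the scrape_listings helper are made up for illustration.

```python
from bs4 import BeautifulSoup

# Two-row sample modelled on the question's markup. The second row,
# which has no phone <span>, is a hypothetical case for illustration.
html = """
<table>
<tr>
<td class="serviceListing">
<a href="displayService,aspx?id=147674">Location Name</a>
<span class="regtext">801 location</span>
</td>
<td valign="top" class="serviceListingCity">
<span id="contentplaceholderPhone" class="regtext">905 905 9055</span>
</td>
</tr>
<tr>
<td class="serviceListing">
<a href="#">Second Name</a>
<span class="regtext">12 other street</span>
</td>
<td valign="top" class="serviceListingCity"></td>
</tr>
</table>
"""


def text_or_none(tag):
    """Return the stripped text of a tag, or None if the tag is missing."""
    return tag.get_text(strip=True) if tag is not None else None


def scrape_listings(markup):
    """Hypothetical helper: one dict per td.serviceListing cell."""
    soup = BeautifulSoup(markup, "html.parser")
    service_list = []
    for cell in soup.select("td.serviceListing"):
        # Only look at later <td> siblings inside the same <tr>
        city_cell = cell.find_next_sibling("td", class_="serviceListingCity")
        phone = city_cell.find("span") if city_cell is not None else None
        service_list.append({
            "Name": text_or_none(cell.find("a")),
            "Details": text_or_none(cell.find("span")),
            "Phone": text_or_none(phone),
        })
    return service_list


service_list = scrape_listings(html)
print(service_list)
```

With this approach a row without a phone number yields None for 'Phone' instead of raising an AttributeError or borrowing a value from another row.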
