Parsing a block of text piece by piece (using Python Beautiful Soup)

The HTML I want to parse looks like this:

> </td> </tr> <!--MRT--> <tr><td colspan="2" style="border-top: 1px
> Dashed #CCC"><h3>MRT Stations Nearby</h3></td></tr><tr><td
> colspan="2"><table width="602" align="center" cellpadding="0"
> cellspacing="0"><tr><td width="261"><a
> href="/property/propertynearmrt/Boon-Lay-MRT/?t=dl&mid=12" title="Boon
> Lay MRT"><strong>Boon Lay MRT</strong></a><br />Distance :0.07km </td>

From this, I want to get the distance (0.07 km in this case). I am already parsing the station name "Boon Lay MRT" with the following code:

    soup2 = BeautifulSoup(webpage2)
    for cell in soup2.findAll('h3'):
        if 'MRT Stations Nearby' == cell.text:
            for cell2 in cell.findAllNext('strong')[0]:
                print(cell2)

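How do I get the next piece of text (the distance)? I thought it would just be a matter of changing ('strong')[0] to ('br/'), but that doesn't work.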

Sorry if this is a silly question; any help would be much appreciated.

Thanks

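As far as I understand, the input for the problem is the MRT Stations Nearby text, and the output should be 0.07km.

In that case, the idea is to locate the MRT Stations Nearby text and find its tr parent. From there, find the next tr sibling and look for the element containing the Distance text: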

row = soup.find(text="MRT Stations Nearby").find_parent("tr").find_next_sibling("tr")
distance = row.find(text=lambda x: x and x.startswith("Distance"))
print(distance.split(":")[-1].strip())
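
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. It makes a few assumptions: the HTML is a tidied-up, well-formed version of the fragment in the question, the parser is the built-in "html.parser", and the helper name distance_near_heading is made up for illustration. It also uses string=, which is the newer spelling of the text= argument in Beautiful Soup 4.4.0 and later.

```python
from bs4 import BeautifulSoup

# Tidied-up version of the fragment from the question (an assumption:
# the real page has more markup around it).
SAMPLE_HTML = """
<table>
<tr><td colspan="2"><h3>MRT Stations Nearby</h3></td></tr>
<tr><td colspan="2"><table width="602"><tr><td width="261">
<a href="/property/propertynearmrt/Boon-Lay-MRT/" title="Boon Lay MRT"><strong>Boon Lay MRT</strong></a><br />Distance :0.07km </td></tr></table></td></tr>
</table>
"""


def distance_near_heading(html, heading_text):
    """Return the distance string found in the row after the heading, or None."""
    soup = BeautifulSoup(html, "html.parser")

    # Locate the heading text node, then climb to the <tr> that contains it.
    heading = soup.find(string=heading_text)
    if heading is None:
        return None
    heading_row = heading.find_parent("tr")
    if heading_row is None:
        return None

    # The station details live in the next <tr> at the same level.
    row = heading_row.find_next_sibling("tr")
    if row is None:
        return None

    # Find the text node that starts with "Distance" (ignoring whitespace).
    distance = row.find(string=lambda s: s and s.strip().startswith("Distance"))
    if distance is None:
        return None

    # "Distance :0.07km " -> "0.07km"
    return distance.split(":")[-1].strip()


print(distance_near_heading(SAMPLE_HTML, "MRT Stations Nearby"))  # 0.07km
```

The None checks are there so that a page without the heading returns None instead of raising AttributeError.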

Have you tried for cell2 in cell.findAllNext('br')[0]: ? I don't think you need the '/', since it only means the tag is self-closing.
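
One caveat on the br idea: as far as I can tell, a br tag has no children, so looping over it yields nothing. The distance is the text node that comes right after the br, which next_sibling should give you. A rough sketch, assuming the simplified HTML below and the built-in "html.parser" (the function name station_and_distance is just for illustration):

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the markup in the question (an assumption).
SAMPLE_HTML = (
    '<table><tr><td><h3>MRT Stations Nearby</h3></td></tr>'
    '<tr><td><a href="#" title="Boon Lay MRT"><strong>Boon Lay MRT</strong></a>'
    '<br />Distance :0.07km </td></tr></table>'
)


def station_and_distance(html):
    """Return (station name, distance) for the first station after the heading."""
    soup = BeautifulSoup(html, "html.parser")
    for h3 in soup.find_all("h3"):
        if h3.get_text(strip=True) != "MRT Stations Nearby":
            continue
        # First <strong> after the heading holds the station name.
        strong = h3.find_next("strong")
        if strong is None:
            return None
        # The <br> after it is empty; the distance is its next sibling.
        br = strong.find_next("br")
        if br is None or br.next_sibling is None:
            return None
        distance = str(br.next_sibling).split(":")[-1].strip()
        return strong.get_text(strip=True), distance
    return None


print(station_and_distance(SAMPLE_HTML))  # ('Boon Lay MRT', '0.07km')
```

This keeps the station name and the distance together, which may be handy if the page lists more than one station.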
