Can someone give me a concrete example of how to scrape only certain parts of a page's HTML?

Current code:

print(soup.find('table', class_='shop-single-1-table').text.replace(' ', ''))

This returns everything inside the table with the class "shop-single-1-table".

This is the HTML of the site I am trying to scrape:

<table class="shop-single-1-table"><tbody><tr><td>Zustand <span class="form-tooltip"><a href="#" data-hystmodal="#frame-info-modal2"><img src="/img/iconsQ2.png" alt="" width="17" height="17"></a></span></td> <th>Sehr gut</th></tr> <tr><td>Schaltgruppe <span class="form-tooltip"></span></td> <th>Shimano Dura Ace</th></tr> <tr><td>Ort</td> <th>Köln, Deutschland</th></tr> <tr><td>Größe <span class="form-tooltip"><a href="#" data-hystmodal="#size-modal2"><img src="/img/iconsQ2.png" alt="" width="17" height="17"></a></span></td> <th>56</th></tr></tbody></table>

What is a better way to get only one entry, for example "Schaltgruppe", without printing the rest?

Note: Because the question is short on details, this answer can only show how to scrape the information in the given context.

I assume it is the type of Schaltgruppe that you want to extract, since extracting a string you already know would make no sense.

To do that, you can select the <td> that contains the string and take the <th> that directly follows it:

soup.select_one('table.shop-single-1-table td:-soup-contains("Schaltgruppe") + th').text
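For reference, here is a minimal self-contained sketch of that selector approach. It assumes a trimmed copy of the table from the question (the tooltip links and images are left out), and the helper name value_for is hypothetical, introduced only for illustration. It returns None instead of raising when no matching row exists.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the table from the question (tooltip links/images omitted).
html = """
<table class="shop-single-1-table"><tbody>
<tr><td>Zustand <span class="form-tooltip"></span></td> <th>Sehr gut</th></tr>
<tr><td>Schaltgruppe <span class="form-tooltip"></span></td> <th>Shimano Dura Ace</th></tr>
<tr><td>Ort</td> <th>Köln, Deutschland</th></tr>
<tr><td>Größe <span class="form-tooltip"></span></td> <th>56</th></tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")


def value_for(soup, label):
    """Return the text of the <th> right after the <td> containing `label`, or None."""
    # `td:-soup-contains(...)` matches the label cell; `+ th` moves to its sibling.
    cell = soup.select_one(
        f'table.shop-single-1-table td:-soup-contains("{label}") + th'
    )
    return cell.get_text(strip=True) if cell else None


print(value_for(soup, "Schaltgruppe"))
```

Note that the label is interpolated into the selector as-is, so this sketch only suits labels without double quotes.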

Alternatively, you can convert the table contents into a dict and look up the type by the key Schaltgruppe:

dict(list(row.stripped_strings) for row in soup.select('table.shop-single-1-table tr')).get('Schaltgruppe')

Output:

Shimano Dura Ace
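The dict one-liner relies on every row yielding exactly two strings. As a hedged variant, the sketch below reads the <td> and <th> of each row explicitly and skips rows that lack either cell. It assumes a trimmed copy of the table from the question, and the variable name specs is chosen only for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the table from the question (tooltip links/images omitted).
html = """
<table class="shop-single-1-table"><tbody>
<tr><td>Zustand <span class="form-tooltip"></span></td> <th>Sehr gut</th></tr>
<tr><td>Schaltgruppe <span class="form-tooltip"></span></td> <th>Shimano Dura Ace</th></tr>
<tr><td>Ort</td> <th>Köln, Deutschland</th></tr>
<tr><td>Größe <span class="form-tooltip"></span></td> <th>56</th></tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")

# Map each label (<td>) to its value (<th>), one entry per table row.
specs = {}
for row in soup.select("table.shop-single-1-table tr"):
    label = row.find("td")
    value = row.find("th")
    if label and value:  # skip rows that do not have both cells
        specs[label.get_text(strip=True)] = value.get_text(strip=True)

print(specs.get("Schaltgruppe"))
```

Building the dict once is convenient when several fields are needed, for example specs.get("Ort") or specs.get("Zustand").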
