Python BS4: scraping a table whose <td> tags have multiple class values



I'm trying to use BS4 on a table whose tags have multiple classes. Sample HTML below:

</tr><tr id="_North_Carolina" class="seedrow">
<td title="Click to show/hide ranks" class='lowrowclick' style="text-align:center;font-size:8px">5</td>
<td  id='North_Carolina' class="teamname"><a href="team.php?team=North+Carolina&year=2019" style="text-decoration: none;">North Carolina<span class="lowrow" style="font-size:10px"><br/>&nbsp;&nbsp;&nbsp;1 seed, <span style='background-color:#BAE2C6'>Sweet Sixteen</span></span></a></td>
<td class="mobileout" style="text-align:center"><a href="conf.php?conf=ACC&year=2019">ACC</a></td>
<td class="6  mobileout" style="text-align:center">33</td>
<td class="5  " style="text-align:center;border-right:solid 1px black"><a title = "<b>Wins:</b> @ Wofford, @ Elon, Stanford, Tennessee Tech, St. Francis PA, v. UCLA, UNC Wilmington, Gonzaga, Davidson, Harvard, @ Pittsburgh, @ North Carolina St., Notre Dame, @ Miami FL, Virginia Tech, @ Georgia Tech, @ Louisville, North Carolina St., Miami FL, @ Wake Forest, @ Duke, Florida St., Syracuse, @ Clemson, @ Boston College, Duke, v. Louisville, <br/><b>Losses</b>: v. Texas, @ Michigan, v. Kentucky, Louisville, Virginia, v. Duke, " href='results.php?team=North+Carolina&begin=20181101&end=20190501&conlimit=All&lastx=0&year=2019&top=0&venue=All&type=R&mingames=0&quad=5&rpi=&f=1'">27–6</a><br/><span class="lowrow" style="font-size:8px;">16–2</span></td>
<td class="1  " style="background-color:#AADBB9">119.2<br/><span class="lowrow" style="font-size:8px;">8</span></td>
<td class="2  " style="background-color:#ACDCBA">91.2<br/><span class="lowrow" style="font-size:8px;">10</span></td>
<td  class="3  " style="background-color:#A8DAB6; border-right:solid 1px black" >.9559<br/><span class="lowrow" style="font-size:8px;">5</span></td>
<td style="background-color:#E8F4ED" class="7  mobileout" >52.9<br/><span class="lowrow" style="font-size:8px;">78</span></td>
<td style="background-color:#DAEEE1;border-right:solid 1px black" class="8  mobileout" style="border-right:solid 1px black">48.3<br/><span class="lowrow" style="font-size:8px;">62</span></td>
<td style="background-color:#E4F3EA" class="11 mobileout" >17.1<br/><span class="lowrow" style="font-size:8px;">74</span></td>
<td style="background-color:#f9fbff;border-right:solid 1px black" class="12 mobileout" style="border-right:solid 1px black">18.5<br/><span class="lowrow" style="font-size:8px;">166</span></td>
<td style="background-color:#B6E0C2" class="13 mobileout" >34.6<br/><span class="lowrow" style="font-size:8px;">21</span></td>
<td style="background-color:#AEDDBC;border-right:solid 1px black" class="14 mobileout" style="border-right:solid 1px black">23.2<br/><span class="lowrow"  style="font-size:8px;">12</span></td>
<td style="background-color:#f9fbff" class="9  mobileout" >30.9<br/><span class="lowrow" style="font-size:8px;">241</span></td>
<td style="background-color:#E2F2E9;border-right:solid 1px black" class="10 mobileout" style="border-right:solid 1px black">28.9<br/><span class="lowrow" style="font-size:8px;">72</span></td>
<td style="background-color:#F6FAFA" class="16 mobileout" >51.9<br/><span class="lowrow" style="font-size:8px;">95</span></td>
<td style="background-color:#DDF0E4;border-right:solid 1px black" class="17 mobileout" style="border-right:solid 1px black">47.5<br/><span class="lowrow" style="font-size:8px;">66</span></td>
<td style="background-color:#E0F1E7" class="18 mobileout" >36.5<br/><span class="lowrow" style="font-size:8px;">69</span></td>
<td style="background-color:#EBF5F0;border-right:solid 1px black" class="19 mobileout" style="border-right:solid 1px black">32.9<br/><span class="lowrow" style="font-size:8px;">82</span></td>
<td style="background-color:#A7DAB6;;border-right:solid 1px black"" class="26 mobileout" >76.3<br/><span class="lowrow" style="font-size:8px;">4</span></td>
<td style="background-color:#A9DBB8" class="34 " >10<br/><span class="lowrow" style="font-size:8px;">4</span></td>

My goal is to start from the team record (td class 5) and return it as:

North_Carolina, 30-3, 16-0

My current code:

data = soup.findAll('tr', class_ = 'seedrow')
for item in data:
    records = item.find('td', class_ = '5')
    for first in records:
        reg_record = first.find('a')
        print(reg_record)

It only prints a list of None values. Any help would be appreciated.
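A minimal sketch of the loop with the inner iteration removed, run against a trimmed, well-formed copy of the sample row (an assumption: the stray quotes and the styling attributes are dropped, the structure is kept). Iterating over the cell and calling find('a') on each child searches *inside* each child, and none of the children (<a>, <br/>, <span>) contains an <a>, so every call returns None. Calling find() on the cell itself avoids that. The variable name rows is only illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed, well-formed copy of the row from the question (assumed structure).
html = """
<table>
<tr id="_North_Carolina" class="seedrow">
<td class="teamname"><a href="team.php">North Carolina</a></td>
<td class="6  mobileout">33</td>
<td class="5  "><a href="results.php">27–6</a><br/><span class="lowrow">16–2</span></td>
</tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for row in soup.find_all("tr", class_="seedrow"):
    # class is multi-valued in BS4: class="5  " is split on whitespace,
    # so class_="5" matches this cell.
    record_cell = row.find("td", class_="5")
    # Search the cell itself instead of looping over its children.
    overall = record_cell.find("a").get_text(strip=True)
    conference = record_cell.find("span", class_="lowrow").get_text(strip=True)
    # The row id is "_North_Carolina"; drop the leading underscore.
    name = row["id"].lstrip("_")
    rows.append((name, overall, conference))
    print(f"{name}, {overall}, {conference}")
```

On this snippet it prints North_Carolina, 27–6, 16–2, which is the format asked for in the question.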

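There's a problem with the HTML you posted in the anchor tag's href: the quoting is broken (the single-quoted href value is followed by a stray double quote, '">27–6</a>).

I changed that tag to <a title = "Wins" href='results.php?'>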

Here's the code I tested:

from bs4 import BeautifulSoup
# Simplified row with the broken quoting in the anchor removed
strrr = """<tr id="_North_Carolina" class="seedrow"><td class="5" style="text-align:center;border-right:solid 1px black"><a title = "Wins" href='results.php?'>30–3</a> <br/><span class="lowrow" style="font-size:8px;">16–0</span></td></tr>
"""
soup = BeautifulSoup(strrr, 'html.parser')
data = soup.findAll('tr', attrs={"class": "seedrow"})
for item in data:
    # class is multi-valued, so this matches any <td> whose classes include "5"
    records = item.findAll('td', attrs={"class": "5"})
    for first in records:
        # first text node inside the <a>, i.e. the overall record
        reg_record = first.find('a').contents[0]
        print(reg_record)

The output is 30–3
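The same lookup can also be written with CSS selectors (select/select_one, available in BeautifulSoup 4.7+ through the soupsieve package). Because these class values start with a digit, a class selector such as td.5 is not valid CSS, but the ~= attribute selector matches one whitespace-separated value of the class attribute. A minimal sketch on a trimmed snippet, assuming the real row has the same structure:

```python
from bs4 import BeautifulSoup

# Trimmed snippet (assumed to mirror the real row's structure).
html = """<table><tr id="_North_Carolina" class="seedrow">
<td class="6  mobileout">33</td>
<td class="5  "><a title="Wins" href="results.php?">30–3</a><br/><span class="lowrow">16–0</span></td>
</tr></table>"""

soup = BeautifulSoup(html, "html.parser")

# [class~="5"] matches a cell whose class list contains the value "5";
# a plain td.5 selector would be rejected because identifiers cannot start with a digit.
cell = soup.select_one('tr.seedrow td[class~="5"]')
record = cell.a.get_text(strip=True)
conf = cell.select_one("span.lowrow").get_text(strip=True)

# Cells with several classes match on any one of them.
mobile_cells = soup.select("td.mobileout")
print(record, conf, len(mobile_cells))
```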

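The markup in the question is invalid (I assume because of a copy-paste error). It would be best to share the URL so we can see the full HTML. Using a cleaned-up version of the row: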

from bs4 import BeautifulSoup

html_doc = '''
<tr id="_North_Carolina" class="seedrow">
<td class="someclass5" style="text-align:center;border-right:solid 1px black">
<a title="Wins" href="results.php?">30–3</a> 
<br/>
<span class="lowrow" style="font-size:8px;">16–0</span>
</td>
</tr>'''
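soup = BeautifulSoup(html_doc, 'html.parser')
# the row id is "_North_Carolina"; strip the leading underscore
name = soup.find('tr')['id'].strip('_')
# join the row's text with spaces, then split it into the two records
d1, d2 = soup.find('tr').get_text(strip=True, separator=' ').split()
print(name, d1, d2)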

Prints:

North_Carolina 30–3 16–0
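On the full row from the question, get_text() on the whole <tr> returns the text of every cell, so split() yields far more than two tokens and the d1, d2 unpacking raises ValueError. A minimal sketch that scopes the extraction to the record cell with stripped_strings, assuming the cell keeps the "<a>overall</a><br/><span>conference</span>" layout shown above (the extra cells here are a trimmed sample):

```python
from bs4 import BeautifulSoup

# Row with neighbouring cells kept, to show why scoping matters.
html_doc = """<tr id="_North_Carolina" class="seedrow">
<td class="mobileout"><a href="conf.php?conf=ACC">ACC</a></td>
<td class="5  "><a title="Wins" href="results.php?">30–3</a><br/><span class="lowrow">16–0</span></td>
<td class="1  ">119.2<br/><span class="lowrow">8</span></td>
</tr>"""

soup = BeautifulSoup(html_doc, "html.parser")
row = soup.find("tr", class_="seedrow")
name = row["id"].strip("_")

# stripped_strings yields each non-blank text node in the cell, trimmed:
# here the overall record, then the conference record.
overall, conference = row.find("td", class_="5").stripped_strings
result = f"{name}, {overall}, {conference}"
print(result)
```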
