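I'm running into a problem where BeautifulSoup does not fully parse the HTML I receive. I tried both the lxml and html5lib parsers and hit the same problem with each.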
html = '<td style="vertical-align: top">1</td> <td style="vertical-align: top"><span class="ui-icon country flg-fr"></span>\t</td><td class="pn"><a class="player-link" href="/Players/25604">Hugo Lloris <span class="incident-wrapper"></span> </a><span class="player-meta-data">29</span><span class="player-meta-data">, GK </span></td> <td class="ShotsTotal ">0\t</td><td class="ShotOnTarget ">0\t</td><td class="KeyPassTotal ">0\t</td><td class="PassSuccessInMatch ">88\t</td><td class="DuelAerialWon ">0\t</td><td class="Touches ">35\t</td><td class="rating ">6.24</td> <td style="text-align: left"><span class="incident-wrapper"></span></td> '
ipdb> BeautifulSoup(html, 'html5lib')
<html><head></head><body>1 <span class="ui-icon country flg-fr"></span> <a class="player-link" href="/Players/25604">Hugo Lloris <span class="incident-wrapper"></span> </a><span class="player-meta-data">29</span><span class="player-meta-data">, GK </span> 0 0 0 88 0 35 6.24 <span class="incident-wrapper"></span> </body></html>
It works for me. I ran the following code (using beautifulsoup4==4.4.1):
from bs4 import BeautifulSoup
html = """
<td style="vertical-align: top">1</td>
<td style="vertical-align: top"><span class="ui-icon country flg-fr"></span>\t</td>
<td class="pn"><a class="player-link" href="/Players/25604">Hugo Lloris <span class="incident-wrapper"></span> </a><span
class="player-meta-data">29</span><span class="player-meta-data">, GK </span></td>
<td class="ShotsTotal ">0\t</td>
<td class="ShotOnTarget ">0\t</td>
<td class="KeyPassTotal ">0\t</td>
<td class="PassSuccessInMatch ">88\t</td>
<td class="DuelAerialWon ">0\t</td>
<td class="Touches ">35\t</td>
<td class="rating ">6.24</td>
<td style="text-align: left"><span class="incident-wrapper"></span></td>
"""
parsed_html = BeautifulSoup(html, 'html5lib')
print(html)
This printed the following HTML:
<td style="vertical-align: top">1</td>
<td style="vertical-align: top"><span class="ui-icon country flg-fr"></span> </td>
<td class="pn"><a class="player-link" href="/Players/25604">Hugo Lloris <span class="incident-wrapper"></span> </a><span
class="player-meta-data">29</span><span class="player-meta-data">, GK </span></td>
<td class="ShotsTotal ">0 </td>
<td class="ShotOnTarget ">0 </td>
<td class="KeyPassTotal ">0 </td>
<td class="PassSuccessInMatch ">88 </td>
<td class="DuelAerialWon ">0 </td>
<td class="Touches ">35 </td>
<td class="rating ">6.24</td>
<td style="text-align: left"><span class="incident-wrapper"></span></td>
I don't see anything missing.
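One thing worth checking: the snippet above prints the input string html rather than parsed_html, so it cannot show what the parser dropped. In the question's output the <td> tags are gone, which is consistent with html5lib following the HTML5 rule that ignores <td> start tags appearing outside a table. Below is a minimal sketch, assuming the goal is simply to keep the cells and that Python's built-in html.parser backend is acceptable. The shortened fragment and the cells dictionary are illustrative and not from the original post.

```python
from bs4 import BeautifulSoup

# Shortened fragment from the question: bare <td> cells with no enclosing <table>.
fragment = (
    '<td class="ShotsTotal ">0\t</td>'
    '<td class="Touches ">35\t</td>'
    '<td class="rating ">6.24</td>'
)

# html.parser does not apply HTML5 tree-construction rules,
# so stray <td> tags stay in the parsed tree.
soup = BeautifulSoup(fragment, 'html.parser')

# Map each cell's first class name to its text (hypothetical helper structure).
cells = {}
for td in soup.find_all('td'):
    # bs4 exposes the class attribute as a list of class names.
    name = td['class'][0]
    # strip=True removes the trailing tab inside each cell.
    cells[name] = td.get_text(strip=True)

print(cells)
```

If html5lib is required for other reasons, wrapping the fragment in <table><tr> ... </tr></table> before parsing should also preserve the cells, though that variant is not verified here.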