Best way to extract a messy HTML table with BeautifulSoup



I am trying to extract a table from an HTML file. The table looks like this:

Form 990 FYE    Date Published  Overall Score  Stars
CN 2.1
2019-06               12/23/2020    96.98   
2017-06               05/01/2018    97.46   
2016-06               06/01/2017    100.00  
2015-06               07/01/2016    99.98   
2015-06               06/01/2016    97.87   

CN 2.0
2015-06                04/01/2016   95.22   
2014-06               10/01/2015    94.56   
2014-06               09/01/2015    86.22   
2013-06               02/01/2014    95.01   
2012-06               09/01/2013    95.24   
2012-06               07/01/2013    88.04   
2011-06               12/01/2012    99.13   
2011-06               04/01/2012    92.17   
2010-06               09/20/2011    92.17

The HTML for the table is as follows:

<table class="summaryPage ratings" width="100%">
<tr>
<th align="left" scope="col">Form 990 FYE</th>
<th align="left" scope="col">Date Published</th>
<th align="center" scope="col">Overall Score</th>
<th scope="col" style="text-align: center;">Overall Rating</th>
</tr>
<tr class="methodology-2-1 current">
<td colspan="10">
<b><a href="/index.cfm?bay=content.view&amp;cpid=2200">CN 2.1</a></b>
</td>
</tr>
<tr class="current">
<td>
2019-06
</td>
<td>
12/23/2020
</td>
<td align="center">96.98</td>
<td align="center">
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
<title>four stars</title>
<g>
<g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
</g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
</g>
</svg>
</td>
</tr>
<tr class="methodology-2-1">
<td>
2017-06
</td>
<td>
05/01/2018
</td>
<td align="center">97.46</td>
<td align="center">
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
<title>four stars</title>
<g>
<g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
</g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
</g>
</svg>
</td>
</tr>
<tr class="methodology-2-1">
<td>
2016-06
</td>
<td>
06/01/2017
</td>
<td align="center">100.00</td>
<td align="center">
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
<title>four stars</title>
<g>
<g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
</g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
</g>
</svg>
</td>
</tr>
<tr class="methodology-2-1">
<td>
2015-06
</td>
<td>
07/01/2016
</td>
<td align="center">99.98</td>
<td align="center">
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
<title>four stars</title>
<g>
<g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
</g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
</g>
</svg>
</td>
</tr>
<tr class="methodology-2-1">
<td>
<span id="cf_tooltip_28842661508586">
2015-06 <span style="color: grey;"><i aria-hidden="true" class="fa fa-info-circle"></i></span>
</span>
</td>
<td>
06/01/2016
</td>
<td align="center">97.87</td>
<td align="center">
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
<title>four stars</title>
<g>
<g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
</g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
</g>
</svg>
</td>
</tr>
<tr class="methodology-2-0">
<td colspan="10"></td>
</tr>
<tr class="">
<td colspan="10">
<b><a href="/index.cfm?bay=content.view&amp;cpid=2200">CN 2.0</a></b>
</td>
</tr>
<tr class="">
<td>
2015-06
</td>
<td>
04/01/2016
</td>
<td align="center">95.22</td>
<td align="center">
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
<title>four stars</title>
<g>
<g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
</g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
</g>
</svg>
</td>
</tr>
<tr class="methodology-2-0">
<td>
2014-06
</td>
<td>
10/01/2015
</td>
<td align="center">94.56</td>
<td align="center">
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
<title>four stars</title>
<g>
<g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
</g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
</g>
</svg>
</td>
</tr>
<tr class="methodology-2-0">
<td>
<span id="cf_tooltip_28842661508587">
2014-06 <span style="color: grey;"><i aria-hidden="true" class="fa fa-info-circle"></i></span>
</span>
</td>
<td>
09/01/2015
</td>
<td align="center">86.22</td>
<td align="center">
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
<title>three stars</title>
<g>
<g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#fff" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459" stroke="#CDCCCC"></polygon>
</g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
</g>
</svg>
</td>
</tr>
<tr class="methodology-2-0">
<td>
2013-06
</td>
<td>
02/01/2014
</td>
<td align="center">95.01</td>
<td align="center">
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
<title>four stars</title>
<g>
<g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
</g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
</g>
</svg>
</td>
</tr>
<tr class="methodology-2-0">
<td>
2012-06
</td>
<td>
09/01/2013
</td>
<td align="center">95.24</td>
<td align="center">
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
<title>four stars</title>
<g>
<g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
</g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
</g>
</svg>
</td>
</tr>
<tr class="methodology-2-0">
<td>
<span id="cf_tooltip_28842661508588">
2012-06 <span style="color: grey;"><i aria-hidden="true" class="fa fa-info-circle"></i></span>
</span>
</td>
<td>
07/01/2013
</td>
<td align="center">88.04</td>
<td align="center">
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
<title>three stars</title>
<g>
<g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#fff" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459" stroke="#CDCCCC"></polygon>
</g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
</g>
</svg>
</td>
</tr>
<tr class="methodology-2-0">
<td>
2011-06
</td>
<td>
12/01/2012
</td>
<td align="center">99.13</td>
<td align="center">
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
<title>four stars</title>
<g>
<g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
</g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
</g>
</svg>
</td>
</tr>
<tr class="methodology-2-0">
<td>
<span id="cf_tooltip_28842661508589">
2011-06 <span style="color: grey;"><i aria-hidden="true" class="fa fa-info-circle"></i></span>
</span>
</td>
<td>
04/01/2012
</td>
<td align="center">92.17</td>
<td align="center">
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
<title>four stars</title>
<g>
<g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
</g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
</g>
</svg>
</td>
</tr>
<tr class="methodology-2-0">
<td>
2010-06
</td>
<td>
09/20/2011
</td>
<td align="center">92.17</td>
<td align="center">
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
<title>four stars</title>
<g>
<g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
</g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
</g>
</svg>
</td>
</tr>
</table>

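Note that the table itself is simple, but the HTML is somewhat messy. The data for the Stars column sits in the svg class="stars" blocks, and the rest of the data sits in blocks like tr class="methodology-2-0". I want to extract the table and store it, and since I will run the extraction over several thousand files, I would like to know the best way to do it. The output I want looks like this: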

Form 990 FYE    Date Published  Overall Score  Stars     CN
2019-06               12/23/2020    96.98          X stars  CN 2.1
2017-06               05/01/2018    97.46          Y star   CN 2.0
2016-06               06/01/2017    100.00         ....     ......

The first approach I found here did not work once I adapted it:

sumtab = soup.find('table', class_='summaryPage ratings')
sumdf = pd.DataFrame(columns=['Form 990 FYE','Date Published','Overall Score','Overall Rating'])
for row in sumtab.find_all('tr'):
    cols = row.find_all('td')
    row_list = [ data.text for data in cols ]
    temp_df = pd.DataFrame([row_list], columns = ['Form 990 FYE','Date Published','Overall Score','Overall Rating'])
    sumdf = sumdf.append(temp_df).reset_index(drop = True)

sumdf = sumdf.iloc[1:, :]

The following attempt did not work either:

table = pd.read_html(soup.find(class_="summaryPage ratings"))
print(table)

Do you have any suggestions?

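When you come across a CN row while iterating over the table rows, you can store its text in a variable and keep appending the current CN value to the list for each data row: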

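from bs4 import BeautifulSoup
import pandas as pd

# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(your_html, 'html.parser')
lists = []
cn = None
for row in soup.find_all('tr'):
    cols = row.find_all('td')
    c = [i.text.strip() for i in cols]
    if len(c) == 1:
        # A single-cell row is a section label such as "CN 2.1"
        cn = c[0]
    elif len(c) > 1:
        # A data row: tag it with the most recent CN label
        c = c + [cn]
        lists.append(c)

df = pd.DataFrame(lists, columns=['Form 990 FYE', 'Date Published', 'Overall Score', 'Stars', 'CN'])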
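
The loop above gets the Stars value from the cell's text, which works because the only text inside each svg class="stars" block is its title. A slightly more defensive variant, sketched below, reads that title element explicitly and ignores the empty spacer row between the two CN sections. The function name `extract_ratings` and the choice of `html.parser` are assumptions made for illustration, not part of the original answer.

```python
from bs4 import BeautifulSoup


def extract_ratings(html):
    """Return one dict per data row of the ratings table (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches a single class even when the attribute lists several
    table = soup.find("table", class_="ratings")
    rows = []
    if table is None:
        return rows

    cn = None
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")
        if len(cells) == 1:
            label = cells[0].get_text(strip=True)
            if label:  # the empty spacer row must not overwrite the label
                cn = label
        elif len(cells) >= 4:
            # The star rating is spelled out in the <title> of the inline SVG
            title = cells[3].find("title")
            rows.append({
                "Form 990 FYE": cells[0].get_text(strip=True),
                "Date Published": cells[1].get_text(strip=True),
                "Overall Score": cells[2].get_text(strip=True),
                "Stars": title.get_text(strip=True) if title else None,
                "CN": cn,
            })
    return rows
```

The list of dicts can be passed straight to `pd.DataFrame(...)`; the header row is skipped automatically because it contains th cells rather than td cells.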

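Result (first five rows):

  Form 990 FYE Date Published Overall Score       Stars      CN
0      2019-06     12/23/2020         96.98  four stars  CN 2.1
1      2017-06     05/01/2018         97.46  four stars  CN 2.1
2      2016-06     06/01/2017        100.00  four stars  CN 2.1
3      2015-06     07/01/2016         99.98  four stars  CN 2.1
4      2015-06     06/01/2016         97.87  four stars  CN 2.1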
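
Since the question mentions several thousand files, it may help to collect plain Python lists from every file and build the DataFrame once at the end, rather than growing a DataFrame row by row. `DataFrame.append`, which the first attempt in the question relies on, was removed in pandas 2.0. The sketch below assumes the files live in one folder, end in `.html`, and are UTF-8 encoded; `rows_from_html`, `collect_ratings`, and the extra `source_file` column are hypothetical names introduced for illustration.

```python
from pathlib import Path

import pandas as pd
from bs4 import BeautifulSoup

COLUMNS = ["Form 990 FYE", "Date Published", "Overall Score", "Stars", "CN"]


def rows_from_html(html):
    """Apply the CN-tracking row loop to one page and return a list of rows."""
    soup = BeautifulSoup(html, "html.parser")
    rows, cn = [], None
    for tr in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) == 1:
            cn = cells[0] or cn  # keep the previous label on empty spacer rows
        elif len(cells) == 4:
            rows.append(cells + [cn])
    return rows


def collect_ratings(folder, pattern="*.html"):
    """Parse every matching file in `folder` into a single DataFrame."""
    records = []
    for path in sorted(Path(folder).glob(pattern)):
        html = path.read_text(encoding="utf-8", errors="replace")
        for row in rows_from_html(html):
            records.append(row + [path.name])  # remember where each row came from
    # Build the frame once, after all files have been read
    return pd.DataFrame(records, columns=COLUMNS + ["source_file"])
```

A call such as `collect_ratings("pages").to_csv("ratings.csv", index=False)` would then store everything in one file; the folder and file names here are placeholders.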
