How do I scrape a table with BeautifulSoup?



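I am trying to scrape a table, following the question: Python BeautifulSoup scrape tables

Based on the top solution there, I tried the following.

HTML code: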

<div class="table-frame small">
<table id="rfq-display-line-items-list" class="table">
<thead id="rfq-display-line-items-header">
<tr>
<th>Mfr. Part/Item #</th>
<th>Manufacturer</th>
<th>Product/Service Name</th>
<th>Qty.</th>
<th>Unit</th>
<th>Ship Address</th>
</tr>
</thead>
<tbody id="rfq-display-line-item-0">
<tr>
<td><span class="small">43933</span></td>
<td><span class="small">Anvil International</span></td>
<td><span class="small">Cap Steel Black 1-1/2"</span></td>
<td><span class="small">800</span></td>
<td><span class="small">EA</span></td>
<td><span class="small">1</span></td>
</tr>
<!----><!---->
</tbody><tbody id="rfq-display-line-item-1">
<tr>
<td><span class="small">330035205</span></td>
<td><span class="small">Anvil International</span></td>
<td><span class="small">1-1/2" x 8" Black Steel Nipple</span></td>
<td><span class="small">400</span></td>
<td><span class="small">EA</span></td>
<td><span class="small">1</span></td>
</tr>
<!----><!---->
</tbody><!---->
</table><!---->
</div>

Following that solution, this is what I tried:

for tr in soup.find_all('table', {'id': 'rfq-display-line-items-list'}):
    tds = tr.find_all('td')
    print(tds[0].text, tds[1].text, tds[2].text, tds[3].text, tds[4].text, tds[5].text)

But this only prints the first row:

43933 Anvil International Cap Steel Black 1-1/2" 800 EA 1

I later found out that all of the <td> elements are stored in that one list. I want to print every row.

Expected output:

43933      Anvil International Cap Steel Black 1-1/2" 800 EA 1
330035205  Anvil International 1-1/2" x 8" Black Steel Nipple 400 EA 1         

Loop over the <tr> tags first, then over the <td> tags in each row:

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
for tr in soup.find("table", id="rfq-display-line-items-list").find_all("tr"):
    print(" ".join([td.text for td in tr.find_all('td')]))

43933 Anvil International Cap Steel Black 1-1/2" 800 EA 1
330035205 Anvil International 1-1/2" x 8" Black Steel Nipple 400 EA 1

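What happens?

When you select the table with find_all(), you get a ResultSet that contains only one element (the table). That is why your loop iterates only once and prints only the first row: tds holds the cells of every row in one flat list, but you only print indexes 0 to 5.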
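To make this concrete, here is a minimal sketch that counts what each call returns. The HTML is a trimmed-down stand-in for the table in the question (two rows, two columns), not the original markup.

```python
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the table in the question.
html = """
<table id="rfq-display-line-items-list">
  <tbody><tr><td>43933</td><td>EA</td></tr></tbody>
  <tbody><tr><td>330035205</td><td>EA</td></tr></tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all() on the table returns one element per matching table, not per row.
tables = soup.find_all("table", {"id": "rfq-display-line-items-list"})
print(len(tables))   # 1, so a loop over this result runs once

# Calling find_all('td') on the table gives every cell of every row, flattened.
all_tds = tables[0].find_all("td")
print(len(all_tds))  # 4

# Iterating over the <tr> elements instead gives one item per row.
rows = tables[0].find_all("tr")
print(len(rows))     # 2
```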

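How to fix it?

Select your target more specifically. As an alternative, you can also use CSS selectors and stripped_strings to accomplish the task.

This selects all <tr> elements inside a <tbody> of the element (the table) with id="rfq-display-line-items-list":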

soup.select('#rfq-display-line-items-list tbody tr')

stripped_strings is a generator that yields the whitespace-stripped strings of all elements (the <td>s) in the row, which you can join() into a single string:

" ".join(list(row.stripped_strings))
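As a small illustration of what stripped_strings yields, the sketch below parses a single made-up row (only two cells, modelled on the question's markup) and compares it with the raw text of the row.

```python
from bs4 import BeautifulSoup

# A single row modelled on the question's markup (shortened to two cells).
row_html = """
<tr>
  <td><span class="small">43933</span></td>
  <td><span class="small">Anvil International</span></td>
</tr>
"""
row = BeautifulSoup(row_html, "html.parser").tr

# stripped_strings skips whitespace-only text nodes and strips the rest.
cells = list(row.stripped_strings)
print(cells)

# str.join() accepts any iterable, so the list() call is optional.
line = " ".join(row.stripped_strings)
print(line)

# For comparison: the raw text keeps the newlines and indentation between tags.
print(repr(row.get_text()))
```

Because the whitespace between the tags is dropped, the joined line has exactly one space between cells.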

Example
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
for row in soup.select('#rfq-display-line-items-list tbody tr'):
    print(" ".join(list(row.stripped_strings)))

Output
43933 Anvil International Cap Steel Black 1-1/2" 800 EA 1
330035205 Anvil International 1-1/2" x 8" Black Steel Nipple 400 EA 1
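The expected output in the question pads the first column so the rows line up. If that alignment matters, one possible approach (a sketch, not part of the original answer) is to collect the cells first, compute each column's width, and pad with str.ljust(). The HTML below is a shortened stand-in with three columns, and the two-space separator is an arbitrary choice.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the table in the question (three columns only).
html = """
<table id="rfq-display-line-items-list">
  <tbody><tr><td>43933</td><td>Anvil International</td><td>800</td></tr></tbody>
  <tbody><tr><td>330035205</td><td>Anvil International</td><td>400</td></tr></tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# One list of cell strings per row.
rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in soup.select("#rfq-display-line-items-list tbody tr")
]

# Width of each column is the length of its longest cell.
widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]

# Pad every cell to its column width; rstrip() drops padding after the last cell.
lines = [
    "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
    for row in rows
]
for line in lines:
    print(line)
```

This assumes every row has the same number of cells, which holds for the table in the question.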
