Iterating over HTML table values with a loop in Python



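I am trying to write a Python script that uses requests and bs4 to get all of a student's grades. Right now I have a problem looping over the values.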

for rows in tr:
    td = tbody.find_all('td')
    subject.append(td[0].get_text())
    fq.append(td[1].get_text())
    sq.append(td[2].get_text())
    ave.append(td[3].get_text())

for i in subject:
    print(f"Subject: {i}")
for i in fq:
    print(f"First Quarter: {i}")

for i in sq:
    print(f"Second Quarter: {i}")
for i in ave:
    print(f"Average: {i}")
# My goal: the 4 lists are all connected, so the first values of the subject list, f_quar, s_quar and the average are linked together, like gen math (subject), 90 (f_quar), 90 (s_quar), and 90 (average)

Output:

Subject:  GENERAL MATHEMATICS
Subject:  GENERAL MATHEMATICS 
Subject:  GENERAL MATHEMATICS
Subject:  GENERAL MATHEMATICS
Subject:  GENERAL MATHEMATICS
Subject:  GENERAL MATHEMATICS
Subject:  GENERAL MATHEMATICS
Subject:  GENERAL MATHEMATICS
First Quarter:   ##.00
First Quarter:   ##.00
First Quarter:   ##.00
First Quarter:   ##.00
First Quarter:   ##.00
First Quarter:   ##.00
First Quarter:   ##.00
First Quarter:   ##.00 
Second Quarter:   ##.00
Second Quarter:   ##.00
Second Quarter:   ##.00
Second Quarter:   ##.00
Second Quarter:   ##.00
Second Quarter:   ##.00
Second Quarter:   ##.00
Average:   ##.00
Average:   ##.00
Average:   ##.00
Average:   ##.00
Average:   ##.00
Average:   ##.00
Average:   ##.00
Average:   ##.00

Expected output:

Subject: Gen Math
Subject: Stats
...
First Quarter: 90.00
First Quarter: 90.00
...
Second Quarter: 90.00
Second Quarter: 90.00
...
Average: 90.00
Average: 90.00
...

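I am new to Python, so loops are my weak point. Also, the code seems to be wrong, because I need the subject, the 1st quarter grade, the 2nd quarter grade, and the average. Thanks! Here is the HTML code of the table: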

<table cellspacing="0" class="table table-bordered table-striped" id="tblss1" width="100%">
<thead>
<tr class="success">
<th style="text-align:center">SUBJECT</th>
<th style="text-align:center">1ST</th>
<th style="text-align:center">2ND</th>
<th style="text-align:center">AVE</th>
</tr>
</thead>
<tbody>
<tr>
<td style="color:purple"> GENERAL MATHEMATICS </td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center" style="color:blueviolet"> <strong> ##.00  </strong></td>
</tr>
<tr>
<td style="color:purple"> EARTH SCIENCE </td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center" style="color:blueviolet"> <strong> ##.00  </strong></td>
</tr>
<tr>
<td style="color:purple"> PHYSICAL EDUCATION AND HEALTH </td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center" style="color:blueviolet"> <strong> ##.50  </strong></td>
</tr>
<tr>
<td style="color:purple"> GENERAL CHEMISTRY 1 </td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center" style="color:blueviolet"> <strong> ##.00  </strong></td>
</tr>
<tr>
<td style="color:purple"> 21ST CENTURY LITERATURE </td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center" style="color:blueviolet"> <strong> ##.00  </strong></td>
</tr>
<tr>
<td style="color:purple"> READING AND WRITING </td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center" style="color:blueviolet"> <strong> ##.00  </strong></td>
</tr>
<tr>
<td style="color:purple"> GENERAL BIOLOGY 1 </td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center" style="color:blueviolet"> <strong> ##.00  </strong></td>
</tr>
<tr>
<td style="color:purple"> ENTREPRENEURSHIP </td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center" style="color:blueviolet"> <strong> ##.50  </strong></td>
</tr>
</tbody>
</table>

As I understand it, this is what you are trying to achieve. I am assuming the first for loop actually appends all the data correctly.

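subjects = []
fq = []
sq = []
avgs = []
# tr is assumed to hold the table's <tr> rows, e.g. tbody.find_all('tr')
for rows in tr:
    # look up the cells of the current row; tbody.find_all('td') would
    # return the cells of the whole table and repeat the first row
    td = rows.find_all('td')
    subjects.append(td[0].get_text())
    fq.append(td[1].get_text())
    sq.append(td[2].get_text())
    avgs.append(td[3].get_text())
for subject in subjects:
    print(subject)
for f in fq:
    print(f)
for s in sq:
    print(s)
for a in avgs:
    print(a)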
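If the goal is to keep the four values of each subject together, as the comment in the question suggests, the lists can be walked in parallel with zip. This is only a minimal sketch: the sample values are copied from the expected output in the question, and the line format is an assumption chosen for illustration.

```python
# Sample values taken from the expected output in the question.
subjects = ["Gen Math", "Stats"]
fq = ["90.00", "90.00"]
sq = ["90.00", "90.00"]
avgs = ["90.00", "90.00"]

# zip yields one tuple per position, so each iteration sees the
# subject together with its own three grades.
lines = []
for subject, first, second, average in zip(subjects, fq, sq, avgs):
    lines.append(f"{subject}: 1st={first}, 2nd={second}, average={average}")

for line in lines:
    print(line)
```

Note that zip stops at the shortest list, so a missing cell in one column would silently drop the trailing rows.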

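You are using i as the loop variable twice (in the outer and in the inner loop).

Python tolerates this "overwriting": the outer loop keeps its own iterator and carries on, but once the inner loop has run, i no longer refers to the outer loop's current item. It holds the last value assigned by the inner loop.

Try renaming the inner loop variable so that it does not overwrite i from the outer loop.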
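A small sketch of what reusing the name does in Python, using made-up lists rather than the data from the question: the outer loop still runs once per item, but the value of i seen after the inner loop is the inner loop's last value.

```python
outer_values = ["a", "b"]

# Both loops use the name i, so the inner loop rebinds it.
seen_after_inner = []
for i in outer_values:
    for i in [1, 2]:
        pass
    seen_after_inner.append(i)  # i is now the inner loop's last value

# With a separate inner name, the outer value is preserved.
fixed = []
for i in outer_values:
    for j in [1, 2]:
        pass
    fixed.append(i)

print(seen_after_inner)  # [2, 2]
print(fixed)             # ['a', 'b']
```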

If that does not solve your problem, please describe in more detail what you are trying to achieve or what behavior you are seeing.

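Edit: this way, you will only get the same result for every entry. You need to build a double loop along these lines:

  1. Find all tr blocks and iterate over them:

for tr_block in tbody.find_all('tr'):

  2. Inside each tr_block, append the corresponding td cells to their lists:

    td = tr_block.find_all('td')

    subject.append(td[0].get_text())  # [...]

  3. After that, the lists should be filled with all the data from the HTML, and you can zip them together whenever you need to.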
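Put together, the three steps above might look like the following sketch. It assumes the page has already been downloaded; here a shortened two-row copy of the table from the question stands in for the response text, and the `html.parser` backend and `get_text(strip=True)` are choices made for this example, not something the question prescribes.

```python
from bs4 import BeautifulSoup

# Shortened copy of the table from the question (two rows only).
html = """
<table cellspacing="0" class="table table-bordered table-striped" id="tblss1" width="100%">
<tbody>
<tr>
<td style="color:purple"> GENERAL MATHEMATICS </td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center" style="color:blueviolet"> <strong> ##.00  </strong></td>
</tr>
<tr>
<td style="color:purple"> EARTH SCIENCE </td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center" style="color:blueviolet"> <strong> ##.00  </strong></td>
</tr>
</tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
tbody = soup.find("table", id="tblss1").find("tbody")

subject, fq, sq, ave = [], [], [], []

# Step 1: iterate over every row of the table body.
for tr_block in tbody.find_all("tr"):
    # Step 2: take the cells of this row only and file them by column.
    td = tr_block.find_all("td")
    subject.append(td[0].get_text(strip=True))
    fq.append(td[1].get_text(strip=True))
    sq.append(td[2].get_text(strip=True))
    ave.append(td[3].get_text(strip=True))

# Step 3: zip the columns back into one tuple per subject.
rows = list(zip(subject, fq, sq, ave))
for name, first, second, average in rows:
    print(f"{name}: {first} / {second} / {average}")
```

Calling find_all('td') on tr_block rather than on tbody is what makes each iteration pick up a different row.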

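In this case, reading the table into a DataFrame would be simpler and faster:

from io import StringIO
import pandas as pd
table = """[your html above]"""
# read_html returns a list of DataFrames, one per table found
print(pd.read_html(StringIO(table))[0])

Output: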

                         SUBJECT    1ST    2ND    AVE
0            GENERAL MATHEMATICS  ##.00  ##.00  ##.00
1                  EARTH SCIENCE  ##.00  ##.00  ##.00
2  PHYSICAL EDUCATION AND HEALTH  ##.00  ##.00  ##.50
3            GENERAL CHEMISTRY 1  ##.00  ##.00  ##.00

etc.
