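I'm trying to write a Python script that uses requests and bs4 to get all of a student's grades. Right now I have a problem with looping over the values: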
for rows in tr:
    td = tbody.find_all('td')
    subject.append(td[0].get_text())
    fq.append(td[1].get_text())
    sq.append(td[2].get_text())
    ave.append(td[3].get_text())

for i in subject:
    print(f"Subject: {i}")
for i in fq:
    print(f"First Quarter: {i}")
for i in sq:
    print(f"Second Quarter: {i}")
for i in ave:
    print(f"Average: {i}")
# My goal: there are 4 lists and they are all linked, i.e. the first value of the subject, f_quar, s_quar and average lists belong together, like gen math (subject), 90 (f_quar), 90 (s_quar) and 90 (average)
Output:
Subject: GENERAL MATHEMATICS
Subject: GENERAL MATHEMATICS
Subject: GENERAL MATHEMATICS
Subject: GENERAL MATHEMATICS
Subject: GENERAL MATHEMATICS
Subject: GENERAL MATHEMATICS
Subject: GENERAL MATHEMATICS
Subject: GENERAL MATHEMATICS
First Quarter: ##.00
First Quarter: ##.00
First Quarter: ##.00
First Quarter: ##.00
First Quarter: ##.00
First Quarter: ##.00
First Quarter: ##.00
First Quarter: ##.00
Second Quarter: ##.00
Second Quarter: ##.00
Second Quarter: ##.00
Second Quarter: ##.00
Second Quarter: ##.00
Second Quarter: ##.00
Second Quarter: ##.00
Average: ##.00
Average: ##.00
Average: ##.00
Average: ##.00
Average: ##.00
Average: ##.00
Average: ##.00
Average: ##.00
Expected output:
Subject: Gen Math
Subject: Stats
...
First Quarter: 90.00
First Quarter: 90.00
...
Second Quarter: 90.00
Second Quarter: 90.00
...
Average: 90.00
Average: 90.00
...
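I'm new to Python, so loops are my weak point. Also, the code itself seems to be wrong, because I need the subject, the 1st-quarter grade, the 2nd-quarter grade and the average. Thanks! Here is the HTML code of the table: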
<table cellspacing="0" class="table table-bordered table-striped" id="tblss1" width="100%">
<thead>
<tr class="success">
<th style="text-align:center">SUBJECT</th>
<th style="text-align:center">1ST</th>
<th style="text-align:center">2ND</th>
<th style="text-align:center">AVE</th>
</tr>
</thead>
<tbody>
<tr>
<td style="color:purple"> GENERAL MATHEMATICS </td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center" style="color:blueviolet"> <strong> ##.00 </strong></td>
</tr>
<tr>
<td style="color:purple"> EARTH SCIENCE </td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center" style="color:blueviolet"> <strong> ##.00 </strong></td>
</tr>
<tr>
<td style="color:purple"> PHYSICAL EDUCATION AND HEALTH </td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center" style="color:blueviolet"> <strong> ##.50 </strong></td>
</tr>
<tr>
<td style="color:purple"> GENERAL CHEMISTRY 1 </td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center" style="color:blueviolet"> <strong> ##.00 </strong></td>
</tr>
<tr>
<td style="color:purple"> 21ST CENTURY LITERATURE </td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center" style="color:blueviolet"> <strong> ##.00 </strong></td>
</tr>
<tr>
<td style="color:purple"> READING AND WRITING </td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center" style="color:blueviolet"> <strong> ##.00 </strong></td>
</tr>
<tr>
<td style="color:purple"> GENERAL BIOLOGY 1 </td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center" style="color:blueviolet"> <strong> ##.00 </strong></td>
</tr>
<tr>
<td style="color:purple"> ENTREPRENEURSHIP </td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center" style="color:blueviolet"> <strong> ##.50 </strong></td>
</tr>
</tbody>
</table>
From what I understand, this is what you are trying to achieve. I'm assuming the first for loop actually adds all the data correctly.
subjects = []
fq = []
sq = []
avgs = []

for rows in tr:
    td = tbody.find_all('td')
    subjects.append(td[0].get_text())
    fq.append(td[1].get_text())
    sq.append(td[2].get_text())
    avgs.append(td[3].get_text())

for subject in subjects:
    print(subject)
for f in fq:
    print(f)
for s in sq:
    print(s)
for a in avgs:
    print(a)
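The loops above print the four lists one after another. To print each subject next to its own grades, which is what the comment in the question asks for, the lists can be walked in parallel with the built-in zip(). A minimal sketch; the two-entry lists below are hypothetical stand-ins for the scraped data:

```python
# Hypothetical sample data standing in for the scraped lists
# (in the real script these are filled from the HTML table).
subjects = ["GENERAL MATHEMATICS", "EARTH SCIENCE"]
fq = ["##.00", "##.00"]
sq = ["##.00", "##.00"]
avgs = ["##.00", "##.00"]

lines = []
# zip() walks the four lists in parallel, so the n-th subject stays
# paired with the n-th first-quarter, second-quarter and average grade.
for subject, f, s, a in zip(subjects, fq, sq, avgs):
    lines.append(f"Subject: {subject}")
    lines.append(f"First Quarter: {f}")
    lines.append(f"Second Quarter: {s}")
    lines.append(f"Average: {a}")

print("\n".join(lines))
```

Note that zip() stops at the shortest list, so if one list ends up with fewer entries (for example because a row was missing a cell), the extra entries in the longer lists are silently dropped.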
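You are using i as the loop variable twice (in the outer and the inner loop).
I'm not sure whether the interpreter can cope with this "overwriting". It may well do so, but after control returns to the outer loop, the object/iterator cursor held in i might be lost.
Try renaming the inner loop variable so that it does not overwrite the outer loop's i.
If that doesn't solve your problem, please describe in more detail what you are trying to achieve or what behavior you are seeing.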
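A quick way to check the concern about overwriting i: in Python the outer for loop holds its own iterator, so rebinding i inside an inner loop does not make the outer loop lose its place; it only changes what i refers to until the next outer iteration. A minimal sketch with made-up values:

```python
# Reusing the same name for an outer and an inner loop variable.
visited = []
for i in ["a", "b", "c"]:
    outer_value = i
    for i in range(2):  # rebinds i, but the outer loop's iterator is untouched
        pass
    # Here i is the inner loop's last value (1), not the outer item.
    visited.append((outer_value, i))

print(visited)  # [('a', 1), ('b', 1), ('c', 1)]
```

The outer loop still visits "a", "b" and "c", so renaming the inner variable is mainly about readability rather than correctness.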
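Edit: written this way, you only ever get the same result for every entry, because tbody.find_all('td') always starts again from the first row. You need to build a double loop along these lines:
- Find all the tr blocks and iterate over them:
  for tr_block in tbody.find_all('tr'):
- Inside each tr_block, append the corresponding td cells to their lists:
      td = tr_block.find_all('td')
      subject.append(td[0].get_text())  # [...]
- After that you should have lists filled with all the data from the HTML, which you can then zip together whenever you need.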
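The three steps above can be sketched end to end as follows. This assumes BeautifulSoup with Python's built-in "html.parser", uses a trimmed two-row copy of the table from the question, and adds get_text(strip=True) (not in the original answer) to drop the spaces around each value:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the table from the question (two rows only).
html = """
<table id="tblss1">
<tbody>
<tr>
<td style="color:purple"> GENERAL MATHEMATICS </td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center" style="color:blueviolet"> <strong> ##.00 </strong></td>
</tr>
<tr>
<td style="color:purple"> EARTH SCIENCE </td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center" style="color:blueviolet"> <strong> ##.00 </strong></td>
</tr>
</tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
tbody = soup.find("table", id="tblss1").find("tbody")

subject, fq, sq, ave = [], [], [], []
# Step 1: iterate over every <tr> in the table body.
for tr_block in tbody.find_all("tr"):
    # Step 2: look for <td> inside *this* row, not in the whole tbody.
    td = tr_block.find_all("td")
    subject.append(td[0].get_text(strip=True))
    fq.append(td[1].get_text(strip=True))
    sq.append(td[2].get_text(strip=True))
    ave.append(td[3].get_text(strip=True))

# Step 3: zip the lists back together, one tuple per subject.
rows = list(zip(subject, fq, sq, ave))
print(rows)
```

The key difference from the question's code is calling find_all('td') on tr_block instead of tbody, so each iteration reads a different row.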
In this case it would be simpler and faster to read the table into a DataFrame:
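from io import StringIO
import pandas as pd
table = """[your html above]"""
# read_html returns a list of DataFrames, one per <table> found
df = pd.read_html(StringIO(table))[0]
print(df)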
Output:
SUBJECT 1ST 2ND AVE
0 GENERAL MATHEMATICS ##.00 ##.00 ##.00
1 EARTH SCIENCE ##.00 ##.00 ##.00
2 PHYSICAL EDUCATION AND HEALTH ##.00 ##.00 ##.50
3 GENERAL CHEMISTRY 1 ##.00 ##.00 ##.00
etc.