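I need a good way to find the names of all test cases, and the result of each test case, in an HTML file. I am new to BeautifulSoup and would appreciate some advice.
First, I read the data with BeautifulSoup, prettify it, and write the result to a file: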
from bs4 import BeautifulSoup
f = open('myfile','w')
soup = BeautifulSoup(open("C:DEVdebugkoddata.html"))
fixedSoup = soup.prettify()
fixedSoup = fixedSoup.encode('utf-8')
f.write(fixedSoup)
f.close()
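When I inspect a section of the prettified output in the file, it looks like this, for example (the file contains 100 test cases and their results):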
<a name="1005">
</a>
<div class="Sequence">
<div class="Header">
<table class="Title">
<tr>
<td>
IAA REQPROD 55 InvPwrDownMode - Shut down communication (Sequence)
</td>
<td class="ResultStateIcon">
<img src="Resources/Passed.png"/>
</td>
</tr>
</table>
<table class="DynamicAttributes">
<colgroup>
<col width="20">
<col width="30">
<col width="20">
<col width="30">
</col>
</col>
</col>
</col>
</colgroup>
<tr>
<th>
Start time:
</th>
<td>
2014/09/23 09-24-31
</td>
<th>
Stop time:
</th>
<td>
2014/09/23 09-27-25
</td>
</tr>
<tr>
<th>
Execution duration:
</th>
<td>
173.461 sec.
</td>
*<th>
Name:
</th>
<td>
IAA REQPROD 55 InvPwrDownMode - Shut down communication
</td>*
</tr>
<tr>
<th>
Library link:
</th>
<td>
</td>
<th>
Creation date:
</th>
<td>
2013/4/11, 8-55-57
</td>
</tr>
<tr>
<th>
Modification date:
</th>
<td>
2014/9/23, 9-27-25
</td>
<th>
Author:
</th>
<td>
cnnntd
</td>
</tr>
<tr>
<th>
Hierarchy:
</th>
<td>
IAA. IAA REQPROD 55 InvPwrDownMode - Shut down communication
</td>
<td>
</td>
<td>
</td>
</tr>
</table>
<table class="StaticAttributes">
<colgroup>
<col width="20">
<col width="80">
</col>
</col>
</colgroup>
<tr>
<th>
Description:
</th>
<td>
</td>
</tr>
<tr>
<th>
*Result state:
</th>
<td>
Passed
</td>*
</tr>
</table>
</div>
<div class="BlockReport">
<a name="1007">
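In this file I now want to find the information for "Name:" and "Result state:". Looking at the prettified output, I can see the labels "Name:" and "Result state:", so hopefully they can be used to find the test case name and the test result. The printout should look something like this: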
Name = IAA REQPROD 55 InvPwrDownMode - Shut down communication
Result = Passed
etc
Does anyone know how to do this with BeautifulSoup?
Using the HTML from the second Pastebin link, I came up with the following code:
from bs4 import BeautifulSoup

soup = BeautifulSoup(open("beautifulsoup2.html"))

# The first <td> of each "Title" table holds the test case name.
names = []
for table in soup.findAll('table', attrs={'class': 'Title'}):
    td = table.find('td')
    names.append(td.text.encode("ascii", "ignore").strip())

# The second <td> of each "StaticAttributes" table holds the result state.
results = []
for table in soup.findAll(attrs={'class': 'StaticAttributes'}):
    tds = table.findAll('td')
    results.append(tds[1].text.strip())

# Pair names and results by position (Python 2 print statements).
for name, result in zip(names, results):
    print "Name = {}".format(name)
    print "Result = {}".format(result)
    print
Which gives the following output:
Name = IEM(Project)
Result = PassedFailedUndefinedError
Name = IEM REQPROD 132765 InvPwrDownMode - Shut down communication SN1(Sequence)
Result = Passed
Name = IEM REQPROD 86434 InvPwrDownMode - Time from shut down to sleep SN2(Sequence)
Result = PassedUndefined
Name = IEM Test(Sequence)
Result = Failed
Name = IEM REQPROD 86434 InvPwrDownMode - Time from shut down to sleep(Sequence)
Result = Error
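The first result above, PassedFailedUndefinedError, suggests that a positional lookup such as tds[1] does not always land on a single result state. As a rough alternative, the labels mentioned in the question can be used directly: find the <th> whose text is "Name:" or "Result state:" and read the <td> next to it. The sketch below is only that, a sketch. It runs under Python 3, the helper names value_for_label and extract_cases are made up for illustration, SAMPLE is a trimmed copy of the markup from the question, and it assumes every test case sits in its own div with class "Header".

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup shown in the question (illustration only).
SAMPLE = """
<div class="Sequence">
  <div class="Header">
    <table class="DynamicAttributes">
      <tr>
        <th>Execution duration:</th><td>173.461 sec.</td>
        <th>Name:</th><td>IAA REQPROD 55 InvPwrDownMode - Shut down communication</td>
      </tr>
    </table>
    <table class="StaticAttributes">
      <tr><th>Description:</th><td></td></tr>
      <tr><th>Result state:</th><td>Passed</td></tr>
    </table>
  </div>
</div>
"""


def value_for_label(container, label):
    """Return the text of the <td> that follows the <th> labelled `label`.

    Hypothetical helper; returns None when the label is not found.
    """
    for th in container.find_all("th"):
        if th.get_text(strip=True) == label:
            td = th.find_next_sibling("td")
            return td.get_text(strip=True) if td is not None else None
    return None


def extract_cases(html):
    """Collect (name, result) pairs, one per div with class "Header"."""
    soup = BeautifulSoup(html, "html.parser")
    cases = []
    for header in soup.find_all("div", class_="Header"):
        name = value_for_label(header, "Name:")
        result = value_for_label(header, "Result state:")
        # Skip headers that lack either label instead of guessing.
        if name is not None and result is not None:
            cases.append((name, result))
    return cases


for name, result in extract_cases(SAMPLE):
    print("Name = {}".format(name))
    print("Result = {}".format(result))
```

Because name and result are read from the same Header block, they cannot drift out of step the way two separately built lists can when one block is missing a value.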
I added encode("ascii", "ignore") because otherwise I would get a UnicodeDecodeError. See this answer for how those characters ended up in your HTML.
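To illustrate what encode("ascii", "ignore") does, here is a small sketch. It assumes the offending character is a non-breaking space (U+00A0) sitting before "(Sequence)". That is a guess based on the missing space in the output above, not something confirmed by the HTML, and the input string is made up for the example.

```python
# Assumed input: a name containing a non-breaking space (U+00A0).
raw = u"IEM Test\u00a0(Sequence)"

# "ignore" silently drops every character that has no ASCII equivalent.
dropped = raw.encode("ascii", "ignore").decode("ascii")

# Alternative: swap the non-breaking space for a regular one and keep the text.
replaced = raw.replace(u"\u00a0", u" ")

print(dropped)
print(replaced)
```

Dropping characters is lossy, so replacing the specific character may be preferable when the names need to match other records exactly.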