Finding test cases and results with BeautifulSoup

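I need a good way to find the names of all test cases, and the result of each test case, in an html file. I am new to BeautifulSoup and would appreciate some good advice.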

First, I read the data with BeautifulSoup, prettify it, and write the result to a file:

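from bs4 import BeautifulSoup

# Parse the report
soup = BeautifulSoup(open("C:DEVdebugkoddata.html"))
# prettify() returns text; encode it to UTF-8 before writing
fixedSoup = soup.prettify().encode('utf-8')
# Binary mode, because fixedSoup is a UTF-8 byte string
with open('myfile', 'wb') as f:
    f.write(fixedSoup)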

When I look at part of the prettified result in the file, it looks like this, for example (the file contains 100 test cases and results):

<a name="1005">
  </a>
  <div class="Sequence">
   <div class="Header">
    <table class="Title">
     <tr>
      <td>
       IAA REQPROD 55 InvPwrDownMode - Shut down communication (Sequence)
      </td>
      <td class="ResultStateIcon">
       <img src="Resources/Passed.png"/>
      </td>
     </tr>
    </table>
    <table class="DynamicAttributes">
     <colgroup>
      <col width="20">
       <col width="30">
        <col width="20">
         <col width="30">
         </col>
        </col>
       </col>
      </col>
     </colgroup>
     <tr>
      <th>
       Start time:
      </th>
      <td>
       2014/09/23 09-24-31
      </td>
      <th>
       Stop time:
      </th>
      <td>
       2014/09/23 09-27-25
      </td>
     </tr>
     <tr>
      <th>
       Execution duration:
      </th>
      <td>
       173.461 sec.
      </td>
      *<th>
       Name:
      </th>
      <td>
       IAA REQPROD 55 InvPwrDownMode - Shut down communication
      </td>*
     </tr>
     <tr>
      <th>
       Library link:
      </th>
      <td>
      </td>
      <th>
       Creation date:
      </th>
      <td>
       2013/4/11, 8-55-57
      </td>
     </tr>
     <tr>
      <th>
       Modification date:
      </th>
      <td>
       2014/9/23, 9-27-25
      </td>
      <th>
       Author:
      </th>
      <td>
       cnnntd
      </td>
     </tr>
     <tr>
      <th>
       Hierarchy:
      </th>
      <td>
       IAA.  IAA REQPROD 55 InvPwrDownMode - Shut down communication
      </td>
      <td>
      </td>
      <td>
      </td>
     </tr>
    </table>
    <table class="StaticAttributes">
     <colgroup>
      <col width="20">
       <col width="80">
       </col>
      </col>
     </colgroup>
     <tr>
      <th>
       Description:
      </th>
      <td>
      </td>
     </tr>
     <tr>
      <th>
       *Result state:
      </th>
      <td>
       Passed
      </td>*
     </tr>
    </table>
   </div>
   <div class="BlockReport">
    <a name="1007">

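In this file I now want to find the information for "Name:" and "Result state:". Looking at the prettified result, I can see the labels "Name:" and "Result state:", so hopefully they can be used to find the test case name and the test result. The printout should look something like this: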

 Name = IAA REQPROD 55 InvPwrDownMode - Shut down communication 
 Result = Passed
 etc

Does anyone know how to do this with BeautifulSoup?

Using the html from your second Pastebin link, I came up with the following code:

from bs4 import BeautifulSoup
soup = BeautifulSoup(open("beautifulsoup2.html"))

# Test case names: the first cell of every "Title" table
names = []
for table in soup.findAll('table', attrs={'class': 'Title'}):
    td = table.find('td')
    names.append(td.text.encode("ascii", "ignore").strip())
# Results: the second cell of every "StaticAttributes" table
results = []
for table in soup.findAll(attrs={'class': 'StaticAttributes'}):
    tds = table.findAll('td')
    results.append(tds[1].text.strip())
# Pair names and results by position
for name, result in zip(names, results):
    print "Name = {}".format(name)
    print "Result = {}".format(result)
    print

This gives the following result:

Name = IEM(Project)
Result = PassedFailedUndefinedError
Name = IEM REQPROD 132765 InvPwrDownMode - Shut down communication SN1(Sequence)
Result = Passed
Name = IEM REQPROD 86434 InvPwrDownMode - Time from shut down to sleep SN2(Sequence)
Result = PassedUndefined
Name = IEM Test(Sequence)
Result = Failed
Name = IEM REQPROD 86434 InvPwrDownMode - Time from shut down to sleep(Sequence)
Result = Error

I added encode("ascii", "ignore") because otherwise I would get a UnicodeDecodeError. See this answer for how those characters ended up in your html.
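
For what it's worth, the label-based lookup asked about in the question ("Name:" and "Result state:") could be sketched roughly as below, in Python 3. This is only a sketch under assumptions: the sample html is a hypothetical, trimmed-down imitation of the excerpt in the question, the helper names value_for_label and collect_results are made up for illustration, and it assumes each test case sits in its own div with class "Sequence". Instead of dropping non-ASCII characters, it replaces non-breaking spaces with ordinary spaces.

```python
from bs4 import BeautifulSoup

# Hypothetical sample, modeled on the excerpt in the question (not the real report).
SAMPLE_HTML = """
<div class="Sequence">
 <div class="Header">
  <table class="DynamicAttributes">
   <tr><th>Execution duration:</th><td>173.461 sec.</td>
       <th>Name:</th><td>IAA REQPROD 55 InvPwrDownMode - Shut down communication</td></tr>
  </table>
  <table class="StaticAttributes">
   <tr><th>Description:</th><td></td></tr>
   <tr><th>Result state:</th><td>Passed</td></tr>
  </table>
 </div>
</div>
<div class="Sequence">
 <div class="Header">
  <table class="DynamicAttributes">
   <tr><th>Name:</th><td>IAA\u00a0Test</td></tr>
  </table>
  <table class="StaticAttributes">
   <tr><th>Result state:</th><td>Failed</td></tr>
  </table>
 </div>
</div>
"""


def value_for_label(container, label):
    """Return the text of the <td> following the first <th> whose text equals label."""
    for th in container.find_all("th"):
        if th.get_text(strip=True) == label:
            td = th.find_next_sibling("td")
            if td is not None:
                # Replace non-breaking spaces with plain spaces rather than dropping them.
                return td.get_text().replace("\xa0", " ").strip()
    return None


def collect_results(html):
    """Return a list of (name, result) tuples, one per "Sequence" div."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for seq in soup.find_all("div", class_="Sequence"):
        pairs.append((value_for_label(seq, "Name:"),
                      value_for_label(seq, "Result state:")))
    return pairs


for name, result in collect_results(SAMPLE_HTML):
    print("Name = {}".format(name))
    print("Result = {}".format(result))
```

Because the name and the result are read from the same container, they stay paired even if some table is missing, which zip over two separate lists cannot guarantee. Whether the real report nests its divs this way is something you would have to check against your file.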
