我的代码下面的代码将提取一天中每个小时的值。
但是,我要抓取的网页可能会更改,因此我想找到一种分配到变量的位置的方法,以便它每次都知道它是什么数字。我通过反复试验发现当前的" 116"。
我也包括下面的HTML结构。有什么建议么?
Sub scrape()
Dim IE As Object
Set IE = CreateObject("InternetExplorer.application")
With IE
.Visible = False
.navigate "web address"
Do Until .readyState = 4
DoEvents
Loop
.document.all.item("Login1_UserName").Value = "user"
.document.all.item("Login1_Password").Value = "pw"
.document.all.item("Login1_LoginButton").Click
Do Until .readyState = 4
DoEvents
Loop
End With
Dim htmldoc As Object
Dim r
Dim c
Dim aTable As Object
Dim TDelement As Object
Set htmldoc = IE.document
Dim td As Object
For Each td In htmldoc.getElementsByTagName("td")
On Error Resume Next
If span.Children(0).id = "ctl00_PageContent_grdReport_ctl08_Label50" Then
ThisWorkbook.Sheets("sheet1").Range("j8").Offset(r, c).Value = td.Children(1).innerText
End If
On Error GoTo 0
Next td
End Sub
html:
<form name="aspnetForm" id="aspnetForm" action="./MinMaxReport.aspx"
method="post">
<div>
</div>
<script type="text/javascript">...</script>
<div>
</div>
<table class="header-table">...</table>
<table class="page-area">
<tbody>
<tr>
<table id="ctl00_PageContent_Table1" border="0">...</table>
<table id="ctl00_PageContent_Table2" border="0">
<tbody>
<tr>
<td>
<div id="ctl00_PageContent_grdReport_div">
<tbody>
<tr style="background-color: beige;">
<td>...</td>
<td>
<span id="ctl00_PageContent_grdReport_ctl08_Label50">Most Restrictive
Capacity Maximum</span>
</td>
<td>
<span id="ctl00_PageContent_grdReport_ctl08_Label51">159</span>
</td>
</tr>
</tbody>
</div>
</td>
</tr>
</tbody>
</table>
</table>
</tr>
</tbody>
</table>
您可以循环浏览所有TD,并检查ID =" CTL00_PAGECONTENT_GRDREPORT_CTL08_LABEL50"例如:
For Each td In htmldoc.getElementsByTagName("td")
On Error Resume Next
If td.Children(0).ID = "ctl00_PageContent_grdReport_ctl08_Label50" Then
ThisWorkbook.Sheets("sheet1").Range("j8").Offset(r, c).Value = td.Children(1).innerText
End If
On Error GoTo 0
Next td
儿童(0)将选择您的表单元格中包含的第一个IHTML元素。下一个错误简历是针对TD元素没有孩子的情况。您的网页中有可能拥有此ID的一个元素更多。然后,您必须先识别表或表行。我做不到,因为我看不到您的整个HTML代码。