Web scraping div classes in VBA



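I'm having trouble writing VBA code that scrapes div class data from a website and puts it into Excel. Because of privacy concerns (patient data), I can't post the URL, but the HTML is posted below: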

<div id="location-1" class="Location">
  <div class="grid">
    <div class="row">
      <div class="info">
        <div class="column">
          <div class="element-1">[text]</div>
          <div class="element-2">[text]</div>
          <p class="element-3"></p>
          <p class="element-4">[text]</p>
          <p class="element-5"></p>
          <p class="element-6">[text]</p>
          <div class="dir">
            <a href="[link]" class="dir" target="_blank">Get dir</a>
          </div>
        </div>
      </div>
    </div>
  </div>
</div>

My code is posted below. I'm trying to scrape the text from "element-1" and "element-2" into one row for each source. Any help would be greatly appreciated!

Sub webscrape()
    Dim http As New XMLHTTP60
    Dim html As New HTMLdocument
    Dim source As Object

    With http
        .Open "get", "[link]", False
        .send
        html.body.innerHTML = .responseText
    End With

    For Each source In html.getElementsByClassName("column")
        x = x + 1: Cells(x, 1) = source.getAttribute("element-1")
        Cells(x, 2) = source.getAttribute("element-2")
    Next source

End Sub

Here are two different solutions. (Untested)

The first one:

Sub webscrape()
    ' Requires references: Microsoft XML, v6.0 and Microsoft HTML Object Library
    Dim http As New XMLHTTP60
    Dim html As New HTMLDocument
    Dim nodeColumnElements As Object
    Dim currentRow As Long

    currentRow = 1 ' Set your start row here

    With http
        .Open "GET", "[link]", False
        .send
        html.body.innerHTML = .responseText
    End With

    ' All div elements inside the first "column" block, in document order
    Set nodeColumnElements = html.getElementsByClassName("column")(0).getElementsByTagName("div")
    ' Index 0 is element-1 and index 1 is element-2; both go into the same row
    Cells(currentRow, 1) = Trim(nodeColumnElements(0).innerText)
    Cells(currentRow, 2) = Trim(nodeColumnElements(1).innerText)
End Sub

The second one grabs the two elements directly:

Sub webscrape()
    ' Requires references: Microsoft XML, v6.0 and Microsoft HTML Object Library
    Dim http As New XMLHTTP60
    Dim html As New HTMLDocument
    Dim currentRow As Long

    currentRow = 1 ' Set your start row here

    With http
        .Open "GET", "[link]", False
        .send
        html.body.innerHTML = .responseText
    End With

    ' First element-1 and first element-2 in the document, written to the same row
    Cells(currentRow, 1) = Trim(html.getElementsByClassName("element-1")(0).innerText)
    Cells(currentRow, 2) = Trim(html.getElementsByClassName("element-2")(0).innerText)
End Sub
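
Both versions above read only the first matching block. If the page contains several "column" blocks and each one should land in its own row, as the question asks, a loop along the following lines might work. This is an untested sketch: it assumes every "column" block has the structure shown in the question, that row 1 is the start row, and that the same two library references are set. The names ScrapeAllColumns and TextOfFirstDivWithClass are made up for illustration, and "[link]" is still the placeholder from the question.

```vba
Option Explicit

' Requires references (Tools > References):
'   Microsoft XML, v6.0 and Microsoft HTML Object Library

Sub ScrapeAllColumns()
    Dim http As New XMLHTTP60
    Dim html As New HTMLDocument
    Dim source As Object
    Dim currentRow As Long

    currentRow = 1 ' Assumed start row; adjust as needed

    With http
        .Open "GET", "[link]", False ' "[link]" is the placeholder URL from the question
        .send
        html.body.innerHTML = .responseText
    End With

    ' One worksheet row per "column" block
    For Each source In html.getElementsByClassName("column")
        Cells(currentRow, 1) = TextOfFirstDivWithClass(source, "element-1")
        Cells(currentRow, 2) = TextOfFirstDivWithClass(source, "element-2")
        currentRow = currentRow + 1
    Next source
End Sub

' Hypothetical helper: returns the trimmed text of the first <div> below
' container whose class attribute equals cssClass, or "" if none is found.
Private Function TextOfFirstDivWithClass(ByVal container As Object, ByVal cssClass As String) As String
    Dim node As Object

    For Each node In container.getElementsByTagName("div")
        If node.className = cssClass Then
            TextOfFirstDivWithClass = Trim(node.innerText)
            Exit Function
        End If
    Next node
End Function
```

The helper compares class names instead of relying on the position of each div, so a block that lacks "element-1" or "element-2" yields an empty cell rather than picking up the wrong element. The original attempt returned nothing because getAttribute looks up an attribute called "element-1" on the "column" div, whereas "element-1" is the class of a child element.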
