Stuck scraping certain fields from a site



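I have written a script in VBA that can parse "Company Name", "Phone", "Fax" and "Email" from a particular website, but I am stuck on scraping "Address", "Web" and "Name". The script uses responseText and the Split method. I hope somebody can show me a workaround.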

Here is what I tried:

str = Split(http.responseText, " class=""contact-details block dark"">")
y = UBound(str)
    For i = 1 To y
        Cells(x, 1) = Split(Split(str(i), "Company Name:")(1), "<")(0)
        Cells(x, 2) = Split(Split(str(i), "Phone:")(1), "<")(0)
        Cells(x, 3) = Split(Split(str(i), "Fax:")(1), "<")(0)
        Cells(x, 4) = Split(Split(str(i), "mailto:")(1), ">")(0)
        x = x + 1
    Next i

Here is the HTML element in question:

<div class="contact-details block dark">
                <h3>Contact Details</h3><p>Company Name: PPEHeads Australia<br>Phone: +61 2 9824 5520<br>Fax: +61 2 9824 5526<br>Web: <a target="_blank" href="http://www.ppeheads.com.au">http://www.ppeheads.com.au</a></p><h4>Address</h4><p>Unit 2 / 4 Reaghs Farm Road<br>MINTO<br>NSW<br>2566</p><h4>Contact</h4><p>Name: Alan Hadfield<br>Phone: +61 2 9824 5520<br>Fax: +61 2 9824 5526<br>Email: <a href="mailto:alan@ppeheads.com.au">alan@ppeheads.com.au</a></p>
            </div>

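Please provide the rest of your code next time, because the problem may not be where you think it is. Fortunately, I found your previous post here.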

If you look closely, your HTML element contains three p tags:

The first holds the company contact details, which you can get with:
Set ele = html.getElementsByClassName("contact-details block dark")(0).getElementsByTagName("p")(0)

The second holds the address details, which you can get with:
Set ele2 = html.getElementsByClassName("contact-details block dark")(0).getElementsByTagName("p")(1)

The third holds the contact person details, which you can get with:
Set ele3 = html.getElementsByClassName("contact-details block dark")(0).getElementsByTagName("p")(2)

Note the (0), (1) and (2) at the end of each line: that index selects the p tag by its order of appearance inside the div.
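If you are not sure how many p tags a given page has, it may help to list them all before hard-coding the indexes. The following is only a sketch: the procedure name ListContactParagraphs is made up for illustration, and it assumes html is an HTMLDocument that has already been loaded with the page (which needs a reference to the Microsoft HTML Object Library).

```vba
' Sketch: print every <p> inside the contact block together with its index,
' so you can see which index holds which group of fields.
' ListContactParagraphs is a hypothetical name, not part of the original code.
Sub ListContactParagraphs(ByVal html As HTMLDocument)
    Dim block As Object, paras As Object
    Dim i As Long

    ' First element carrying the three classes used on the page
    Set block = html.getElementsByClassName("contact-details block dark")(0)
    Set paras = block.getElementsByTagName("p")

    ' The collection is zero-based, hence Length - 1
    For i = 0 To paras.Length - 1
        Debug.Print i, paras.Item(i).innerText
    Next i
End Sub
```

The output goes to the Immediate window (Ctrl+G in the VBA editor).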

I modified your previous code and commented the changes so you can see the difference:

Sub RestData()
    ' Requires a reference to the Microsoft HTML Object Library (for HTMLDocument)
    Dim html As New HTMLDocument
    ' Give every variable its own type; "Dim ele, ele2, ele3 As Object" would leave ele and ele2 as Variant
    Dim ele As Object, ele2 As Object, ele3 As Object
    Dim TypeDetails() As String
    Dim TypeDetails3() As String
    Dim TypeDetail() As String
    Dim i As Long, r As Long

    With CreateObject("MSXML2.serverXMLHTTP")
        .Open "GET", "http://www.austrade.gov.au/SupplierDetails.aspx?ORGID=ORG0120000508&folderid=1736", False
        .send
        html.body.innerHTML = .responseText
    End With

    ' Get the three p elements inside the contact block
    Set ele = html.getElementsByClassName("contact-details block dark")(0).getElementsByTagName("p")(0)
    Set ele2 = html.getElementsByClassName("contact-details block dark")(0).getElementsByTagName("p")(1)
    Set ele3 = html.getElementsByClassName("contact-details block dark")(0).getElementsByTagName("p")(2)
    r = 2

    ' Split each paragraph on the line feed
    TypeDetails() = Split(ele.innerText, Chr(10))
    TypeDetails3() = Split(ele3.innerText, Chr(10))

    ' Company contact details; note that the delimiter is ": ", not ":"
    For i = 0 To UBound(TypeDetails())
        TypeDetail() = Split(TypeDetails(i), ": ")
        Cells(r, 1) = VBA.Trim(TypeDetail(0))
        Cells(r, 2) = VBA.Trim(TypeDetail(1))
        r = r + 1
    Next i

    ' Address details; line breaks are replaced with " " so the address stays on one line
    Cells(r, 1) = "Address"
    Cells(r, 2) = Replace(ele2.innerText, vbLf, " ")
    r = r + 1

    ' Contact person details
    For i = 0 To UBound(TypeDetails3())
        TypeDetail() = Split(TypeDetails3(i), ": ")
        Cells(r, 1) = VBA.Trim(TypeDetail(0))
        Cells(r, 2) = VBA.Trim(TypeDetail(1))
        r = r + 1
    Next i

    Set html = Nothing: Set ele = Nothing: Set ele2 = Nothing: Set ele3 = Nothing
End Sub
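One thing to watch with the loops above: Split(line, ": ") raises "Subscript out of range" on TypeDetail(1) when a line has no ": " in it, and a stray carriage return can end up in the cell if the page's line breaks come through as CR+LF. A small helper along these lines might make the loops more forgiving. This is a sketch, not something tested against the site, and SplitLabelValue is a hypothetical name.

```vba
' Hypothetical helper: turns "Label: value" into a two-element array (label, value).
' Returns (text, "") when the line contains no ": " delimiter.
Function SplitLabelValue(ByVal lineText As String) As Variant
    Dim parts() As String

    ' Drop any carriage return left over from a CR+LF line break
    lineText = Trim$(Replace(lineText, vbCr, ""))

    ' A limit of 2 keeps any later ": " inside the value part
    parts = Split(lineText, ": ", 2)

    If UBound(parts) >= 1 Then
        SplitLabelValue = Array(Trim$(parts(0)), Trim$(parts(1)))
    Else
        SplitLabelValue = Array(lineText, "")
    End If
End Function
```

Inside either loop you could then write pair = SplitLabelValue(TypeDetails(i)), with pair declared As Variant, followed by Cells(r, 1) = pair(0) and Cells(r, 2) = pair(1).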

I hope this helps.
