VBA: crawling href links through a page's source code



I have updated my question because I now understand more clearly the technical problem I am trying to solve.

A. If we run a search on the agency's website, we get this URL:

https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000010795&type=10-K&dateb=&owner=exclude&count=20

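B. Entering the URL from step A in a browser and viewing the page source (I use Google Chrome), we see at line 100 this charming line, which is also a clickable link: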

href="/Archives/edgar/data/10795/000119312513456802/0001193125-13-456802-index.htm"

This line sits inside the following snippet of the source code:

<tr>
<td nowrap="nowrap">10-K</td>
<td nowrap="nowrap"><a href="/Archives/edgar/data/10795/000119312513456802/0001193125-13-456802-index.htm" id="documentsbutton">&nbsp;Documents</a>&nbsp; <a href="/cgi-bin/viewer?action=view&amp;cik=10795&amp;accession_number=0001193125-13-456802&amp;xbrl_type=v" id="interactiveDataBtn">&nbsp;Interactive Data</a></td>
<td class="small" >Annual report [Section 13 and 15(d), not S-K Item 405]<br />Acc-no: 0001193125-13-456802&nbsp;(34 Act)&nbsp; Size: 15 MB            </td>
<td>2013-11-27</td>
<td nowrap="nowrap"><a href="/cgi-bin/browse-edgar?action=getcompany&amp;filenum=001-04802&amp;owner=exclude&amp;count=20">001-04802</a><br>131247478         </td>
</tr>

C. If we click the link from step B at line 100, we go to the next page. The href from step B now forms part of the URL, so we get a new page at this URL:

https://www.sec.gov/Archives/edgar/data/10795/000119312513456802/0001193125-13-456802-index.htm
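
In other words, step C amounts to prefixing the site's host to the root-relative href from step B. A minimal sketch of that in VBA follows; `AbsoluteUrl` is a hypothetical helper name chosen for illustration, and the fallback branch for other relative hrefs is an assumption, not something this page requires.

```vba
' Hypothetical helper: turn an href taken from the page source into an
' absolute URL on the given host (for example "https://www.sec.gov").
Function AbsoluteUrl(ByVal strHost As String, ByVal strHref As String) As String
    If LCase$(Left$(strHref, 4)) = "http" Then
        AbsoluteUrl = strHref                  ' already absolute
    ElseIf Left$(strHref, 1) = "/" Then
        AbsoluteUrl = strHost & strHref        ' root-relative, as in step B
    Else
        ' Assumption: treat any other href as relative to the site root
        AbsoluteUrl = strHost & "/" & strHref
    End If
End Function
```

Calling `AbsoluteUrl("https://www.sec.gov", "/Archives/edgar/data/10795/000119312513456802/0001193125-13-456802-index.htm")` gives the URL shown in step C.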

D. Using the same method on that page, we come across this line of code at line 182:

href="/Archives/edgar/data/10795/000119312513456802/bdx-20130930.xml"

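If we click this link, we arrive at the URL stored in strXMLSite in the macro below. Once you have read and run the macro, the logical conclusion is this: if the steps above could be built into the macro, the string could be filled with the required URL at run time. This is the core of the question.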


We have already activated the reference the macro needs, Microsoft XML Core Services (MSXML) (Excel --> VBE --> Tools --> References --> Microsoft XML, v6.0).

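What statements can we add to the procedure so that VBA crawls from the URL in step A, through the source code, to the URL currently hard-coded in strXMLSite? Do we need to activate another library under Tools --> References? Could you show me a code block that uses this approach? What is the right way to go about it?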

For completeness, here is the macro provided by @user2140261:

Sub GetNode()
    Dim strXMLSite As String
    Dim objXMLHTTP As MSXML2.XMLHTTP
    Dim objXMLDoc As MSXML2.DOMDocument
    Dim objXMLNodexbrl As MSXML2.IXMLDOMNode
    Dim objXMLNodeDIIRSP As MSXML2.IXMLDOMNode

    Set objXMLHTTP = New MSXML2.XMLHTTP
    Set objXMLDoc = New MSXML2.DOMDocument

    ' URL of the XBRL instance document reached in step D
    strXMLSite = "http://www.sec.gov/Archives/edgar/data/10795/000119312513456802/bdx-20130930.xml"

    ' Download the document synchronously; this is a plain retrieval, so GET rather than POST
    objXMLHTTP.Open "GET", strXMLSite, False
    objXMLHTTP.send
    objXMLDoc.LoadXML objXMLHTTP.responseText

    ' Find the root element, then the stated interest rate beneath it
    Set objXMLNodexbrl = objXMLDoc.SelectSingleNode("xbrl")
    Set objXMLNodeDIIRSP = objXMLNodexbrl.SelectSingleNode("us-gaap:DebtInstrumentInterestRateStatedPercentage")

    ' Write the value to the worksheet
    Worksheets("Sheet1").Range("A1").Value = objXMLNodeDIIRSP.Text
End Sub
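
To make the goal concrete, here is a rough sketch of what "populated at run time" could look like: the same steps as the macro, but with the URL passed in as an argument instead of hard-coded. `GetNodeFrom` is a hypothetical name, the late-bound `CreateObject` calls and the `Nothing` checks are additions for illustration, and it assumes the document has the same structure as the one above.

```vba
' Hypothetical variant of GetNode: the caller supplies the URL at run time.
Sub GetNodeFrom(ByVal strXMLSite As String)
    Dim objXMLHTTP As Object, objXMLDoc As Object
    Dim objNodeXbrl As Object, objNodeRate As Object

    ' Late binding, so no entry under Tools --> References is needed here
    Set objXMLHTTP = CreateObject("MSXML2.XMLHTTP")
    Set objXMLDoc = CreateObject("MSXML2.DOMDocument")

    ' Download the document synchronously
    objXMLHTTP.Open "GET", strXMLSite, False
    objXMLHTTP.send

    ' LoadXML returns False when the response is not well-formed XML
    If Not objXMLDoc.LoadXML(objXMLHTTP.responseText) Then Exit Sub

    Set objNodeXbrl = objXMLDoc.SelectSingleNode("xbrl")
    If objNodeXbrl Is Nothing Then Exit Sub

    Set objNodeRate = objNodeXbrl.SelectSingleNode( _
        "us-gaap:DebtInstrumentInterestRateStatedPercentage")
    If Not objNodeRate Is Nothing Then
        Worksheets("Sheet1").Range("A1").Value = objNodeRate.Text
    End If
End Sub
```

What is still missing is the part that finds the URL to pass in, which is what the steps A to D describe.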

Thank you for looking at my question.

Add a reference to "Microsoft Internet Controls". This will get you to the point where you have the individual XML links.

Sub Tester()
    Dim IE As New InternetExplorer
    Dim els, el, colDocLinks As New Collection
    Dim lnk

    IE.Visible = True
    Loadpage IE, "https://www.sec.gov/cgi-bin/browse-edgar?" & _
                 "action=getcompany&CIK=0000010795&type=10-K" & _
                 "&dateb=&owner=exclude&count=20"

    'collect all the "Document" links on the page
    Set els = IE.Document.getelementsbytagname("a")
    For Each el In els
        If Trim(el.innerText) = "Documents" Then
            'Debug.Print el.innerText, el.href
            colDocLinks.Add el.href
        End If
    Next el

    'loop through the "document" links and check each page for xml links
    For Each lnk In colDocLinks
        Loadpage IE, CStr(lnk)
        For Each el In IE.Document.getelementsbytagname("a")
            If el.href Like "*.xml" Then
                Debug.Print el.innerText, el.href
                'work with the document from this link
            End If
        Next el
    Next lnk
End Sub

Sub Loadpage(IE As Object, URL As String)
    IE.navigate URL
    Do While IE.Busy Or IE.ReadyState <> READYSTATE_COMPLETE
        DoEvents
    Loop
End Sub
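
If you would rather not drive a browser, the same two-hop crawl can be sketched with an HTTP request plus a regular expression over the page source. This is an untested outline, not a drop-in replacement: `FetchText`, `ExtractHrefs` and `CrawlToXml` are hypothetical names, the patterns assume the hrefs look like the ones quoted in the question, and the site may reject automated requests or use different markup, in which case `FetchText` returns an empty string and nothing is printed.

```vba
' Hypothetical helper: download a page and return its source as text
' (empty string when the server does not answer with status 200).
Function FetchText(ByVal strUrl As String) As String
    Dim objHttp As Object
    Set objHttp = CreateObject("MSXML2.XMLHTTP")
    objHttp.Open "GET", strUrl, False
    objHttp.send
    If objHttp.Status = 200 Then FetchText = objHttp.responseText
End Function

' Hypothetical helper: collect every href value in strHtml that matches
' the regular-expression fragment strPattern.
Function ExtractHrefs(ByVal strHtml As String, ByVal strPattern As String) As Collection
    Dim objRe As Object, objMatch As Object
    Dim colHrefs As New Collection

    Set objRe = CreateObject("VBScript.RegExp")
    objRe.Global = True
    objRe.IgnoreCase = True
    objRe.Pattern = "href=""(" & strPattern & ")"""

    For Each objMatch In objRe.Execute(strHtml)
        colHrefs.Add objMatch.SubMatches(0)   ' the text inside the quotes
    Next objMatch
    Set ExtractHrefs = colHrefs
End Function

Sub CrawlToXml()
    Const HOST As String = "https://www.sec.gov"
    Dim strStart As String
    Dim vIndex As Variant, vXml As Variant

    strStart = HOST & "/cgi-bin/browse-edgar?action=getcompany" & _
               "&CIK=0000010795&type=10-K&dateb=&owner=exclude&count=20"

    ' Hop 1: the "-index.htm" links on the search results page
    For Each vIndex In ExtractHrefs(FetchText(strStart), "[^""]*-index\.htm")
        ' Hop 2: the ".xml" links on each filing index page
        For Each vXml In ExtractHrefs(FetchText(HOST & vIndex), "[^""]*\.xml")
            Debug.Print HOST & vXml   ' candidate value for strXMLSite
        Next vXml
    Next vIndex
End Sub
```

Both hrefs in the question are root-relative, which is why the sketch simply prefixes the host. A regular expression is a blunt tool for HTML, so treat the patterns as a starting point.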
