Automation errors 800706B5, 80004005 and 80010108 when scraping an internal SAP site

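I'm writing a macro that scrapes my company's internal SAP site for vendor information. For several reasons I have to do this in VBA. However, I can't figure out why I keep getting these three errors when I try to scrape the page. Could this have something to do with the UAC integrity model? Or is something wrong with my code? Are pages served over http handled differently in Internet Explorer? I can open any web page, including other internal pages, and scrape each of them without problems. It is only when I try to scrape the SAP page that I get these errors. The error descriptions, and when each one occurs, are as follows: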

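800706B5 - Interface unknown (occurs when I set a breakpoint before the offending code runs)

80004005 - Unspecified error (occurs when I set no breakpoints and just let the macro run)

80010108 - The object invoked has disconnected from its clients. (I can't reproduce this one consistently; it seems to happen when something in Excel gets so corrupted that it can't load the page and I have to reinstall Excel)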

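I have no idea what is going on. The page on the integrity model didn't make much sense to me, and all the research I found talks about connecting to databases and using ADO and COM references. However, I'm doing everything through Internet Explorer. Here is the relevant code: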

Private Sub runTest_Click()
   doTest
End Sub
'The code to run the module
Private Sub doTest()
   Dim result As String
   result = PageScraper.scrapeSAPPage("<some num>")
End Sub

PageScraper module

Public Function scrapeSAPPage(num As Long) As String
   'Early binding: requires references to Microsoft Internet Controls and Microsoft HTML Object Library
   'Predefined URL that appends num onto the end to navigate to a specific record in SAP
   Dim url As String: url = "<url here>" & num
   Dim ie As InternetExplorer
   Set ie = CreateObject("internetexplorer.application")
   ie.Visible = True
   Dim doc As HTMLDocument
   ie.navigate url 'Always opens the page successfully, whether it is the SAP site or another one
   'Pause execution of the code until the web page has loaded
   Do
     'On the SAP site this always fails on the next line with one of the errors above
     If Not ie.Busy And ie.ReadyState = 4 Then
        Application.Wait (Now + TimeValue("00:00:01"))
        If Not ie.Busy And ie.ReadyState = 4 Then
           Exit Do
        End If
     End If
     DoEvents
   Loop
   Set doc = ie.document 'After applying Tim Williams's changes, it breaks here instead
   'Scraping code here, not relevant
End Function

I'm using IE9 and Excel 2010 on a Windows 7 machine. Any help or insight would be greatly appreciated. Thanks.

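I do this kind of scraping a lot and have found it very hard to make IE automation work 100% reliably; I get the same errors you are seeing. Because they are usually timing issues, they can be very frustrating to debug: they don't appear when you step through the code, only during a live run. To minimize them, I do the following: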

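Introduce more delays. ie.Busy and ie.ReadyState don't necessarily return valid values immediately after ie.navigate, so add a short pause after ie.navigate. I normally use 1 to 2 seconds, but anything over 500 ms seems to work.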

Make sure IE is in a clean state by calling ie.navigate "about:blank" before going to the target URL.
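A minimal sketch of those two points, assuming late binding (no reference to Microsoft Internet Controls needed) and Excel's Application.Wait for the initial pause. The helper names WaitForIE and NavigateClean and the 30-second default timeout are illustrative, not part of the original answer. The timeout is an addition so that a page which never reaches ReadyState 4 cannot hang the macro forever.

```vba
' Hypothetical helper: pauses briefly, then polls IE until it is idle and fully
' loaded (ReadyState 4 = complete) or until timeoutSeconds have elapsed.
Private Function WaitForIE(ByVal ie As Object, ByVal timeoutSeconds As Long) As Boolean
    Dim deadline As Date
    ' Busy/ReadyState may not be valid right after Navigate, so wait about 1 second first
    Application.Wait Now + TimeSerial(0, 0, 1)
    deadline = Now + TimeSerial(0, 0, timeoutSeconds)
    Do While ie.Busy Or ie.ReadyState <> 4
        If Now > deadline Then Exit Function   ' timed out: returns False
        DoEvents
    Loop
    WaitForIE = True
End Function

' Hypothetical helper: resets IE to about:blank, then loads the target URL.
Public Function NavigateClean(ByVal ie As Object, ByVal url As String, _
                              Optional ByVal timeoutSeconds As Long = 30) As Boolean
    ie.Navigate "about:blank"
    If Not WaitForIE(ie, timeoutSeconds) Then Exit Function
    ie.Navigate url
    NavigateClean = WaitForIE(ie, timeoutSeconds)
End Function

' Example usage ("<url here>" is the placeholder from the question)
Public Sub DemoNavigateClean()
    Dim ie As Object
    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True
    If NavigateClean(ie, "<url here>") Then
        Debug.Print "Loaded: "; ie.LocationURL
    Else
        Debug.Print "Timed out"
    End If
    ie.Quit
End Sub
```

Returning a Boolean rather than raising an error lets the caller decide whether to retry, skip the record, or stop.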

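After that you should have a valid IE object, and you need to inspect it to see what it contains. In general I avoid accessing the whole ie.document and instead use ie.document.all.tags("x"), where "x" is whatever element I'm looking for, such as td or a.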
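As a sketch of that approach (late-bound, with a hypothetical helper name CollectTagText), the following reads only the elements of one tag type through document.all.tags and returns their text. It assumes the page has already finished loading.

```vba
' Hypothetical helper: returns the innerText of every element with the given tag
' (e.g. "td" or "a"), read through document.all.tags instead of walking the
' whole document. Assumes ie has already finished loading the page.
Public Function CollectTagText(ByVal ie As Object, ByVal tagName As String) As Collection
    Dim elements As Object
    Dim result As Collection
    Dim i As Long
    Set result = New Collection
    Set elements = ie.document.all.tags(tagName)
    ' The element collection is zero-based
    For i = 0 To elements.Length - 1
        result.Add elements.Item(i).innerText
    Next i
    Set CollectTagText = result
End Function
```

Usage, for example to dump every table cell:

    Dim cellText As Variant
    For Each cellText In CollectTagText(ie, "td")
        Debug.Print cellText
    Next cellText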

Even so, although all these changes improved my success rate, I still got random errors.

My real solution was to abandon IE and do the work with xmlhttp instead.

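If you are parsing the data with text manipulation on the document, the switch is a no-brainer. The xmlhttp object is far more reliable: you just read its responseText to get the entire HTML of the page.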
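If you go the text-manipulation route, a small helper like the one below (the name ExtractBetween is hypothetical) is often all the parsing you need. It uses InStr and Mid$ to return whatever sits between two markers in the HTML, or an empty string if either marker is missing. The markers in the usage line are made up; you would take them from the real page's HTML.

```vba
' Hypothetical helper: returns the text between the first startMarker found at or
' after startPos and the next endMarker, or "" if either marker is missing.
' The comparison is case-insensitive so "<TD>" and "<td>" both match.
Public Function ExtractBetween(ByVal html As String, ByVal startMarker As String, _
                               ByVal endMarker As String, _
                               Optional ByVal startPos As Long = 1) As String
    Dim p1 As Long, p2 As Long
    p1 = InStr(startPos, html, startMarker, vbTextCompare)
    If p1 = 0 Then Exit Function
    p1 = p1 + Len(startMarker)                ' first character after the start marker
    p2 = InStr(p1, html, endMarker, vbTextCompare)
    If p2 = 0 Then Exit Function
    ExtractBetween = Mid$(html, p1, p2 - p1)
End Function
```

Example (hypothetical markers): vendorName = ExtractBetween(strData, "<span id=""vendor"">", "</span>")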

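Here is a simplified version of the scraper I now use in production. It is very reliable and generates millions of rows overnight without errors.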

Public Sub Main()
    Dim obj As MSXML2.ServerXMLHTTP
    Dim strData As String
    Dim errCount As Integer
    ' Create an xmlhttp object. You will need a reference to the MS XML HTTP library; any version will do,
    ' but I'm using Microsoft XML, v6.0 (C:\Windows\System32\msxml6.dll)
    Set obj = New MSXML2.ServerXMLHTTP
    ' Get the URL. I set the last param to Async=True so that it returns right away and lets me wait in
    ' code rather than trust it, but on an internal network False might be better for you.
    obj.Open "GET", "http://www.google.com", True
    obj.send ' this line actually does the HTTP GET
    ' Wait for completion, up to 10 seconds
    errCount = 0
    While obj.readyState < 4 And errCount < 10
        DoEvents
        obj.waitForResponse 1 ' this is an up-to-one-second delay
        errCount = errCount + 1
    Wend
    If obj.readyState = 4 Then ' I do these on two
        If obj.Status = 200 Then ' different lines to avoid certain error cases
            strData = obj.responseText
        End If
    End If
    obj.abort ' in real code I use some On Error Resume Next, so at this point I may have a failed
              ' GET, and it is best to abort it before I try again
    Debug.Print strData
End Sub
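The comments above mention wrapping the request in On Error Resume Next and aborting a failed GET before trying again. Below is a minimal sketch of that retry loop, assuming late binding through the MSXML2.ServerXMLHTTP.6.0 ProgID so that no library reference is needed. FetchHtml, maxAttempts and timeoutSeconds are illustrative names, and the sketch does not handle sites that require an explicit login.

```vba
' Hypothetical helper: GETs url, retrying up to maxAttempts times, and returns the
' response HTML, or "" if no attempt produced HTTP 200.
Public Function FetchHtml(ByVal url As String, Optional ByVal maxAttempts As Long = 3, _
                          Optional ByVal timeoutSeconds As Long = 10) As String
    Dim obj As Object
    Dim attempt As Long, waited As Long
    Dim state As Long, httpStatus As Long

    For attempt = 1 To maxAttempts
        On Error Resume Next              ' any COM error just counts as a failed attempt
        Set obj = Nothing
        Set obj = CreateObject("MSXML2.ServerXMLHTTP.6.0")
        obj.Open "GET", url, True         ' async, so the waiting happens in our loop
        obj.send
        waited = 0
        Do
            state = 0
            state = obj.readyState        ' stays 0 if reading it raised an error
            If state = 4 Or waited >= timeoutSeconds Then Exit Do
            DoEvents
            obj.waitForResponse 1         ' wait up to one second
            waited = waited + 1
        Loop
        httpStatus = 0
        If state = 4 Then httpStatus = obj.Status   ' stays 0 if Status raised an error
        If httpStatus = 200 Then FetchHtml = obj.responseText
        obj.abort                         ' discard any half-finished request
        On Error GoTo 0
        If httpStatus = 200 Then Exit Function
    Next attempt
End Function
```

Reading readyState and Status into variables first, instead of testing them directly in If conditions, avoids a VBA quirk: under On Error Resume Next, an error raised inside an If condition makes execution continue into the Then branch.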

Hope that helps.
