Wednesday 17 January 2018

VBA - WebScraping - firing an HTML button's event from VBA with Click, FireEvent and Window.ExecScript

Many questions under the VBA tag on StackOverflow concern web scraping, so driving the browser from VBA code is a useful skill. Here we see that we can push the boundary a little further by firing events in the HTML object model.

Code To Write HTML

Here we write some code to write the HTML file locally. The code uses the HtmlElementStack class (given separately below) to ensure our HTML is well-formed. There is a script tag which defines the function wired to the button's click handler. We shall see that, because the function is defined in the HTML page's global scope (i.e. on the Window object), it can also be called with Window.execScript.


Private Const msFILENAME As String = "N:\TestJavascript2.html"

Function WriteHTML()

    Dim dicHTMLStack As HtmlElementStack
    Set dicHTMLStack = New HtmlElementStack

    With dicHTMLStack
        .SetFileName msFILENAME 
        
        .OE "html"
        .OE "head"
        .OE "title"
        .WL "Some test html with javascript"
        .CE
        .CE
        .OE "body"
        .OE "div", "id='div1'"
        .WL "Some Text"
        .CE
        .OE "button", "id='button1' type='button' onclick='throwMsgBox()'"
        .WL "Click Me!"
        .CE
        .OE "script", "language='jscript'"
        .WL "function throwMsgBox() { alert('hi there'); }"
        .CE
    End With
    
    Set dicHTMLStack = Nothing

End Function
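
Before pointing IE at the file, it can help to look at what was actually written. The following is only a minimal sketch: ReadFileText is a hypothetical helper of mine, not part of the code above, and it uses late binding so it needs no extra reference.

```vba
'* Hypothetical helper (an assumption, not from the original listing):
'* returns the whole text of a file, or an empty string if it is missing or empty.
Public Function ReadFileText(ByVal sPath As String) As String
    Const ForReading As Long = 1

    Dim fso As Object
    Dim txt As Object
    Set fso = CreateObject("Scripting.FileSystemObject")

    '* nothing to read if the file was never written
    If Not fso.FileExists(sPath) Then Exit Function

    Set txt = fso.OpenTextFile(sPath, ForReading)
    '* ReadAll raises an error on an empty file, so check first
    If Not txt.AtEndOfStream Then ReadFileText = txt.ReadAll
    txt.Close
End Function
```

After running WriteHTML, calling Debug.Print ReadFileText("N:\TestJavascript2.html") in the Immediate window prints the generated markup so it can be checked by eye.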


HtmlElementStack class

This class allows us to be a little lazy when writing HTML files. We keep a stack, i.e. a Last In First Out (LIFO) structure, in a Scripting.Dictionary that records all the open elements that still need closing. The class also manages its own text stream because we need to write out all pending closing tags before the text stream is closed.


Option Explicit

'* Tools->References
' Scripting            Microsoft Scripting Runtime     C:\Windows\SysWOW64\scrrun.dll


Private mdicStack As New Scripting.Dictionary

Private mtxt As Scripting.TextStream
Private msFILENAME As String
Private mfso As New Scripting.FileSystemObject

'Private Sub SetStream(ByVal txt As Scripting.TextStream)
'    Set mtxt = txt
'End Sub

Public Sub SetFileName(ByVal sFileName As String)
    msFILENAME = sFileName
    Set mtxt = mfso.CreateTextFile(msFILENAME)
End Sub

Public Sub Write_(ByVal sText As String)
    mtxt.Write sText
End Sub

Public Sub WL(ByVal sText As String)
    mtxt.WriteLine sText
End Sub

Public Sub OE(ByVal sNodeName As String, Optional ByVal sAttribs As String)
    
    If Not mtxt Is Nothing Then
        If Len(sAttribs) = 0 Then
            mtxt.WriteLine "<" & sNodeName & ">"
        Else
            mtxt.WriteLine "<" & sNodeName & " " & sAttribs & ">"
        End If
    
        
    End If

    mdicStack.Add mdicStack.Count, sNodeName
    

End Sub

Public Sub CE()

    If mdicStack.Count > 0 Then
        Dim sLastNode As String
        sLastNode = mdicStack.Item(mdicStack.Count - 1)
        
        Call mdicStack.Remove(mdicStack.Count - 1)
    
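        '* write the closing tag for the element just popped off the stack
        If Not mtxt Is Nothing Then mtxt.WriteLine "</" & sLastNode & ">"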
    End If

End Sub

Private Sub Class_Terminate()

    While mdicStack.Count > 0
        DoEvents
        CE
        DoEvents
    Wend
    
    '* the stream only exists if SetFileName was called
    If Not mtxt Is Nothing Then mtxt.Close

    Set mtxt = Nothing
End Sub
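
The stack idea used by the class can be seen in isolation. This is a small sketch, not the class itself: Push, Pop and DemoStack are hypothetical names of mine, and the dictionary is late-bound. The key of each entry is the depth at which it was added, so the newest entry always has the key Count - 1.

```vba
Option Explicit

'* Hypothetical helpers (assumptions for illustration, not part of HtmlElementStack).

Public Sub Push(ByVal dic As Object, ByVal sItem As String)
    '* key is the current depth, so the newest item has key Count - 1
    dic.Add dic.Count, sItem
End Sub

Public Function Pop(ByVal dic As Object) As String
    '* nothing to pop from an empty stack; returns an empty string
    If dic.Count = 0 Then Exit Function

    '* read, then remove, the most recently added item
    Pop = dic.Item(dic.Count - 1)
    dic.Remove dic.Count - 1
End Function

Public Sub DemoStack()
    Dim dic As Object
    Set dic = CreateObject("Scripting.Dictionary")

    Push dic, "html"
    Push dic, "body"
    Push dic, "div"

    '* items come back in reverse order of opening:
    '* </div> then </body> then </html>
    While dic.Count > 0
        Debug.Print "</" & Pop(dic) & ">"
    Wend
End Sub
```

Popping in reverse order is what allows Class_Terminate to close every element that is still open, innermost first.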


Code to drive IE and call the click handler function 3 different ways

In this code we create an instance of IE and navigate to our newly written HTML file. We call the button's click handler function three different ways. Firstly, by locating the element and calling its Click method. Secondly, by calling the function in the global scope (i.e. off the window object) using execScript. Thirdly, similar to the first but more loosely coupled, by calling the element's FireEvent method.

I recommend acquiring the element immediately before calling a method because I have witnessed a type of stale reference bug.


Private Const msFILENAME As String = "N:\TestJavascript2.html"

Public Sub TestFire()
    
'* Tools->References
'SHDocVw    Microsoft Internet Controls C:\Windows\SysWOW64\ieframe.dll
    
    Dim oIE As InternetExplorerMedium
    Set oIE = New InternetExplorerMedium
    
    oIE.Visible = True
    oIE.navigate msFILENAME
    While oIE.Busy Or oIE.readyState < 4
        DoEvents
    Wend
    
    
    Stop
    '* recommend re-acquiring element before using as I suspect IE suffers from stale references
    oIE.Document.getElementById("button1").Click
    
    Stop

    '* call the function via the global scope, for html global scope is the *window*
    Call oIE.Document.parentWindow.execScript("throwMsgBox()", "JavaScript")

    Stop
    
    '* recommend re-acquiring element before using as I suspect IE suffers from stale references
    oIE.Document.getElementById("button1").FireEvent "onclick"
    
    Stop
    
    
    oIE.Quit
End Sub
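
The advice to re-acquire the element before each call can be wrapped in a small helper. This is a sketch only: FireOnElement is a hypothetical name of mine, it is late-bound, and it assumes that a missing id comes back from getElementById as Nothing.

```vba
'* Hypothetical helper (an assumption, not from the original listing).
'* Looks the element up afresh on every call, so no reference is held between calls.
'* Returns True if the element was found, False otherwise.
Public Function FireOnElement(ByVal oIE As Object, ByVal sId As String, _
                              Optional ByVal sEvent As String = "") As Boolean
    Dim oElem As Object

    '* fresh lookup each time; assumes Nothing is returned for an unknown id
    Set oElem = oIE.Document.getElementById(sId)
    If oElem Is Nothing Then Exit Function

    If Len(sEvent) = 0 Then
        oElem.Click                 '* same as the first approach above
    Else
        oElem.FireEvent sEvent      '* e.g. "onclick", as in the third approach
    End If

    FireOnElement = True
End Function
```

Inside TestFire the two element-based calls would then read FireOnElement oIE, "button1" and FireOnElement oIE, "button1", "onclick".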

