Friday 12 October 2018

VBA - Microsoft.ACE.OLEDB.12.0 - Skip HTML Import Export

This post follows on from the other day's mega-post on Microsoft.ACE.OLEDB.12.0. As part of the investigation into the Excel engine, I discovered how to use the Html Export and Html Import drivers (yes they are separate).

Initially I was excited to find another way to store and retrieve data without requiring a database. After writing some code my enthusiasm has waned and my technical recommendation is that you skip this technology.

Weaknesses with the HTML Import Export Drivers

So, I ought to itemise my concerns here...

Type Inference

So the Excel driver has to sample some rows to guess the type of column (I have yet to find a way to declare the column type) and the Html Import driver does so equally. I came up against this more so with the Html Import driver, I'm guessing it samples fewer rows, anyway I had to change the text in the sample data (compared to the mega post) and give the hex values leading ampersands to enforce its inference as being a string. E.g. I needed to replace 00FF00 to &00FF00 in SetUpSomeData(). I have discovered that ordinary text files can have schema.ini files attached, which would put text files streets ahead of Html files because it obviates the need for type inference.

Html Export is not well-formed Xml

I'd like to nominate Html for a troublesome technology award. Html is in same family as Xml but typically cannot be parsed with Xml parsers because it is not well-formed. The latest version of Html, Html 5, is well-formed so going forward things ought to be better. This driver Microsoft.ACE.OLEDB.12.0 is quite new and ought to be output well-formed Html but it doesn't. This means having to write some large amounts of paring code (see AggregateHtmlFiles() in sample code). I could write that code ten times more concisely with Xml library!

Missed opportunity to export multiple tables to one single file

So I myself had to write code to aggregate tables to one single file. It turned out to be more painful that I imagined. The code below shows HTML parsing logic (see AggregateHtmlFiles() in sample code). If I were to rewrite this I'd scrap that approach and start with an Xml representation of cell block and XSLT transform that into the correct Html. This is a shame, for a while I thought multi-table Html file could make a nice config file.

Sample Code

Anyway, I wrote plenty of experimental code. Instead of throwing away this code, I will deposit here. There is an example of exporting from a sheet to an Html file. There is an example of reading an Html file.

Also, because the export only allows one table per Html file I have written some code to aggregate separate single table files into a multi-table file, see AggregateHtmlFiles(). I did this with the same libraries that Internet Explorer uses and so had to workaround IE bugs, code would have been much simple if the InsertAdjacentHTML method wasn't buggy.

Option Explicit
Option Private Module

'* Tools -> References
'* ADODB  Microsoft ActiveX Data Objects 6.1 Library  C:\Program Files (x86)\Common Files\System\ado\msado15.dll
'* ADOX   Microsoft ADO Ext. 6.0 for DDL and Security C:\Program Files (x86)\Common Files\System\ado\msadox.dll

Private Const msColorsHtmlFile As String = "N:Colors.html" '<---- change this to your working file location
Private Const msCurrenciesHtmlFile As String = "N:Currencies.html" '<---- change this to your working file location
Private Const msTeamsHtmlFile As String = "N:Teams.html" '<---- change this to your working file location
Private Const msAggregatedHtmlFile As String = "N:Aggregated.html" '<---- change this to your working file location

Private Sub SetUpSomeData()
    '* WARNING this will wipe data!
    Dim sht As Excel.Worksheet
    Set sht = ThisWorkbook.Worksheets.Item("Sheet1")
    sht.Cells.Clear

    '*
    '* use our array literal trick, for more tricks tips and 'blue sky thinking'
    '* see http://exceldevelopmentplatform.blogspot.com
    '*
    Dim vColors As Variant
    vColors = [{"Color","RGB";"Red","&FF0000";"Green","&00FF00"}] '* note the addition of the ampersand for type inference

    sht.Range("A1:B3").Value2 = vColors

    Dim vCurrencies As Variant
    vCurrencies = [{"Country","Ccy";"France","Euro";"Japan","Yen"}]

    sht.Range("D1:E3").Value2 = vCurrencies

    Dim vTeams As Variant
    vTeams = [{"Team","Country";"New York Red Bulls","US";"Spartak Moskva","Russia";"Man Utd","England";"Barcelona","Spain";"Bayern Munich","Germany"}]

    sht.Range("G1:H6").Value2 = vTeams

End Sub

Private Sub TestWriteToHtmlFile()
    WriteToHtmlFile ThisWorkbook.Worksheets.Item("Sheet1").Range("A1").CurrentRegion, msColorsHtmlFile
    WriteToHtmlFile ThisWorkbook.Worksheets.Item("Sheet1").Range("D1").CurrentRegion, msCurrenciesHtmlFile
    WriteToHtmlFile ThisWorkbook.Worksheets.Item("Sheet1").Range("G1").CurrentRegion, msTeamsHtmlFile
End Sub

Private Sub WriteToHtmlFile(ByVal rngTable As Excel.Range, ByVal sColorsHtmlFile As String)

    Dim sTableAddress As String
    sTableAddress = "[" & rngTable.Worksheet.Name & "$" & rngTable.Address(False, False, xlA1) & "] """

    Dim oConnExcel As ADODB.Connection
    Set oConnExcel = New ADODB.Connection

    If UBound(Split(ThisWorkbook.Name, ".")) = 0 Then Err.Raise vbObjectError, , "#Workbook needs a file extension, i.e. saved at least once!"

    '*
    '* we're reading from worksheet so we need the Excel engine
    '*
    oConnExcel.Open "Provider=Microsoft.ACE.OLEDB.12.0;" & _
           "Data Source=" & ThisWorkbook.FullName & ";" & _
           "Extended Properties='Excel 12.0 Macro;HDR=YES'"

    Dim sFileNameOnly As String, sFolderOnly As String

    If ParseFileName(sColorsHtmlFile, sFileNameOnly, sFolderOnly) Then

        If Not Dir(sColorsHtmlFile) = "" Then Kill sColorsHtmlFile   '* if it exists then delete before re-exporting

        Dim sCmdText As String
        sCmdText = VBA.Replace("SELECT * INTO [%filename%] in %quotedFolder% ""HTML Export;"" FROM %tableName%", "%filename%", sFileNameOnly)
        sCmdText = VBA.Replace(sCmdText, "%quotedFolder%", """" & sFolderOnly & """")
        sCmdText = VBA.Replace(sCmdText, "%tableName%", sTableAddress)

        '*
        '* HTML Export is specified in the command text, no need for a separate connection
        '*
        Debug.Print sCmdText
        oConnExcel.Execute sCmdText
    End If

End Sub


Private Sub TestReadFromHtmlFile()
    ReadFromHtmlFile msColorsHtmlFile, "Colors"
    ReadFromHtmlFile msCurrenciesHtmlFile, "Currencies"
End Sub


Private Sub ReadFromHtmlFile(ByVal sHtmlFile As String, ByVal sTableName As String)

    Dim oConnHtmlImport As ADODB.Connection
    Set oConnHtmlImport = New ADODB.Connection
    oConnHtmlImport.Open "Provider=Microsoft.ACE.OLEDB.12.0;" & _
           "Data Source=" & sHtmlFile & _
           ";Extended Properties='HTML Import;HDR=YES'"

    Dim rsHtmlImport As ADODB.Recordset
    Set rsHtmlImport = New ADODB.Recordset

    '*
    '* the SQL-table-name is the same as the caption for the HTML table
    '* and not the HTML Title element as written in www.connectionstrings.com
    '*
    '* as with open for Excel I needed to use adOpenStatic
    '*
    rsHtmlImport.Open "SELECT * From [" & sTableName & "]", oConnHtmlImport, CursorTypeEnum.adOpenStatic '* can use CursorTypeEnum.adOpenKeyset
    Debug.Assert rsHtmlImport.RecordCount > 0

    '*
    '* do some work with the recordset here
    '*
    DumpRecordset rsHtmlImport
    'Stop

    '*
    '* when finished close the connection to stop file locks
    '*
    oConnHtmlImport.Close
    Set rsHtmlImport.ActiveConnection = Nothing
    Set oConnHtmlImport = Nothing
    Set rsHtmlImport = Nothing
    'Stop

End Sub

Private Sub TestAggregateHtmlFiles()
    Dim sTransactionFiles(0 To 1) As String
    sTransactionFiles(0) = msCurrenciesHtmlFile
    sTransactionFiles(1) = msTeamsHtmlFile
    
    
    Dim sNewMasterHtml As String
    sNewMasterHtml = AggregateHtmlFiles(msColorsHtmlFile, sTransactionFiles)
    Debug.Print sNewMasterHtml
    
    Dim lFile As Long
    lFile = FreeFile()

    If Dir(msAggregatedHtmlFile) <> "" Then Kill msAggregatedHtmlFile

    Open msAggregatedHtmlFile For Output As #lFile
    Print #lFile, sNewMasterHtml
    Close #lFile
    
End Sub

Private Function AggregateHtmlFiles(ByVal sMasterFile As String, ByRef sTransactionFiles() As String) As String
    '*
    '* check files exists firstly
    '*
    CheckHtmlFileExists sMasterFile
    Dim lTransactionFileLoop As Long
    For lTransactionFileLoop = LBound(sTransactionFiles) To UBound(sTransactionFiles)
        Dim sTransactionFile As String
        sTransactionFile = sTransactionFiles(lTransactionFileLoop)

        CheckHtmlFileExists sTransactionFile
    Next

    Dim oHtml4 As MSHTML.IHTMLDocument4
    Set oHtml4 = New MSHTML.HTMLDocument

    Dim htmlMaster As MSHTML.HTMLDocument
    Set htmlMaster = oHtml4.createDocumentFromUrl(sMasterFile, "")



    While htmlMaster.readyState <> "complete": DoEvents: Wend
    Dim objMasterTable As HTMLTable, objMasterTableList As Object
    Set objMasterTableList = htmlMaster.querySelectorAll("table > caption")
    Set objMasterTable = objMasterTableList.Item(objMasterTableList.Length - 1).parentElement
    'objMasterTable.parentElement

    Dim oMasterBody As MSHTML.HTMLBody
    Set oMasterBody = objMasterTable.parentElement

    For lTransactionFileLoop = LBound(sTransactionFiles) To UBound(sTransactionFiles)

        sTransactionFile = sTransactionFiles(lTransactionFileLoop)

        '*
        Dim htmlTransactionFile As MSHTML.HTMLDocument
        Set htmlTransactionFile = oHtml4.createDocumentFromUrl(sTransactionFile, "")
        While htmlTransactionFile.readyState <> "complete": DoEvents: Wend

        '* get captioned table
        Dim objTransactionTableCaption As HTMLTable
        Set objTransactionTableCaption = htmlTransactionFile.querySelector("table > caption")


        If Not objTransactionTableCaption Is Nothing Then
            '*
            Dim objTransactionTable As HTMLTable
            Set objTransactionTable = objTransactionTableCaption.parentElement


            '* write and add the table element
            Dim objNewTable As HTMLTable
            Set objNewTable = htmlMaster.createElement("TABLE")
            objNewTable.setAttribute "border", "1"
            oMasterBody.appendChild objNewTable


            '* write and add the table caption element
            Dim objNewCaption As HTMLTableCaption
            Set objNewCaption = htmlMaster.createElement("CAPTION")
            objNewTable.appendChild objNewCaption
            objNewCaption.innerText = objTransactionTableCaption.innerText

            '* write the column headers
            Dim objTransTableHeaderRow As Object
            Set objTransTableHeaderRow = objTransactionTable.querySelectorAll("tr > th").Item(0).parentElement

            Dim objNewTableRow As Object
            Set objNewTableRow = htmlMaster.createElement("TR")
            objNewTable.appendChild objNewTableRow

            Dim lColumnCount As Long, lColumnLoop As Long
            lColumnCount = objTransTableHeaderRow.ChildNodes.Length

            For lColumnLoop = 0 To lColumnCount - 1
                Dim objNewTH As Object, objTransTH As Object
                Set objTransTH = objTransTableHeaderRow.ChildNodes.Item(lColumnLoop)
                Set objNewTH = htmlMaster.createElement("TH")
                objNewTH.innerText = objTransTH.innerText
                objNewTableRow.appendChild objNewTH
            Next

            Dim objTransTableDataRow As Object
            Set objTransTableDataRow = objTransTableHeaderRow.NextSibling
            While Not objTransTableDataRow Is Nothing
                
                Set objNewTableRow = htmlMaster.createElement("TR")
                objNewTable.appendChild objNewTableRow
                
                For lColumnLoop = 0 To lColumnCount - 1
                    Dim objNewTD As Object, objTransTD As Object
                    Set objTransTD = objTransTableDataRow.ChildNodes.Item(lColumnLoop)
                    Set objNewTD = htmlMaster.createElement("TD")
                    objNewTD.innerText = objTransTD.innerText
                    objNewTableRow.appendChild objNewTD
                Next

                Set objTransTableDataRow = objTransTableDataRow.NextSibling
            Wend

        End If

    Next lTransactionFileLoop

    AggregateHtmlFiles = htmlMaster.DocumentElement.outerHTML

End Function

Private Sub CheckHtmlFileExists(ByVal sHtmlFile As String)
    If Dir(sHtmlFile) = "" Then Err.Raise vbObjectError, , "#File '" & sHtmlFile & "' does not exist!"
End Sub

Private Sub TestReadCatalogOfHtmlFile()
    ReadCatalogOfHtmlFile msColorsHtmlFile
    ReadCatalogOfHtmlFile msCurrenciesHtmlFile
End Sub

Private Sub ReadCatalogOfHtmlFile(ByVal sColorsHtmlFile As String)
    Dim oConnHtmlImport As ADODB.Connection
    Set oConnHtmlImport = New ADODB.Connection
    oConnHtmlImport.Open "Provider=Microsoft.ACE.OLEDB.12.0;" & _
           "Data Source=" & sColorsHtmlFile & _
           ";Extended Properties='HTML Import;HDR=YES'"

    Dim catDB As ADOX.Catalog
    Dim tblList As ADOX.Table

    Set catDB = New ADOX.Catalog
    Set catDB.ActiveConnection = oConnHtmlImport

    Dim adoxTables As ADOX.Tables
    Set adoxTables = catDB.Tables

    Dim adoxTableLoop As ADOX.Table
    For Each adoxTableLoop In adoxTables
        Debug.Print adoxTableLoop.Name
    Next adoxTableLoop

End Sub

Private Sub DumpRecordset(ByVal rs As ADODB.Recordset)
    '*
    '* Some code to iterate over the recordset
    '*
    rs.MoveFirst

    Dim lFieldCount As Long
    lFieldCount = rs.Fields.Count

    While Not rs.EOF

        Dim sOutputLine As String
        sOutputLine = ""

        Dim sFieldAndValue As String
        Dim lFieldLoop As Long
        For lFieldLoop = 0 To lFieldCount - 1
            sFieldAndValue = rs.Fields.Item(lFieldLoop).Name & ":" & rs.Fields.Item(lFieldLoop).Value

            sOutputLine = sOutputLine & VBA.IIf(Len(sOutputLine) > 0, vbTab, "") & sFieldAndValue
        Next
        Debug.Print sOutputLine
        rs.MoveNext
    Wend

End Sub

Private Sub UnitTestParseFileName()

    Const sExportFile As String = "N:Colors.html"
    Dim sFileNameOnly As String, sFolderOnly As String

    Debug.Assert ParseFileName(sExportFile, sFileNameOnly, sFolderOnly)
    Debug.Assert sFileNameOnly = "Colors.html"
    Debug.Assert sFolderOnly = "N:"

    Debug.Assert ParseFileName("N:folder1folder2Colors.html", sFileNameOnly, sFolderOnly)
    Debug.Assert sFileNameOnly = "Colors.html"
    Debug.Assert sFolderOnly = "N:folder1folder2"

End Sub

Private Function ParseFileName(ByVal sFullFileName As String, ByRef psFileNameOnly As String, ByRef psFolderOnly As String) As Boolean

    Dim vSplit As Variant
    vSplit = VBA.Split(sFullFileName, "")

    Dim lUBound As Long
    lUBound = UBound(vSplit)

    If lUBound > 0 Then
        psFileNameOnly = vSplit(lUBound)
        psFolderOnly = Left(sFullFileName, Len(sFullFileName) - Len(psFileNameOnly))
        ParseFileName = True
    End If

End Function

Sample Output

Sample Export Html Source

<HTML DIR=LTR>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=Windows-1252">
<TITLE>Colors</TITLE>
</HEAD>
<BODY>
<TABLE DIR=LTR BORDER>
<CAPTION>Colors</CAPTION>
<TR>
<TH>Color</TH>
<TH>RGB</TH>
</TR>
<TD DIR=LTR ALIGN=LEFT>Red</TD>
<TD DIR=LTR ALIGN=LEFT>FF0000</TD>
</TR>
<TR>
<TD DIR=LTR ALIGN=LEFT>Green</TD>
<TD DIR=LTR ALIGN=LEFT>00FF00</TD>
</TR>
</TABLE>
</BODY>
</HTML>

Sample Export Html Rendered

Colors
Color RGB
Red FF0000
Green 00FF00

Sample Export of Multiple Tables Html Source

<HTML DIR=LTR>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=Windows-1252">
<TITLE>cannot be empty</TITLE>
</HEAD>
<BODY>

<TABLE>
<TR><TD>

<TABLE DIR=LTR BORDER>
<CAPTION>Colors</CAPTION>
<TR>
<TH>Color</TH>
<TH>RGB</TH>
</TR>
<TD DIR=LTR ALIGN=LEFT>Red</TD>
<TD DIR=LTR ALIGN=LEFT>&FF0000</TD>
</TR>
<TR>
<TD DIR=LTR ALIGN=LEFT>Green</TD>
<TD DIR=LTR ALIGN=LEFT>&00FF00</TD>
</TR>
</TABLE>

</TD><TD>

<TABLE DIR=LTR BORDER>
<CAPTION>Currencies</CAPTION>
<TR>
<TH>Country</TH>
<TH>Ccy</TH>
</TR>
<TD DIR=LTR ALIGN=LEFT>France</TD>
<TD DIR=LTR ALIGN=LEFT>Euro</TD>
</TR>
<TR>
<TD DIR=LTR ALIGN=LEFT>Japan</TD>
<TD DIR=LTR ALIGN=LEFT>Yen</TD>
</TR>
</TABLE>
</TR>
</TABLE>
</BODY>
</HTML>

Sample Export of Multiple Tables Html Rendered

Colors
Color RGB
Red &FF0000
Green &00FF00
Currencies
Country Ccy
France Euro
Japan Yen

No comments:

Post a Comment