This post follows on from the other day's mega-post on Microsoft.ACE.OLEDB.12.0. As part of the investigation into the Excel engine, I discovered how to use the Html Export and Html Import drivers (yes they are separate).
Initially I was excited to find another way to store and retrieve data without requiring a database. After writing some code my enthusiasm has waned and my technical recommendation is that you skip this technology.
Weaknesses with the HTML Import Export Drivers
So, I ought to itemise my concerns here...
Type Inference
So the Excel driver has to sample some rows to guess the type of column (I have yet to find a way to declare the column type) and the Html Import driver does so equally. I came up against this more so with the Html Import driver, I'm guessing it samples fewer rows, anyway I had to change the text in the sample data (compared to the mega post) and give the hex values leading ampersands to enforce its inference as being a string. E.g. I needed to replace 00FF00 to &00FF00 in SetUpSomeData(). I have discovered that ordinary text files can have schema.ini files attached, which would put text files streets ahead of Html files because it obviates the need for type inference.
Html Export is not well-formed Xml
I'd like to nominate Html for a troublesome technology award. Html is in same family as Xml but typically cannot be parsed with Xml parsers because it is not well-formed. The latest version of Html, Html 5, is well-formed so going forward things ought to be better. This driver Microsoft.ACE.OLEDB.12.0 is quite new and ought to be output well-formed Html but it doesn't. This means having to write some large amounts of paring code (see AggregateHtmlFiles() in sample code). I could write that code ten times more concisely with Xml library!
Missed opportunity to export multiple tables to one single file
So I myself had to write code to aggregate tables to one single file. It turned out to be more painful that I imagined. The code below shows HTML parsing logic (see AggregateHtmlFiles() in sample code). If I were to rewrite this I'd scrap that approach and start with an Xml representation of cell block and XSLT transform that into the correct Html. This is a shame, for a while I thought multi-table Html file could make a nice config file.
Sample Code
Anyway, I wrote plenty of experimental code. Instead of throwing away this code, I will deposit here. There is an example of exporting from a sheet to an Html file. There is an example of reading an Html file.
Also, because the export only allows one table per Html file I have written some code to aggregate separate single table files into a multi-table file, see AggregateHtmlFiles(). I did this with the same libraries that Internet Explorer uses and so had to workaround IE bugs, code would have been much simple if the InsertAdjacentHTML method wasn't buggy.
Option Explicit
Option Private Module
'* Tools -> References
'* ADODB Microsoft ActiveX Data Objects 6.1 Library C:\Program Files (x86)\Common Files\System\ado\msado15.dll
'* ADOX Microsoft ADO Ext. 6.0 for DDL and Security C:\Program Files (x86)\Common Files\System\ado\msadox.dll
Private Const msColorsHtmlFile As String = "N:Colors.html" '<---- change this to your working file location
Private Const msCurrenciesHtmlFile As String = "N:Currencies.html" '<---- change this to your working file location
Private Const msTeamsHtmlFile As String = "N:Teams.html" '<---- change this to your working file location
Private Const msAggregatedHtmlFile As String = "N:Aggregated.html" '<---- change this to your working file location
Private Sub SetUpSomeData()
'* WARNING this will wipe data!
Dim sht As Excel.Worksheet
Set sht = ThisWorkbook.Worksheets.Item("Sheet1")
sht.Cells.Clear
'*
'* use our array literal trick, for more tricks tips and 'blue sky thinking'
'* see http://exceldevelopmentplatform.blogspot.com
'*
Dim vColors As Variant
vColors = [{"Color","RGB";"Red","&FF0000";"Green","&00FF00"}] '* note the addition of the ampersand for type inference
sht.Range("A1:B3").Value2 = vColors
Dim vCurrencies As Variant
vCurrencies = [{"Country","Ccy";"France","Euro";"Japan","Yen"}]
sht.Range("D1:E3").Value2 = vCurrencies
Dim vTeams As Variant
vTeams = [{"Team","Country";"New York Red Bulls","US";"Spartak Moskva","Russia";"Man Utd","England";"Barcelona","Spain";"Bayern Munich","Germany"}]
sht.Range("G1:H6").Value2 = vTeams
End Sub
Private Sub TestWriteToHtmlFile()
WriteToHtmlFile ThisWorkbook.Worksheets.Item("Sheet1").Range("A1").CurrentRegion, msColorsHtmlFile
WriteToHtmlFile ThisWorkbook.Worksheets.Item("Sheet1").Range("D1").CurrentRegion, msCurrenciesHtmlFile
WriteToHtmlFile ThisWorkbook.Worksheets.Item("Sheet1").Range("G1").CurrentRegion, msTeamsHtmlFile
End Sub
Private Sub WriteToHtmlFile(ByVal rngTable As Excel.Range, ByVal sColorsHtmlFile As String)
Dim sTableAddress As String
sTableAddress = "[" & rngTable.Worksheet.Name & "$" & rngTable.Address(False, False, xlA1) & "] """
Dim oConnExcel As ADODB.Connection
Set oConnExcel = New ADODB.Connection
If UBound(Split(ThisWorkbook.Name, ".")) = 0 Then Err.Raise vbObjectError, , "#Workbook needs a file extension, i.e. saved at least once!"
'*
'* we're reading from worksheet so we need the Excel engine
'*
oConnExcel.Open "Provider=Microsoft.ACE.OLEDB.12.0;" & _
"Data Source=" & ThisWorkbook.FullName & ";" & _
"Extended Properties='Excel 12.0 Macro;HDR=YES'"
Dim sFileNameOnly As String, sFolderOnly As String
If ParseFileName(sColorsHtmlFile, sFileNameOnly, sFolderOnly) Then
If Not Dir(sColorsHtmlFile) = "" Then Kill sColorsHtmlFile '* if it exists then delete before re-exporting
Dim sCmdText As String
sCmdText = VBA.Replace("SELECT * INTO [%filename%] in %quotedFolder% ""HTML Export;"" FROM %tableName%", "%filename%", sFileNameOnly)
sCmdText = VBA.Replace(sCmdText, "%quotedFolder%", """" & sFolderOnly & """")
sCmdText = VBA.Replace(sCmdText, "%tableName%", sTableAddress)
'*
'* HTML Export is specified in the command text, no need for a separate connection
'*
Debug.Print sCmdText
oConnExcel.Execute sCmdText
End If
End Sub
Private Sub TestReadFromHtmlFile()
ReadFromHtmlFile msColorsHtmlFile, "Colors"
ReadFromHtmlFile msCurrenciesHtmlFile, "Currencies"
End Sub
Private Sub ReadFromHtmlFile(ByVal sHtmlFile As String, ByVal sTableName As String)
Dim oConnHtmlImport As ADODB.Connection
Set oConnHtmlImport = New ADODB.Connection
oConnHtmlImport.Open "Provider=Microsoft.ACE.OLEDB.12.0;" & _
"Data Source=" & sHtmlFile & _
";Extended Properties='HTML Import;HDR=YES'"
Dim rsHtmlImport As ADODB.Recordset
Set rsHtmlImport = New ADODB.Recordset
'*
'* the SQL-table-name is the same as the caption for the HTML table
'* and not the HTML Title element as written in www.connectionstrings.com
'*
'* as with open for Excel I needed to use adOpenStatic
'*
rsHtmlImport.Open "SELECT * From [" & sTableName & "]", oConnHtmlImport, CursorTypeEnum.adOpenStatic '* can use CursorTypeEnum.adOpenKeyset
Debug.Assert rsHtmlImport.RecordCount > 0
'*
'* do some work with the recordset here
'*
DumpRecordset rsHtmlImport
'Stop
'*
'* when finished close the connection to stop file locks
'*
oConnHtmlImport.Close
Set rsHtmlImport.ActiveConnection = Nothing
Set oConnHtmlImport = Nothing
Set rsHtmlImport = Nothing
'Stop
End Sub
Private Sub TestAggregateHtmlFiles()
Dim sTransactionFiles(0 To 1) As String
sTransactionFiles(0) = msCurrenciesHtmlFile
sTransactionFiles(1) = msTeamsHtmlFile
Dim sNewMasterHtml As String
sNewMasterHtml = AggregateHtmlFiles(msColorsHtmlFile, sTransactionFiles)
Debug.Print sNewMasterHtml
Dim lFile As Long
lFile = FreeFile()
If Dir(msAggregatedHtmlFile) <> "" Then Kill msAggregatedHtmlFile
Open msAggregatedHtmlFile For Output As #lFile
Print #lFile, sNewMasterHtml
Close #lFile
End Sub
Private Function AggregateHtmlFiles(ByVal sMasterFile As String, ByRef sTransactionFiles() As String) As String
'*
'* check files exists firstly
'*
CheckHtmlFileExists sMasterFile
Dim lTransactionFileLoop As Long
For lTransactionFileLoop = LBound(sTransactionFiles) To UBound(sTransactionFiles)
Dim sTransactionFile As String
sTransactionFile = sTransactionFiles(lTransactionFileLoop)
CheckHtmlFileExists sTransactionFile
Next
Dim oHtml4 As MSHTML.IHTMLDocument4
Set oHtml4 = New MSHTML.HTMLDocument
Dim htmlMaster As MSHTML.HTMLDocument
Set htmlMaster = oHtml4.createDocumentFromUrl(sMasterFile, "")
While htmlMaster.readyState <> "complete": DoEvents: Wend
Dim objMasterTable As HTMLTable, objMasterTableList As Object
Set objMasterTableList = htmlMaster.querySelectorAll("table > caption")
Set objMasterTable = objMasterTableList.Item(objMasterTableList.Length - 1).parentElement
'objMasterTable.parentElement
Dim oMasterBody As MSHTML.HTMLBody
Set oMasterBody = objMasterTable.parentElement
For lTransactionFileLoop = LBound(sTransactionFiles) To UBound(sTransactionFiles)
sTransactionFile = sTransactionFiles(lTransactionFileLoop)
'*
Dim htmlTransactionFile As MSHTML.HTMLDocument
Set htmlTransactionFile = oHtml4.createDocumentFromUrl(sTransactionFile, "")
While htmlTransactionFile.readyState <> "complete": DoEvents: Wend
'* get captioned table
Dim objTransactionTableCaption As HTMLTable
Set objTransactionTableCaption = htmlTransactionFile.querySelector("table > caption")
If Not objTransactionTableCaption Is Nothing Then
'*
Dim objTransactionTable As HTMLTable
Set objTransactionTable = objTransactionTableCaption.parentElement
'* write and add the table element
Dim objNewTable As HTMLTable
Set objNewTable = htmlMaster.createElement("TABLE")
objNewTable.setAttribute "border", "1"
oMasterBody.appendChild objNewTable
'* write and add the table caption element
Dim objNewCaption As HTMLTableCaption
Set objNewCaption = htmlMaster.createElement("CAPTION")
objNewTable.appendChild objNewCaption
objNewCaption.innerText = objTransactionTableCaption.innerText
'* write the column headers
Dim objTransTableHeaderRow As Object
Set objTransTableHeaderRow = objTransactionTable.querySelectorAll("tr > th").Item(0).parentElement
Dim objNewTableRow As Object
Set objNewTableRow = htmlMaster.createElement("TR")
objNewTable.appendChild objNewTableRow
Dim lColumnCount As Long, lColumnLoop As Long
lColumnCount = objTransTableHeaderRow.ChildNodes.Length
For lColumnLoop = 0 To lColumnCount - 1
Dim objNewTH As Object, objTransTH As Object
Set objTransTH = objTransTableHeaderRow.ChildNodes.Item(lColumnLoop)
Set objNewTH = htmlMaster.createElement("TH")
objNewTH.innerText = objTransTH.innerText
objNewTableRow.appendChild objNewTH
Next
Dim objTransTableDataRow As Object
Set objTransTableDataRow = objTransTableHeaderRow.NextSibling
While Not objTransTableDataRow Is Nothing
Set objNewTableRow = htmlMaster.createElement("TR")
objNewTable.appendChild objNewTableRow
For lColumnLoop = 0 To lColumnCount - 1
Dim objNewTD As Object, objTransTD As Object
Set objTransTD = objTransTableDataRow.ChildNodes.Item(lColumnLoop)
Set objNewTD = htmlMaster.createElement("TD")
objNewTD.innerText = objTransTD.innerText
objNewTableRow.appendChild objNewTD
Next
Set objTransTableDataRow = objTransTableDataRow.NextSibling
Wend
End If
Next lTransactionFileLoop
AggregateHtmlFiles = htmlMaster.DocumentElement.outerHTML
End Function
Private Sub CheckHtmlFileExists(ByVal sHtmlFile As String)
If Dir(sHtmlFile) = "" Then Err.Raise vbObjectError, , "#File '" & sHtmlFile & "' does not exist!"
End Sub
Private Sub TestReadCatalogOfHtmlFile()
ReadCatalogOfHtmlFile msColorsHtmlFile
ReadCatalogOfHtmlFile msCurrenciesHtmlFile
End Sub
Private Sub ReadCatalogOfHtmlFile(ByVal sColorsHtmlFile As String)
Dim oConnHtmlImport As ADODB.Connection
Set oConnHtmlImport = New ADODB.Connection
oConnHtmlImport.Open "Provider=Microsoft.ACE.OLEDB.12.0;" & _
"Data Source=" & sColorsHtmlFile & _
";Extended Properties='HTML Import;HDR=YES'"
Dim catDB As ADOX.Catalog
Dim tblList As ADOX.Table
Set catDB = New ADOX.Catalog
Set catDB.ActiveConnection = oConnHtmlImport
Dim adoxTables As ADOX.Tables
Set adoxTables = catDB.Tables
Dim adoxTableLoop As ADOX.Table
For Each adoxTableLoop In adoxTables
Debug.Print adoxTableLoop.Name
Next adoxTableLoop
End Sub
Private Sub DumpRecordset(ByVal rs As ADODB.Recordset)
'*
'* Some code to iterate over the recordset
'*
rs.MoveFirst
Dim lFieldCount As Long
lFieldCount = rs.Fields.Count
While Not rs.EOF
Dim sOutputLine As String
sOutputLine = ""
Dim sFieldAndValue As String
Dim lFieldLoop As Long
For lFieldLoop = 0 To lFieldCount - 1
sFieldAndValue = rs.Fields.Item(lFieldLoop).Name & ":" & rs.Fields.Item(lFieldLoop).Value
sOutputLine = sOutputLine & VBA.IIf(Len(sOutputLine) > 0, vbTab, "") & sFieldAndValue
Next
Debug.Print sOutputLine
rs.MoveNext
Wend
End Sub
Private Sub UnitTestParseFileName()
Const sExportFile As String = "N:Colors.html"
Dim sFileNameOnly As String, sFolderOnly As String
Debug.Assert ParseFileName(sExportFile, sFileNameOnly, sFolderOnly)
Debug.Assert sFileNameOnly = "Colors.html"
Debug.Assert sFolderOnly = "N:"
Debug.Assert ParseFileName("N:folder1folder2Colors.html", sFileNameOnly, sFolderOnly)
Debug.Assert sFileNameOnly = "Colors.html"
Debug.Assert sFolderOnly = "N:folder1folder2"
End Sub
Private Function ParseFileName(ByVal sFullFileName As String, ByRef psFileNameOnly As String, ByRef psFolderOnly As String) As Boolean
Dim vSplit As Variant
vSplit = VBA.Split(sFullFileName, "")
Dim lUBound As Long
lUBound = UBound(vSplit)
If lUBound > 0 Then
psFileNameOnly = vSplit(lUBound)
psFolderOnly = Left(sFullFileName, Len(sFullFileName) - Len(psFileNameOnly))
ParseFileName = True
End If
End Function
Sample Output
Sample Export Html Source
<HTML DIR=LTR>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=Windows-1252">
<TITLE>Colors</TITLE>
</HEAD>
<BODY>
<TABLE DIR=LTR BORDER>
<CAPTION>Colors</CAPTION>
<TR>
<TH>Color</TH>
<TH>RGB</TH>
</TR>
<TD DIR=LTR ALIGN=LEFT>Red</TD>
<TD DIR=LTR ALIGN=LEFT>FF0000</TD>
</TR>
<TR>
<TD DIR=LTR ALIGN=LEFT>Green</TD>
<TD DIR=LTR ALIGN=LEFT>00FF00</TD>
</TR>
</TABLE>
</BODY>
</HTML>
Sample Export Html Rendered
Color | RGB | Red | FF0000 |
---|---|
Green | 00FF00 |
Sample Export of Multiple Tables Html Source
<HTML DIR=LTR>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=Windows-1252">
<TITLE>cannot be empty</TITLE>
</HEAD>
<BODY>
<TABLE>
<TR><TD>
<TABLE DIR=LTR BORDER>
<CAPTION>Colors</CAPTION>
<TR>
<TH>Color</TH>
<TH>RGB</TH>
</TR>
<TD DIR=LTR ALIGN=LEFT>Red</TD>
<TD DIR=LTR ALIGN=LEFT>&FF0000</TD>
</TR>
<TR>
<TD DIR=LTR ALIGN=LEFT>Green</TD>
<TD DIR=LTR ALIGN=LEFT>&00FF00</TD>
</TR>
</TABLE>
</TD><TD>
<TABLE DIR=LTR BORDER>
<CAPTION>Currencies</CAPTION>
<TR>
<TH>Country</TH>
<TH>Ccy</TH>
</TR>
<TD DIR=LTR ALIGN=LEFT>France</TD>
<TD DIR=LTR ALIGN=LEFT>Euro</TD>
</TR>
<TR>
<TD DIR=LTR ALIGN=LEFT>Japan</TD>
<TD DIR=LTR ALIGN=LEFT>Yen</TD>
</TR>
</TABLE>
</TR>
</TABLE>
</BODY>
</HTML>
Sample Export of Multiple Tables Html Rendered
|
|
No comments:
Post a Comment