Thursday 31 October 2019

VBA - Xml - Pretty printing Xml to the worksheet

Previously, I have given a VBA routine to pretty print an Xml Dom to a file which gives line breaks. In this post, I give code to write the Xml file as report to the worksheet. I even given code to color in the document to highlight the Xml syntax.

Warning: The coloring code is not very fast, I think the Excel.Range.Characters() method which allows the subselection and formatting of portions of text within a cell is the performance bottlenext (this is Microsoft code and not my code, I promise). So, I have shipped an extra Boolean parameter, bCancelColoring, to switch off the colorisation.

The code requires two library references so go to menu and take Tools->References to get the References dialog box and select (1) Microsoft Scripting Runtime and (2) Microsoft Xml, v6.0

It is intended that the code below be hosted in one single module called modXmlReports. I will be publishing code later that calls this module.

The colorisation was an interesting problem, it required walking the Dom and associating an element with a row number. This meant when parsing the Xml, I could query the Dom for the element's name. This meant I could skip any regular expression parsing code which was welcome.

The screenshot below is (an edited version) of a Microsoft sample xml file, Books.xml. Inevitably, I save my test files in a different location to you, dear reader, so please amend the file path in the test routine. For workspace, the code also uses a subfolder of the system's temp folder called 'modXmlReports'.

I had previously writetn the code as opening the xml text file as a workbook because Excel nicely parses the tabs and prints the xml indents onto the worksheet for us. I have switched to reading the xml text file manually and counting the tabs myself. This led to more stable code; I prefer not to keep opening and closing workbooks.

That's about it. This is meant for developers rather than end users. I have been working with mpeg files and I rewrote my mpeg file parser but I wanted a file to view, so I choose to parse the Mpeg file into an Xml file. So I am a client of the module below. I'll publish that Mpeg code soon.

So here is the code

modXmlReports

Option Explicit

'***********************************************************************************
'* Module Name: modXmlReports
'* Copyright exceldevelopmentplatform.blogspot.com 2nd November 2019
'*
'* Requires Tools Reference
'*  Microsoft Scripting Runtime
'*  Microsoft XML, v6.0
'*
'* Description:
'*  This module will pretty print an Xml file i.e. it will place each element on at least a separate line
'*  and will tab indent to show the level of recursive depth.  Also it will read the prettified Xml file
'*  and write onto a worksheet.  It will even color the Xml syntax to make the contents clearer though this
'*  colorisation code can be slow and so is cancellable
'*
'***********************************************************************************

'************************************************************
'* test routine
'************************************************************
Private Sub TestPrettyReportXml()

    Dim sXmlInputFile As String
    sXmlInputFile = "N:\Dash\Books.xml"   '*<----- this will differ for you, dear reader!

    PrettyReportXml ThisWorkbook, sXmlInputFile, False, ""
End Sub



'*********************************************************************************************************
'* Name:        PrettyReportXml
'* Description: Sole entry point for this module
'*********************************************************************************************************

Public Function PrettyReportXml(ByVal wbReports As Excel.Workbook, _
                ByVal sXmlInputFile As String, _
                ByVal bCancelColoring As Boolean, _
                ByVal sWorkFolder As String) As Boolean

    Debug.Assert (Not wbReports Is Nothing) And (Len(sXmlInputFile) > 0)
    
    '* this is a three step process:
    '* (1) first load the Xml.txt file into a workshete and copy in
    '* (2) associate each line with an element of the dom, taking into account open and close on different lines
    '* (3) color in the text
    
    Dim xmlDoc As MSXML2.DOMDocument60
    Set xmlDoc = New MSXML2.DOMDocument60
    
    If Not xmlDoc.Load(sXmlInputFile) Then
        Debug.Print "Xml file '" & sXmlInputFile & "' does not parse!  Aborting."
    Else
    
        Dim sXmlPrettifiedFileName As String
        sXmlPrettifiedFileName = ReportFileName(sXmlInputFile, sWorkFolder)
        
        Dim wsReturn As Excel.Worksheet
        Set wsReturn = OpenAndCopyXmlReportIntoWorkbook(wbReports, xmlDoc, sXmlPrettifiedFileName)
        
        Dim dicLinesAndElements As Scripting.Dictionary, dicColors As Scripting.Dictionary
        
        If Not bCancelColoring Then

            Set dicLinesAndElements = AssociateLineNumbersWithElements(wsReturn, xmlDoc)
            Set dicColors = New Scripting.Dictionary

            dicColors.Add "Punctuation", VBA.Information.RGB(0, 0, 255)  '* feel free to change these colors
            dicColors.Add "XmlToken", VBA.Information.RGB(153, 0, 0)
            dicColors.Add "Content", VBA.Information.RGB(0, 0, 0)

            ColorInText wsReturn, dicLinesAndElements, dicColors

            CleanUpLinesAndElements dicLinesAndElements
        End If
        
        '* force tidy up
        Set dicLinesAndElements = Nothing
        Set dicColors = Nothing
        Set wsReturn = Nothing
        Set xmlDoc = Nothing
        Set wbReports = Nothing
        sXmlInputFile = vbNullString
        sXmlPrettifiedFileName = vbNullString
        '* end of tidy up
        
        PrettyReportXml = True
    End If
End Function

'************************************************************
'* function which govern the working file location
'************************************************************

Private Function LeafName(ByVal sInputFile As String) As String
    Dim fso As Scripting.FileSystemObject
    Set fso = New Scripting.FileSystemObject

    Dim filMpeg As Scripting.File
    Set filMpeg = fso.GetFile(sInputFile)
    
    LeafName = filMpeg.name
    Set filMpeg = Nothing
    Set fso = Nothing

End Function

Private Function ReportFileName(ByVal sInputFile As String, ByVal sWorkFolder As String) As String
    Dim fso As Scripting.FileSystemObject
    Set fso = New Scripting.FileSystemObject
    '* we want to work with our file which we will place in same directory as ThisWorkbook

    If sWorkFolder = "" Then sWorkFolder = TempFolder

    ReportFileName = fso.BuildPath(sWorkFolder, LeafName(sInputFile) & ".txt")
    Set fso = Nothing
End Function

Private Function TempFolder() As String
    Dim fso As Scripting.FileSystemObject
    Set fso = New Scripting.FileSystemObject
    
    Const TemporaryFolder As Long = 2
    TempFolder = fso.GetSpecialFolder(TemporaryFolder)
    
    If Not fso.FolderExists(fso.BuildPath(TempFolder, "modXmlReports")) Then
    
        Dim fldTemp As Scripting.Folder
        Set fldTemp = fso.GetFolder(TempFolder)
        fldTemp.SubFolders.Add "modXmlReports"
    
        Debug.Assert fso.FolderExists(fso.BuildPath(TempFolder, "modXmlReports"))
    End If
    TempFolder = fso.BuildPath(TempFolder, "modXmlReports")
    
    Set fso = Nothing
    
End Function

'Private Function ThisWorkbookHomeFolder() As String
'    Dim fso As Scripting.FileSystemObject
'    Set fso = New Scripting.FileSystemObject
'
'    'Debug.Assert ThisWorkbook.Saved = True
'    ThisWorkbookHomeFolder = fso.GetFile(ThisWorkbook.FullName).ParentFolder.ShortPath
'    Set fso = Nothing
'
'    '* alternative
'    'Dim vSplit
'    'vSplit = VBA.Split(ThisWorkbook.FullName, "\")
'    'ReDim Preserve vSplit(0 To UBound(vSplit) - 1)
'    'ThisWorkbookHomeFolder = VBA.Join(vSplit, "\")
'End Function


'************************************************************
'* prettify Xml and load into workbook routines
'************************************************************
Private Function OpenAndCopyXmlReportIntoWorkbook(ByVal wbReports As Excel.Workbook, _
            ByVal xmlDoc As MSXML2.DOMDocument60, ByVal sXmlPrettifiedFileName As String) As Excel.Worksheet

    Debug.Assert (Not wbReports Is Nothing) And (Not xmlDoc Is Nothing) And (Len(sXmlPrettifiedFileName) > 0)

    If StrComp(Right$(sXmlPrettifiedFileName, 4), ".txt", vbTextCompare) <> 0 Then
        Debug.Print "File '" & sXmlPrettifiedFileName & "' must end in .txt so as to bypass Excel's Xml file opening logic.  Aborting."
        GoTo SingleExit
    Else
    
        Dim vSrc As Variant, lLineCount As Long, lColCount  As Long
        vSrc = WriteAndReadPrettifiedReport(xmlDoc, sXmlPrettifiedFileName, lLineCount, lColCount)
    
        Dim wsDest As Excel.Worksheet
        Set wsDest = wbReports.Worksheets.Add(After:=wbReports.Worksheets.Item(wbReports.Worksheets.count))
        
        Dim sSheetName As String
        sSheetName = VBA.Split(LeafName(sXmlPrettifiedFileName), ".")(0)
        If SheetExists(wbReports, sSheetName) Then
            Application.DisplayAlerts = False
            wbReports.Worksheets.Item(sSheetName).Delete
            Application.DisplayAlerts = True
        End If
        Set wbReports = Nothing
        wsDest.name = sSheetName
        wsDest.Cells(1, 1).Resize(lLineCount, lColCount).Value2 = vSrc
        vSrc = Empty
        

        Set OpenAndCopyXmlReportIntoWorkbook = wsDest
        Set wsDest = Nothing
        
        Set wbReports = Nothing
        
        DoEvents '* allow repaint
    End If
SingleExit:

End Function

Private Function WriteAndReadPrettifiedReport(ByVal xmlDoc As MSXML2.DOMDocument60, _
                    ByVal sXmlPrettifiedFileName As String, _
                    ByRef plLineCount As Long, ByRef plColCount As Long)

    Dim fso As Scripting.FileSystemObject
    Set fso = New Scripting.FileSystemObject

    Dim txtOut As Scripting.TextStream
    Set txtOut = fso.CreateTextFile(sXmlPrettifiedFileName)
    txtOut.Write PrettyPrintXml(xmlDoc)
    txtOut.Close
    Set txtOut = Nothing
    Set xmlDoc = Nothing
    
    Dim lMaxTabCount As Long
    lMaxTabCount = 0
    
    Dim dicLines As Scripting.Dictionary
    Set dicLines = New Scripting.Dictionary
    
    Dim txtIn As Scripting.TextStream
    Set txtIn = fso.OpenTextFile(sXmlPrettifiedFileName)
    
    Dim sLine As String
    sLine = txtIn.ReadLine
    
    While Not txtIn.AtEndOfStream
        Dim lTabCount As Long
        lTabCount = CountChars(sLine, vbTab)
        If lTabCount > lMaxTabCount Then lMaxTabCount = lTabCount
        DoEvents
        dicLines.Add dicLines.count, sLine
        sLine = txtIn.ReadLine
    Wend
    dicLines.Add dicLines.count, sLine '* add final line
    txtIn.Close
    Set txtIn = Nothing
    Set fso = Nothing
    
    plLineCount = dicLines.count
    plColCount = lMaxTabCount + 1
    
    ReDim vRet(1 To dicLines.count, 1 To plColCount) As Variant
    
    Dim vLine As Variant, lRow As Long
    For Each vLine In dicLines.Items
        lRow = lRow + 1
        lTabCount = CountChars(vLine, vbTab)
        
        vRet(lRow, lTabCount + 1) = Replace(vLine, vbTab, "")
    
    Next vLine
    
    Set dicLines = Nothing
    
    WriteAndReadPrettifiedReport = vRet

End Function

Private Function CountChars(ByVal s As String, ByVal C As String)
    CountChars = Len(s) - Len(Replace(s, C, ""))
End Function

Private Function SheetExists(ByVal wb As Excel.Workbook, ByVal sSheetName As String) As Boolean
    SheetExists = SheetNames(wb).Exists(sSheetName)
End Function

Private Function SheetNames(ByVal wb As Excel.Workbook) As Scripting.Dictionary
    Dim dic As Scripting.Dictionary
    Set dic = New Scripting.Dictionary
    
    Dim ws As Excel.Worksheet
    For Each ws In wb.Worksheets
        dic.Add ws.name, ""
    Next
    Set SheetNames = dic

End Function

Private Function PrettyPrintXml(ByVal dom As MSXML2.DOMDocument60) As String

    Dim reader As MSXML2.SAXXMLReader60
    Set reader = New MSXML2.SAXXMLReader60
    
    Dim writer As MSXML2.MXXMLWriter60
    Set writer = New MSXML2.MXXMLWriter60
    writer.omitXMLDeclaration = True
    writer.indent = True

    reader.putProperty "http://xml.org/sax/properties/lexical-handler", writer
    
    Set reader.contentHandler = writer
    reader.Parse dom.XML
   
    PrettyPrintXml = writer.output
    Set reader = Nothing
    Set writer = Nothing
    
End Function

'************************************************************
'* walking the dom to associate elements with lines routines
'************************************************************
Private Function AssociateLineNumbersWithElements(ByVal wsReturn As Excel.Worksheet, _
                ByVal xmlDoc As MSXML2.DOMDocument60) As Scripting.Dictionary
    '* thankfully each line will have its own element
    '* so we can now assocaite an element with a line and
    '* leverage the Xml parser instead of doing regular expressions
    Dim dicLinesAndElements As Scripting.Dictionary
    Set dicLinesAndElements = New Scripting.Dictionary

    WalkDomDoc xmlDoc.DocumentElement, dicLinesAndElements

    Set AssociateLineNumbersWithElements = dicLinesAndElements
End Function

Private Sub CleanUpLinesAndElements(ByVal dicLinesAndElements As Scripting.Dictionary)

    Dim vKey
    For Each vKey In dicLinesAndElements
        Set dicLinesAndElements.Item(vKey) = Nothing
        dicLinesAndElements.Remove vKey
    Next
End Sub

Private Sub WalkDomDoc(ByVal xmlElem As MSXML2.IXMLDOMElement, ByVal dicLinesAndElements As Scripting.Dictionary)

    Dim bHasChildren As Boolean
    bHasChildren = False

    Dim lMyLine As Long
    lMyLine = dicLinesAndElements.count + 1
    dicLinesAndElements.Add lMyLine, Array("Both", xmlElem)

    Dim xmlNodeLoop As MSXML2.IXMLDOMNode
    For Each xmlNodeLoop In xmlElem.ChildNodes
        If xmlNodeLoop.NodeType = NODE_ELEMENT Then
            If Not bHasChildren Then
                bHasChildren = True
                '* replace
                dicLinesAndElements.Item(lMyLine) = Array("Open", xmlElem)
            End If

            WalkDomDoc xmlNodeLoop, dicLinesAndElements
        End If
    Next xmlNodeLoop

    If bHasChildren Then
        dicLinesAndElements.Add dicLinesAndElements.count + 1, Array("Close", xmlElem)
    End If
End Sub

'**************************************
'* the color routines
'**************************************
Private Sub ColorInText(ByVal wsReturn As Excel.Worksheet, ByVal dicLinesAndElements As Scripting.Dictionary, ByVal dicColors As Scripting.Dictionary)
    '* ASSUMPTION: thanks to the PrettyPrintXml() function beneath each element node is on its own line.

    Debug.Assert Not wsReturn Is Nothing

    Dim lRowLoop As Long
    For lRowLoop = 1 To wsReturn.Cells(1, 1).CurrentRegion.Rows.count

        Dim bFoundXmlCell As Boolean
        bFoundXmlCell = False

        Dim rngXmlCell As Excel.Range

        Dim lColumnLoop As Long
        For lColumnLoop = 1 To wsReturn.Cells(1, 1).CurrentRegion.Columns.count
            Set rngXmlCell = wsReturn.Cells(lRowLoop, lColumnLoop)
            If Len(rngXmlCell) > 0 Then
                bFoundXmlCell = True
                Exit For
            End If
        Next

        If bFoundXmlCell Then
            ColorXml rngXmlCell, dicLinesAndElements.Item(lRowLoop), dicColors
        End If
    Next
End Sub

Private Sub ColorXml(ByVal rngXmlCell As Excel.Range, ByVal vTagDetails, ByVal dicColors As Scripting.Dictionary)
    Dim vOpenOrClose
    vOpenOrClose = vTagDetails(0)

    Dim xmlElem As MSXML2.IXMLDOMElement
    Set xmlElem = vTagDetails(1)

    Dim lNodeNameLen As Long
    lNodeNameLen = Len(xmlElem.nodeName)

    If vOpenOrClose = "Close" Or vOpenOrClose = "Both" Then
        ColorCloseTagToken rngXmlCell, lNodeNameLen, dicColors, xmlElem, vOpenOrClose
    End If

    If vOpenOrClose = "Open" Or vOpenOrClose = "Both" Then
        ColorOpenTagToken rngXmlCell, lNodeNameLen, dicColors
        ColorAttributes rngXmlCell, lNodeNameLen, dicColors, xmlElem
    End If
End Sub

Private Sub ColorAttributes(ByVal rngXmlCell As Excel.Range, ByVal lNodeNameLen As Long, _
                        ByVal dicColors As Scripting.Dictionary, ByVal xmlElem As MSXML2.IXMLDOMElement)
    If xmlElem.Attributes.Length > 0 Then
        Dim sXml As String
        sXml = rngXmlCell.Value2

        Dim lQuotes As Long
        lQuotes = 1

        Dim xmlAttrLoop As MSXML2.IXMLDOMAttribute
        For Each xmlAttrLoop In xmlElem.Attributes

            Dim lFindId As Long
            lFindId = InStr(lQuotes, sXml, xmlAttrLoop.name, vbTextCompare)

            rngXmlCell.Characters(Start:=lFindId, Length:=Len(xmlAttrLoop.name)).Font.Color = dicColors.Item("XmlToken")

            Dim lEquals As Long
            lEquals = InStr(lFindId + 1, sXml, "=", vbTextCompare)

            rngXmlCell.Characters(Start:=lEquals, Length:=1).Font.Color = dicColors.Item("Punctuation")


            lQuotes = InStr(lEquals + 1, sXml, """", vbTextCompare)  '* assume double quotes, TODO make single quote aware

            rngXmlCell.Characters(Start:=lQuotes, Length:=1).Font.Color = dicColors.Item("Punctuation")

            lQuotes = InStr(lQuotes + 1, sXml, """", vbTextCompare)  '* assume double quotes, TODO make single quote aware
            rngXmlCell.Characters(Start:=lQuotes, Length:=1).Font.Color = dicColors.Item("Punctuation")
        Next
    End If
End Sub

Private Sub ColorCloseTagToken(ByVal rngXmlCell As Excel.Range, ByVal lNodeNameLen As Long, _
                        ByVal dicColors As Scripting.Dictionary, ByVal xmlElem As MSXML2.IXMLDOMElement, ByVal vOpenOrClose)

    Dim lStart As Long
    lStart = VBA.IIf(vOpenOrClose = "Both", InStr(1, rngXmlCell, Chr$(60) & "/" & xmlElem.nodeName, vbTextCompare), 1)

    rngXmlCell.Characters(Start:=lStart, Length:=2).Font.Color = dicColors.Item("Punctuation")
    rngXmlCell.Characters(Start:=lStart + 2, Length:=lNodeNameLen).Font.Color = dicColors.Item("XmlToken")
    rngXmlCell.Characters(Start:=lStart + 2 + lNodeNameLen, Length:=1).Font.Color = dicColors.Item("Punctuation")

End Sub

Private Sub ColorOpenTagToken(ByVal rngXmlCell As Excel.Range, ByVal lNodeNameLen As Long, ByVal dicColors As Scripting.Dictionary)

    rngXmlCell.Characters(Start:=1, Length:=1).Font.Color = dicColors.Item("Punctuation")
    rngXmlCell.Characters(Start:=2, Length:=lNodeNameLen).Font.Color = dicColors.Item("XmlToken")
    Dim lRightAngleBracket As Long
    lRightAngleBracket = InStr(1, rngXmlCell.Value2, Chr$(62), vbBinaryCompare)
    Debug.Assert lRightAngleBracket > 0

    rngXmlCell.Characters(Start:=lRightAngleBracket, Length:=1).Font.Color = dicColors.Item("Punctuation")

End Sub