Wednesday 25 March 2020

VBA - Persisting Variant Arrays To File

In this post I investigate the undocumented binary representation of variant arrays in files as saved by VBA using the FreeFile, Open For Binary and Put statements. We'll see how each element is preceded by a vbVarType constant followed by a binary representation. This is another attempt to find a good serialization format suited for VBA developers, one that scores higher than XML or JSON because it is native to the VBA/VB6 ecosystem. I believe the undocumented binary persistence format is a good candidate.

Approach

Some Experiments and Investigations

So the following programs are experimental; they write a variable to a binary file for inspection. The first program will write a string array to file and then the reader is invited to inspect the file in Notepad++ or a Hex Editor (I give a screenshot). In the second program I go on to write some code to dump the bytes in hex form to the Immediate Window because I got frustrated with interpreting non printable ASCII characters.

The Goal is VBA as Consumer

I have absolutely no intention of writing VBA code to manually save/load structures to/from a binary files because VBA already supports this. I demonstrate this serialization logic to support writing code in languages other than VBA such as Javascript to generate a file that can be consumed by VBA. The use case scenario could be a web service returning some data to a VBA caller, rather than parse JSON or XML it would be better if VBA loaded the binary file in a variable using the FreeFile, Open For Binary and Get statements. In this way I am shifting the burden of parsing logic to the server-side.

I hope to go on and actually give some JavaScript code that does implement this serialization format in the next post. Because all JavaScript floating point numbers are Double precision (IEEE 754) I'll skip Single as VBA variable type. Also, I will skip the Decimal VBA variable type. Dates are encoded as Doubles so that will be covered.

Investigate arrays first then concentrate on typing

So the first chunk of code will work with strings only to abstract any typology. The program will save a one-dimensional string array; the code then loads the array back into an identically dimensioned one-dimensional string array as a base case. The program then loads the array back into a two dimensional array to investigate if the persistence is row-wise or column-wise.

The second chunk of code then focuses on persisting a single element array, each time with a different variable type to investigate the binary representation of each variable type. I needed to work with one element only here because otherwise the stream of hex bytes is too much to interpret!

The Code

All the code should be placed in a single standard module. You will also need Tools->References->Microsoft Scripting Runtime for some other file handling. The binary files are generated in a sub-directory of the user's temp directory.

First chunk of code saves a string array

So here is our first chunk of code which experiments with array structures. See below for a screenshot of the file in Notepad++.

You should run TestPersistStringTypedArray as the starting point.

Option Explicit

Private Function GetWorkingDir() As String
    Dim fso As New Scripting.FileSystemObject '* Tools->References->Microsoft Scripting Runtime
    Const TemporaryFolder As Long = 2
    
    Dim fldTemp As Scripting.Folder
    Set fldTemp = fso.GetSpecialFolder(TemporaryFolder)

    Dim sPathVBAPersist As String
    sPathVBAPersist = fso.BuildPath(fldTemp, "VBAPersist")

    If Not fso.FolderExists(sPathVBAPersist) Then
        fso.CreateFolder sPathVBAPersist
    End If
    GetWorkingDir = sPathVBAPersist
End Function

Private Function OpenCleanFileForBinary(ByVal sPartialFileName As String, ByRef plFileNum As Long) As String
    
    Dim fso As New Scripting.FileSystemObject '* Tools->References->Microsoft Scripting Runtime
    
    Dim sFullFileName As String
    sFullFileName = fso.BuildPath(GetWorkingDir(), sPartialFileName)

    If fso.FileExists(sFullFileName) Then
        Kill sFullFileName
    End If

    plFileNum = OpenFileForBinary(sFullFileName)
    OpenCleanFileForBinary = sFullFileName

    '* caller needs to close plFileNum!!
    
End Function

Private Function OpenFileForBinary(ByVal sFullFileName As String) As Long

    Dim lFileNum As Long
    lFileNum = FreeFile
    '* caller needs to close plFileNum!!
    
    Open sFullFileName For Binary As lFileNum

    OpenFileForBinary = lFileNum

End Function

Private Sub TestPersistStringTypedArray()

    Dim lSaveFileNum As Long, sFullFileName As String
    sFullFileName = OpenCleanFileForBinary("persistStringTypedArray.bin", lSaveFileNum)
    Debug.Print sFullFileName

    '**** SAVE ****
    Dim asSave(0 To 3) As String
    asSave(0) = "Hi"
    asSave(1) = "Earth"
    asSave(2) = "Goodbye"
    asSave(3) = "Hercules"
    Put lSaveFileNum, , asSave
    Close lSaveFileNum

    '**** LOAD AS ONE DIM****
    Dim asLoadOneDim(0 To 3) As String
    Dim lLoadFileNum As Long
    lLoadFileNum = OpenFileForBinary(sFullFileName)
    Get lLoadFileNum, , asLoadOneDim
    Close lLoadFileNum

    '*** COMPARE ****
    '* Everything as you would expect
    Debug.Assert asLoadOneDim(0) = asSave(0)
    Debug.Assert asLoadOneDim(1) = asSave(1)
    Debug.Assert asLoadOneDim(2) = asSave(2)
    Debug.Assert asLoadOneDim(3) = asSave(3) 'Hercules


    '**** LOAD AS TWO DIM****
    Dim asLoadTwoDim(0 To 1, 0 To 1) As String

    lLoadFileNum = OpenFileForBinary(sFullFileName)
    Get lLoadFileNum, , asLoadTwoDim
    Close lLoadFileNum

    '*** COMPARE ****
    '* Note how loading a one dimensional array into a two dimensions causes the leftmost column to be populated first
    '* i.e. we load columnwise, not rowwise.  Counter-intuitive?
    Debug.Assert asLoadTwoDim(0, 0) = asSave(0) 'Hi
    Debug.Assert asLoadTwoDim(0, 1) = asSave(2) 'Goodbye
    Debug.Assert asLoadTwoDim(1, 0) = asSave(1) 'Earth
    Debug.Assert asLoadTwoDim(1, 1) = asSave(3) 'Hercules

    Stop '* inspect your Locals window
End Sub

The code above will write a file that looks like the following in Notepad++. You can see that Notepad++ is writing non printable ASCII bytes (ASCII code below 32) as control characters; NUL is zero, STX is two, ENQ is five, BEL is seven and BS is eight. So the non-NUL bytes are showing the following string's length. The NUL characters are not string terminators but instead the second byte of the string length, except the peristence format is little endian which means the least significant byte comes first; this implies a maximum string length of 65535.

Because the variable being saved is specifically a string array (and not a variant array) then there is no byte to indicate the following variable type. So this serves a nice plain example to begin with.

(In the second chunk of code I go on to write some hex byte dump logic because wrestling with Notepad++'s control codes became frustrating.)

Another learning point for this first chunk of code is the experiment with loading into a two-dimensional array and discovering that persistence of arrays is column-wise and not row-wise. That is to say, that given a two dimensional array to populate VBA will populate the leftmost column first and then the second column and then onwards moving left to right across the columns. This maybe counterintutive but veterans of VBA will understand this is related to ReDim Preserve only allowing the final dimension to be altered.

Second chunk of code saves a single element variant array with each a different vartype

So the second chunk of code saves a single element variant array with each a different variable type (VarType).

Thankfully, it also comes with some hex byte printing logic so we can ditch Notepad++. The bytes are given first in their raw little endian form and then in big endian which is easier to interpret. A little barrier also shows the two byte variable type (VarType)

We do a single element at a time otherwise we'll get a blizzard of hex!

The entry point this time is the subroutine TestPersistSole().

Private Function PersistSole(ByRef vSingleElement As Variant, ByVal sPartialFileName As String, Optional sValueAsString As Variant) As String
    'Debug.Print 'vbNewLine

    Dim lSaveFileNum As Long, sFullFileName As String
    sFullFileName = OpenCleanFileForBinary(sPartialFileName, lSaveFileNum)
    'Debug.Print sFullFileName

    '**** SAVE ****
    '* use one element of a one dimensional arrays to avoid clutter
    Dim avSave(0 To 0) As Variant
    avSave(0) = vSingleElement
    
    Put lSaveFileNum, , avSave
    Close lSaveFileNum

    '**** FIND FILE LENGTH ****
    Dim fso As New Scripting.FileSystemObject
    
    Dim lLoadFileNum As Long
    lLoadFileNum = fso.GetFile(sFullFileName).Size
    
    
    
    '**** LOAD ****
    ReDim abLoad(0 To lLoadFileNum - 1) As Byte
    lLoadFileNum = OpenFileForBinary(sFullFileName)
    Get lLoadFileNum, , abLoad()
    Close lLoadFileNum
    
    
    Dim sRawBytes As String:
    sRawBytes = RawBytes(abLoad)
    
    Dim sBigEndianBytes As String:
    sBigEndianBytes = BigEndianBytes(abLoad)
    
    Dim sValue As String
    On Error Resume Next
    sValue = VBA.IIf(IsMissing(sValueAsString), CStr(vSingleElement), sValueAsString)
    
    PersistSole = PadLeft(" ", TypeName(vSingleElement) & "(VarType " & VarType(vSingleElement) & ")", 20) & _
                    " of value " & PadRight(" ", sValue, 20) & sRawBytes & sBigEndianBytes
End Function


Private Function RawBytes(ByRef abBytes() As Byte) As String
    Dim sRawBytes As String: sRawBytes = ""
    sRawBytes = HexPad(abBytes, 0) & " " & HexPad(abBytes, 1) & "|"
    
    Dim lLoop As Long
    For lLoop = 2 To UBound(abBytes())
        sRawBytes = sRawBytes & HexPad(abBytes, lLoop) & " "
    Next lLoop
    
    RawBytes = " has raw hex bytes " & Trim(sRawBytes)
End Function

Private Function BigEndianBytes(ByRef abBytes() As Byte) As String
    
    Dim eVarType As VbVarType
    eVarType = abBytes(0)
    
    If eVarType <> vbString And eVarType <> vbByte And eVarType <> vbBoolean Then
    
        Dim sBigEndianBytes As String: sBigEndianBytes = ""
        sBigEndianBytes = HexPad(abBytes, 1) & " " & HexPad(abBytes, 0) & "|"
        
        Dim lLoop As Long
        For lLoop = UBound(abBytes()) To 2 Step -1
            sBigEndianBytes = sBigEndianBytes & HexPad(abBytes, lLoop) & " "
        Next lLoop
        
        BigEndianBytes = " or big endian " & Trim(sBigEndianBytes)
    End If
End Function


Private Sub TestPersistSole()

    '* string
    Debug.Print PersistSole("HELLO", "persistVariantArrayOfSingleString.bin")

    '* integer numbers
    Debug.Print PersistSole(CByte(8), "persistVariantArrayOfSingleByte.bin")

    Debug.Print PersistSole(CInt(511), "persistVariantArrayOfSingleInteger.bin")
    Debug.Print PersistSole(CLng(65535), "persistVariantArrayOfSingleLong.bin")

    '* skip Decimals
    '* https://docs.microsoft.com/en-us/dotnet/visual-basic/language-reference/data-types/decimal-data-type

    '* Booleans
    Debug.Print PersistSole(CBool(True), "persistVariantArrayOfBooleanTrue.bin")
    Debug.Print PersistSole(CBool(False), "persistVariantArrayOfBooleanFalse.bin")
    
    '* skip Singles
    
    '* doubles, 1 sign bit, 11 bit exponent, 52 fraction
    '* https://en.wikipedia.org/wiki/Double-precision_floating-point_format
    Debug.Print PersistSole(CDbl(-1), "persistVariantArrayOfDoubleFloat.bin")
    Debug.Print PersistSole(CDbl(0), "persistVariantArrayOfDoubleFloat.bin")
    Debug.Print PersistSole(CDbl(7.5), "persistVariantArrayOfDoubleFloat.bin")
    Debug.Print PersistSole(CDbl(Application.WorksheetFunction.Pi()), "persistVariantArrayOfDoubleFloat.bin")


    Debug.Print PersistSole(CDbl(1), "persistVariantArrayOfDoubleFloat.bin")
    Debug.Print PersistSole(CDbl(32767), "persistVariantArrayOfDoubleFloat.bin")
    
    '* Dates
    '* dates are implemented as doubles
    Debug.Print PersistSole(CDate(32767), "persistVariantArrayOfDate.bin")

    '* Specials
    Debug.Print PersistSole(Empty, "persistVariantArrayOfSingleEmpty.bin")
    Debug.Print PersistSole(Null, "persistVariantArrayOfSingleNull.bin")
    Debug.Print PersistSole(CVErr(2023), "persistVariantArrayOfSingleError.bin")



End Sub

Private Function HexPad(ByRef abBytes() As Byte, ByVal lIdx As Long) As String
    HexPad = PadRight("0", Hex$(abBytes(lIdx)), 2)
End Function

Private Function PadRight(ByVal sPad As String, ByVal sCore As String, ByVal lNum As Long) As String
    PadRight = Right$(String(lNum, sPad) & sCore, lNum)
End Function

Private Function PadLeft(ByVal sPad As String, ByVal sCore As String, ByVal lNum As Long) As String
    PadLeft = Left$(sCore & String(lNum, sPad), lNum)
End Function

and this prints the following output

String(VarType 8)    of value                HELLO has raw hex bytes 08 00|05 00 48 45 4C 4C 4F
Byte(VarType 17)     of value                    8 has raw hex bytes 11 00|08
Integer(VarType 2)   of value                  511 has raw hex bytes 02 00|FF 01 or big endian 00 02|01 FF
Long(VarType 3)      of value                65535 has raw hex bytes 03 00|FF FF 00 00 or big endian 00 03|00 00 FF FF
Boolean(VarType 11)  of value                 True has raw hex bytes 0B 00|FF FF
Boolean(VarType 11)  of value                False has raw hex bytes 0B 00|00 00
Double(VarType 5)    of value                   -1 has raw hex bytes 05 00|00 00 00 00 00 00 F0 BF or big endian 00 05|BF F0 00 00 00 00 00 00
Double(VarType 5)    of value                    0 has raw hex bytes 05 00|00 00 00 00 00 00 00 00 or big endian 00 05|00 00 00 00 00 00 00 00
Double(VarType 5)    of value                  7.5 has raw hex bytes 05 00|00 00 00 00 00 00 1E 40 or big endian 00 05|40 1E 00 00 00 00 00 00
Double(VarType 5)    of value     3.14159265358979 has raw hex bytes 05 00|18 2D 44 54 FB 21 09 40 or big endian 00 05|40 09 21 FB 54 44 2D 18
Double(VarType 5)    of value                    1 has raw hex bytes 05 00|00 00 00 00 00 00 F0 3F or big endian 00 05|3F F0 00 00 00 00 00 00
Double(VarType 5)    of value                32767 has raw hex bytes 05 00|00 00 00 00 C0 FF DF 40 or big endian 00 05|40 DF FF C0 00 00 00 00
Date(VarType 7)      of value           16/09/1989 has raw hex bytes 07 00|00 00 00 00 C0 FF DF 40 or big endian 00 07|40 DF FF C0 00 00 00 00
Empty(VarType 0)     of value                      has raw hex bytes 00 00| or big endian 00 00|
Null(VarType 1)      of value                      has raw hex bytes 01 00| or big endian 00 01|
Error(VarType 10)    of value           Error 2023 has raw hex bytes 0A 00|E7 07 0A 80 or big endian 00 0A|80 0A 07 E7

So interpreting the results we can see that for a variant array which is polymorphic/heterogeneous the first two bytes are the VarType of the element. We've already met the string representation in the first chunk of code, vartype then string length then one byte for each character. Persisting a single byte is trivial. The Integer, Long and Boolean types are intuitive. The Double (also Date) type is a IEEE 754 floating-point representation and is thus complex to interpret but the specification says that there is 1 sign bit, 11 exponent bits and 52 fraction bits.

I have even added some special use cases for other VarTypes (variant types): Empty and Null which are trivial; and Error which looks like a 32-bit Long (4 bytes).

Final Thoughts

As mentioned above I do not intend to write VBA code to manually load or save to this persistence format because VBA already does that for us. However, I feel I know enough now to attempt some JavaScript code which will persist a two-dimensional grid and return it from a web service to a VBA caller which VBA can then paste directly to a sheet. (Luckily, there is already a JavaScript library for the Doubles.) Watch out for that blog post next.

No comments:

Post a Comment