In this post I investigate the undocumented binary representation of variant arrays in files as saved by VBA using the FreeFile, Open For Binary and Put statements. We'll see how each element is preceded by a vbVarType constant followed by a binary representation. This is another attempt to find a good serialization format suited for VBA developers, one that scores higher than XML or JSON because it is native to the VBA/VB6 ecosystem. I believe the undocumented binary persistence format is a good candidate.
Approach
Some Experiments and Investigations
So the following programs are experimental; they write a variable to a binary file for inspection. The first program will write a string array to file and then the reader is invited to inspect the file in Notepad++ or a Hex Editor (I give a screenshot). In the second program I go on to write some code to dump the bytes in hex form to the Immediate Window because I got frustrated with interpreting non printable ASCII characters.
The Goal is VBA as Consumer
I have absolutely no intention of writing VBA code to manually save/load structures to/from a binary files because VBA already supports this. I demonstrate this serialization logic to support writing code in languages other than VBA such as Javascript to generate a file that can be consumed by VBA. The use case scenario could be a web service returning some data to a VBA caller, rather than parse JSON or XML it would be better if VBA loaded the binary file in a variable using the FreeFile, Open For Binary and Get statements. In this way I am shifting the burden of parsing logic to the server-side.
I hope to go on and actually give some JavaScript code that does implement this serialization format in the next post. Because all JavaScript floating point numbers are Double precision (IEEE 754) I'll skip Single as VBA variable type. Also, I will skip the Decimal VBA variable type. Dates are encoded as Doubles so that will be covered.
Investigate arrays first then concentrate on typing
So the first chunk of code will work with strings only to abstract any typology. The program will save a one-dimensional string array; the code then loads the array back into an identically dimensioned one-dimensional string array as a base case. The program then loads the array back into a two dimensional array to investigate if the persistence is row-wise or column-wise.
The second chunk of code then focuses on persisting a single element array, each time with a different variable type to investigate the binary representation of each variable type. I needed to work with one element only here because otherwise the stream of hex bytes is too much to interpret!
The Code
All the code should be placed in a single standard module. You will also need Tools->References->Microsoft Scripting Runtime for some other file handling. The binary files are generated in a sub-directory of the user's temp directory.
First chunk of code saves a string array
So here is our first chunk of code which experiments with array structures. See below for a screenshot of the file in Notepad++.
You should run TestPersistStringTypedArray as the starting point.
Option Explicit
Private Function GetWorkingDir() As String
Dim fso As New Scripting.FileSystemObject '* Tools->References->Microsoft Scripting Runtime
Const TemporaryFolder As Long = 2
Dim fldTemp As Scripting.Folder
Set fldTemp = fso.GetSpecialFolder(TemporaryFolder)
Dim sPathVBAPersist As String
sPathVBAPersist = fso.BuildPath(fldTemp, "VBAPersist")
If Not fso.FolderExists(sPathVBAPersist) Then
fso.CreateFolder sPathVBAPersist
End If
GetWorkingDir = sPathVBAPersist
End Function
Private Function OpenCleanFileForBinary(ByVal sPartialFileName As String, ByRef plFileNum As Long) As String
Dim fso As New Scripting.FileSystemObject '* Tools->References->Microsoft Scripting Runtime
Dim sFullFileName As String
sFullFileName = fso.BuildPath(GetWorkingDir(), sPartialFileName)
If fso.FileExists(sFullFileName) Then
Kill sFullFileName
End If
plFileNum = OpenFileForBinary(sFullFileName)
OpenCleanFileForBinary = sFullFileName
'* caller needs to close plFileNum!!
End Function
Private Function OpenFileForBinary(ByVal sFullFileName As String) As Long
Dim lFileNum As Long
lFileNum = FreeFile
'* caller needs to close plFileNum!!
Open sFullFileName For Binary As lFileNum
OpenFileForBinary = lFileNum
End Function
Private Sub TestPersistStringTypedArray()
Dim lSaveFileNum As Long, sFullFileName As String
sFullFileName = OpenCleanFileForBinary("persistStringTypedArray.bin", lSaveFileNum)
Debug.Print sFullFileName
'**** SAVE ****
Dim asSave(0 To 3) As String
asSave(0) = "Hi"
asSave(1) = "Earth"
asSave(2) = "Goodbye"
asSave(3) = "Hercules"
Put lSaveFileNum, , asSave
Close lSaveFileNum
'**** LOAD AS ONE DIM****
Dim asLoadOneDim(0 To 3) As String
Dim lLoadFileNum As Long
lLoadFileNum = OpenFileForBinary(sFullFileName)
Get lLoadFileNum, , asLoadOneDim
Close lLoadFileNum
'*** COMPARE ****
'* Everything as you would expect
Debug.Assert asLoadOneDim(0) = asSave(0)
Debug.Assert asLoadOneDim(1) = asSave(1)
Debug.Assert asLoadOneDim(2) = asSave(2)
Debug.Assert asLoadOneDim(3) = asSave(3) 'Hercules
'**** LOAD AS TWO DIM****
Dim asLoadTwoDim(0 To 1, 0 To 1) As String
lLoadFileNum = OpenFileForBinary(sFullFileName)
Get lLoadFileNum, , asLoadTwoDim
Close lLoadFileNum
'*** COMPARE ****
'* Note how loading a one dimensional array into a two dimensions causes the leftmost column to be populated first
'* i.e. we load columnwise, not rowwise. Counter-intuitive?
Debug.Assert asLoadTwoDim(0, 0) = asSave(0) 'Hi
Debug.Assert asLoadTwoDim(0, 1) = asSave(2) 'Goodbye
Debug.Assert asLoadTwoDim(1, 0) = asSave(1) 'Earth
Debug.Assert asLoadTwoDim(1, 1) = asSave(3) 'Hercules
Stop '* inspect your Locals window
End Sub
The code above will write a file that looks like the following in Notepad++. You can see that Notepad++ is writing non printable ASCII bytes (ASCII code below 32) as control characters; NUL is zero, STX is two, ENQ is five, BEL is seven and BS is eight. So the non-NUL bytes are showing the following string's length. The NUL characters are not string terminators but instead the second byte of the string length, except the peristence format is little endian which means the least significant byte comes first; this implies a maximum string length of 65535.
Because the variable being saved is specifically a string array (and not a variant array) then there is no byte to indicate the following variable type. So this serves a nice plain example to begin with.
(In the second chunk of code I go on to write some hex byte dump logic because wrestling with Notepad++'s control codes became frustrating.)
Another learning point for this first chunk of code is the experiment with loading into a two-dimensional array and discovering that persistence of arrays is column-wise and not row-wise. That is to say, that given a two dimensional array to populate VBA will populate the leftmost column first and then the second column and then onwards moving left to right across the columns. This maybe counterintutive but veterans of VBA will understand this is related to ReDim Preserve only allowing the final dimension to be altered.
Second chunk of code saves a single element variant array with each a different vartype
So the second chunk of code saves a single element variant array with each a different variable type (VarType).
Thankfully, it also comes with some hex byte printing logic so we can ditch Notepad++. The bytes are given first in their raw little endian form and then in big endian which is easier to interpret. A little barrier also shows the two byte variable type (VarType)
We do a single element at a time otherwise we'll get a blizzard of hex!
The entry point this time is the subroutine TestPersistSole().
Private Function PersistSole(ByRef vSingleElement As Variant, ByVal sPartialFileName As String, Optional sValueAsString As Variant) As String
'Debug.Print 'vbNewLine
Dim lSaveFileNum As Long, sFullFileName As String
sFullFileName = OpenCleanFileForBinary(sPartialFileName, lSaveFileNum)
'Debug.Print sFullFileName
'**** SAVE ****
'* use one element of a one dimensional arrays to avoid clutter
Dim avSave(0 To 0) As Variant
avSave(0) = vSingleElement
Put lSaveFileNum, , avSave
Close lSaveFileNum
'**** FIND FILE LENGTH ****
Dim fso As New Scripting.FileSystemObject
Dim lLoadFileNum As Long
lLoadFileNum = fso.GetFile(sFullFileName).Size
'**** LOAD ****
ReDim abLoad(0 To lLoadFileNum - 1) As Byte
lLoadFileNum = OpenFileForBinary(sFullFileName)
Get lLoadFileNum, , abLoad()
Close lLoadFileNum
Dim sRawBytes As String:
sRawBytes = RawBytes(abLoad)
Dim sBigEndianBytes As String:
sBigEndianBytes = BigEndianBytes(abLoad)
Dim sValue As String
On Error Resume Next
sValue = VBA.IIf(IsMissing(sValueAsString), CStr(vSingleElement), sValueAsString)
PersistSole = PadLeft(" ", TypeName(vSingleElement) & "(VarType " & VarType(vSingleElement) & ")", 20) & _
" of value " & PadRight(" ", sValue, 20) & sRawBytes & sBigEndianBytes
End Function
Private Function RawBytes(ByRef abBytes() As Byte) As String
Dim sRawBytes As String: sRawBytes = ""
sRawBytes = HexPad(abBytes, 0) & " " & HexPad(abBytes, 1) & "|"
Dim lLoop As Long
For lLoop = 2 To UBound(abBytes())
sRawBytes = sRawBytes & HexPad(abBytes, lLoop) & " "
Next lLoop
RawBytes = " has raw hex bytes " & Trim(sRawBytes)
End Function
Private Function BigEndianBytes(ByRef abBytes() As Byte) As String
Dim eVarType As VbVarType
eVarType = abBytes(0)
If eVarType <> vbString And eVarType <> vbByte And eVarType <> vbBoolean Then
Dim sBigEndianBytes As String: sBigEndianBytes = ""
sBigEndianBytes = HexPad(abBytes, 1) & " " & HexPad(abBytes, 0) & "|"
Dim lLoop As Long
For lLoop = UBound(abBytes()) To 2 Step -1
sBigEndianBytes = sBigEndianBytes & HexPad(abBytes, lLoop) & " "
Next lLoop
BigEndianBytes = " or big endian " & Trim(sBigEndianBytes)
End If
End Function
Private Sub TestPersistSole()
'* string
Debug.Print PersistSole("HELLO", "persistVariantArrayOfSingleString.bin")
'* integer numbers
Debug.Print PersistSole(CByte(8), "persistVariantArrayOfSingleByte.bin")
Debug.Print PersistSole(CInt(511), "persistVariantArrayOfSingleInteger.bin")
Debug.Print PersistSole(CLng(65535), "persistVariantArrayOfSingleLong.bin")
'* skip Decimals
'* https://docs.microsoft.com/en-us/dotnet/visual-basic/language-reference/data-types/decimal-data-type
'* Booleans
Debug.Print PersistSole(CBool(True), "persistVariantArrayOfBooleanTrue.bin")
Debug.Print PersistSole(CBool(False), "persistVariantArrayOfBooleanFalse.bin")
'* skip Singles
'* doubles, 1 sign bit, 11 bit exponent, 52 fraction
'* https://en.wikipedia.org/wiki/Double-precision_floating-point_format
Debug.Print PersistSole(CDbl(-1), "persistVariantArrayOfDoubleFloat.bin")
Debug.Print PersistSole(CDbl(0), "persistVariantArrayOfDoubleFloat.bin")
Debug.Print PersistSole(CDbl(7.5), "persistVariantArrayOfDoubleFloat.bin")
Debug.Print PersistSole(CDbl(Application.WorksheetFunction.Pi()), "persistVariantArrayOfDoubleFloat.bin")
Debug.Print PersistSole(CDbl(1), "persistVariantArrayOfDoubleFloat.bin")
Debug.Print PersistSole(CDbl(32767), "persistVariantArrayOfDoubleFloat.bin")
'* Dates
'* dates are implemented as doubles
Debug.Print PersistSole(CDate(32767), "persistVariantArrayOfDate.bin")
'* Specials
Debug.Print PersistSole(Empty, "persistVariantArrayOfSingleEmpty.bin")
Debug.Print PersistSole(Null, "persistVariantArrayOfSingleNull.bin")
Debug.Print PersistSole(CVErr(2023), "persistVariantArrayOfSingleError.bin")
End Sub
Private Function HexPad(ByRef abBytes() As Byte, ByVal lIdx As Long) As String
HexPad = PadRight("0", Hex$(abBytes(lIdx)), 2)
End Function
Private Function PadRight(ByVal sPad As String, ByVal sCore As String, ByVal lNum As Long) As String
PadRight = Right$(String(lNum, sPad) & sCore, lNum)
End Function
Private Function PadLeft(ByVal sPad As String, ByVal sCore As String, ByVal lNum As Long) As String
PadLeft = Left$(sCore & String(lNum, sPad), lNum)
End Function
and this prints the following output
String(VarType 8) of value HELLO has raw hex bytes 08 00|05 00 48 45 4C 4C 4F
Byte(VarType 17) of value 8 has raw hex bytes 11 00|08
Integer(VarType 2) of value 511 has raw hex bytes 02 00|FF 01 or big endian 00 02|01 FF
Long(VarType 3) of value 65535 has raw hex bytes 03 00|FF FF 00 00 or big endian 00 03|00 00 FF FF
Boolean(VarType 11) of value True has raw hex bytes 0B 00|FF FF
Boolean(VarType 11) of value False has raw hex bytes 0B 00|00 00
Double(VarType 5) of value -1 has raw hex bytes 05 00|00 00 00 00 00 00 F0 BF or big endian 00 05|BF F0 00 00 00 00 00 00
Double(VarType 5) of value 0 has raw hex bytes 05 00|00 00 00 00 00 00 00 00 or big endian 00 05|00 00 00 00 00 00 00 00
Double(VarType 5) of value 7.5 has raw hex bytes 05 00|00 00 00 00 00 00 1E 40 or big endian 00 05|40 1E 00 00 00 00 00 00
Double(VarType 5) of value 3.14159265358979 has raw hex bytes 05 00|18 2D 44 54 FB 21 09 40 or big endian 00 05|40 09 21 FB 54 44 2D 18
Double(VarType 5) of value 1 has raw hex bytes 05 00|00 00 00 00 00 00 F0 3F or big endian 00 05|3F F0 00 00 00 00 00 00
Double(VarType 5) of value 32767 has raw hex bytes 05 00|00 00 00 00 C0 FF DF 40 or big endian 00 05|40 DF FF C0 00 00 00 00
Date(VarType 7) of value 16/09/1989 has raw hex bytes 07 00|00 00 00 00 C0 FF DF 40 or big endian 00 07|40 DF FF C0 00 00 00 00
Empty(VarType 0) of value has raw hex bytes 00 00| or big endian 00 00|
Null(VarType 1) of value has raw hex bytes 01 00| or big endian 00 01|
Error(VarType 10) of value Error 2023 has raw hex bytes 0A 00|E7 07 0A 80 or big endian 00 0A|80 0A 07 E7
So interpreting the results we can see that for a variant array which is polymorphic/heterogeneous the first two bytes are the VarType of the element. We've already met the string representation in the first chunk of code, vartype then string length then one byte for each character. Persisting a single byte is trivial. The Integer, Long and Boolean types are intuitive. The Double (also Date) type is a IEEE 754 floating-point representation and is thus complex to interpret but the specification says that there is 1 sign bit, 11 exponent bits and 52 fraction bits.
I have even added some special use cases for other VarTypes (variant types): Empty and Null which are trivial; and Error which looks like a 32-bit Long (4 bytes).
Final Thoughts
As mentioned above I do not intend to write VBA code to manually load or save to this persistence format because VBA already does that for us. However, I feel I know enough now to attempt some JavaScript code which will persist a two-dimensional grid and return it from a web service to a VBA caller which VBA can then paste directly to a sheet. (Luckily, there is already a JavaScript library for the Doubles.) Watch out for that blog post next.
No comments:
Post a Comment