Friday 6 December 2019

VBA, Python, Multi-threading - Stoppable Folder Replicator

In this post I give a multi-threaded Python script which replicates one folder to another. It uses the Windows API ReadDirectoryChangesW function to watch for directory changes. It is start-able and stoppable from Excel VBA using the Windows Synchronization Events which I explored in the previous couple of posts.

Background

So I wanted some code to replicate a folder. Of course I could have written some VBA code to scan both a source and a destination folder every few seconds and figure what needed copying but this is a bit lame because it has needless disk activity and other techniques must be available. It is possible to use the Windows API, and specifically the ReadDirectoryChangesW function to watch a folder. .NET also has a folder watcher feature but I am happy to stick to Windows API.

ReadDirectoryChangesW blocks (in synchronous case)

But the problem with ReadDirectoryChangesW is that is blocks in the synchronous case. It is not like other waitable API function such as WaitForSingleObject(handle,interval) in that one can wait for a fixed time interval before breaking off to do some other work and give the impression of multi-threading.

I have seen on some VB6 forums some programmers attempt to use CreateThread() to give themselves another thread but they report crashes and this is because threads in VB6 and VBA need to obey COM apartment threading rules. I have yet to see successfully robust multi-threaded code for VB6 or VBA. Sure, I've seen code that looks like it runs multiple threads but the moment one puts in a break point and starts debugging Excel crashes. Perhaps one day I might solve that problem. In the meantime, I strongly recommend *not* creating threads in VBA. If you want concurrency then you should shell another process.

However, shelling another process raises the issue on how to communicate between the processes. In the previous two posts I demonstrated the use of Windows Synchronization Events as an inter process communication (IPC) mechanism and I will use this to signal the folder replicator to stop. To shell a child process on the VBA side, I am using Windows Script Host Object Model (wshom.ocx) and at some point the code could call Terminate() to kill the shelled process but this does not give the shelled process a chance to tidy up its resources. Instead, I am going to give VBA code which signals an Windows Synchronization Event object to politely ask the shelled process to tidy up and terminate itself.

ReadDirectoryChangesW is waitable in the asynchronous case

So upon further researching it turns out ReadDirectoryChangesW has a asynchronous use case as well which is helpful. Even more helpful is the fact I discovered Python code to call ReadDirectoryChangesW asynchronously in the test scripts of Mark Hammond's Win32 library, test_win32file.py. I am truly grateful for Mark Hammond's Windows Python library. For an explanation as to how the asynchronous use case works see the remarks section of ReadDirectoryChangesW docs page. So in the Python code it shows we can call WaitForSingleObject(handle,interval) in the asynchronous case.

But we still need two threads, one will watch the directory for changes and the other will wait for the stop signal to be issued by the Excel VBA code. So the following Python script is an example of how to launch a second thread in Python.

StoppableFolderReplicator.py

The core of the Python script below is based upon a unit test file for Mark Hammond's Windows Python library, test_win32file.py. The main() function runs first and validates the command line arguments, if valid it creates the class StoppableFolderReplicator where the real action takes place.

In the constructor of StoppableFolderReplicator class a handle to the directory to watch is obtained and then a second Thread is created to run the class's _watcherThreadOverlapped() method. Then the main thread runs the class's waitForEvent() method where it sits waiting for the Windows Synchronization Event (the stop event) to be signalled (by VBA).

The second thread runs the class's _watcherThreadOverlapped() method and loops around watching the directory by calling the ReadDirectoryChangesW() function. ReadDirectoryChangesW() returns either on a timeout or if there is a change in the directory it is watching; if the latter then any new file is copied across. Each iteration of the loop this thread also checks to the class's tearDownInProgress boolean property which is signalled by the main thread (in waitForEvent()) when the stop event is signalled; if signalled this second thread releases its handle and then quits.

A key line of code in the class's waitForEvent() method is self.watcher_thread.join(5) where the first thread pauses and waits for the second thread to acknowledge the stop, tidy up and then quit. Without this line, the main thread would quit earlier than the second thread with unexpected results.

Most messages printed are prefixed by a thread number to help folks understand how it works.

import win32api
import win32file
import pywintypes
import winerror
import win32event
import win32con
import ntsecuritycon
import sys
import os
import tempfile
import threading
import time
import shutil
from shutil import copyfile

class StoppableFolderReplicator(): 

    def __init__(self, srcFolder, destFolder, hEvent):
        self.srcFolder = srcFolder
        self.destFolder = destFolder
        self.hEvent = hEvent
        self.tearDownInProgress = False

        self.ACTIONS = {
                1 : "Created",
                2 : "Deleted",
                3 : "Updated",
                4 : "Renamed from something",
                5 : "Renamed to something"
            }
        
        self.dir_handle = win32file.CreateFile(srcFolder, 
                                        ntsecuritycon.FILE_LIST_DIRECTORY,
                                        win32con.FILE_SHARE_READ,
                                        None, # security desc
                                        win32con.OPEN_EXISTING,
                                        win32con.FILE_FLAG_BACKUP_SEMANTICS | win32con.FILE_FLAG_OVERLAPPED,
                                        None)

        self.watcher_thread = threading.Thread(target=self._watcherThreadOverlapped,
                            args=(srcFolder, destFolder, self.dir_handle))
        self.watcher_thread.start()


    def _watcherThreadOverlapped(self, srcFolder, destFolder, dir_handle):
        print("thread #2:waiting for directory changes")
        changes = []
        flags = win32con.FILE_NOTIFY_CHANGE_FILE_NAME
        buf = win32file.AllocateReadBuffer(8192)
        overlapped = pywintypes.OVERLAPPED()
        overlapped.hEvent = win32event.CreateEvent(None, 0, 0, None)
        while 1:
            if self.tearDownInProgress:
                print("thread #2:tidying up")
                if self.dir_handle != 0:
                    win32api.CloseHandle(self.dir_handle)
                    self.dir_handle = 0
                break
            try:
                win32file.ReadDirectoryChangesW(self.dir_handle,
                                                buf,
                                                False, #sub-tree
                                                flags,
                                                overlapped)
            except Exception as e:
                print("thread #2: Exception whilst ReadDirectoryChangesW:" + str(e) + "\n")
                break 
            # Wait for our event or for a short time.
            rc = win32event.WaitForSingleObject(overlapped.hEvent, 1000)
            if rc == win32event.WAIT_OBJECT_0:
                # got some data!  Must use GetOverlappedResult to find out
                # how much is valid!  0 generally means the handle has
                # been closed.  Blocking is OK here, as the event has
                # already been set.
                nbytes = win32file.GetOverlappedResult(dir_handle, overlapped, True)
                if nbytes:
                    bits = win32file.FILE_NOTIFY_INFORMATION(buf, nbytes)
                    changes.extend(bits)

                    for action, file in changes:
                        full_filename = os.path.join(srcFolder, file)
                        print("thread #2:" , full_filename, self.ACTIONS.get(action, "Unknown"))
                        if action == 1 or action == 3:
                            #
                            # perhaps put some filtering or file renaming logic
                            # in here
                            #
                            if os.path.isfile(full_filename):
                                sDestFilename = os.path.join(destFolder, file)
                                try:
                                    copyfile(full_filename,sDestFilename)
                                except Exception as e:
                                    print("thread #2: Exception whilst copying file '" + full_filename + "':" + str(e) + "\n")
                    changes = []  # bugfix, need to clear this other it accumulates
                else:
                    # This is "normal" exit - our 'tearDown' closes the
                    # handle.
                    # print "looks like dir handle was closed!"
                    break
        print ("thread #2:stopped")

    def waitForEvent(self):
        try:
            print("thread #1:waiting for stop event")
            while 1:
                
                dwWaitResult = win32event.WaitForSingleObject(self.hEvent.handle, 500)
                if dwWaitResult == win32event.WAIT_OBJECT_0:
                    print("thread #1:stop signal received...")
                    self.tearDownInProgress = True
                    self.watcher_thread.join(5)
                    return
                pass

        except Exception as e:
            print("thread #1:Exception whilst polling for event:'" + self.stopEventName + "':" + str(e) + "\n")

        
if __name__ == '__main__':

    if len(sys.argv) < 4:
        print("Usage:" + os.path.basename(__file__) + " <sourceFolder> <destinationFolder> <stopEventName>")
    else:
        srcFolder = sys.argv[1]
        destFolder = sys.argv[2]
        stopEventName = sys.argv[3]

        ok = True
        if not os.path.isdir(srcFolder):
            print("Error, srcFolder '" + srcFolder + "' does not exist!  Abandoning.")
            ok = False

        if not os.path.isdir(destFolder):
            print("Error, destFolder '" + destFolder + "' does not exist!  Abandoning.")
            ok = False

        if ok == True:
            try:
                stoppable = False
                hEvent = win32event.OpenEvent(ntsecuritycon.SYNCHRONIZE, 0, stopEventName)
                
                stoppable = True
            except Exception as e:
                print("Exception whilst opening event:'" + stopEventName + "':" + str(e) + "\n")

            if stoppable:
                foo = StoppableFolderReplicator(srcFolder, destFolder, hEvent) 
                foo.waitForEvent()
                win32api.CloseHandle(hEvent)
                print("all stopped")
            else:
                print("Not stoppable, abandoning!")

Calling Excel VBA code

So two lines of the VBA code depend upon Chip Pearson's FormatMessage() function is a freely downloadable module from his website; alternatively you could comment them out.

In the previous two posts, I wrote about Windows Synchronization Event objects and how to use them in VBA so I won't duplicate explanation of that.

You should run the TestStoppableFolderReplicator() procedure. The other procedure is there for debugging against a Visual Studio launched Python process and is useful for development.

The stop event is created before the Python script is shelled because the Python script is expecting to open the event object; we create the event object in an initial state of being unset.

Next, the code uses the Windows Script Host Object Model (wshom.ocx) type library to shell the Python script, we supply three extra arguments after the script name (i) source folder, (ii) destination folder and (iii) the stop event name.

The code then reaches a Stop statement at which point you are invited to pop off to the source folder and create some files to test the replication. For the quickest way to create a file, I right click in a folder's explorer window and in the context menu I select the menu option New->Text Document. When a file is created it is replicated to the destination folder specified.

Once you've got bored of creating files and seeing them replicated then press F5 to move off the Stop statement and send the stop signal, the code then loops waiting for the Python process to quit voluntarily, this should be for only a split second.

When running the code below the shelled process appears as a blank window but once finished the output to that window is displayed in the VBA IDE's Immediate window. Underneath the code listing is a sample of what appeared for me in the Immediate window.

Option Explicit

'* Tools->References
'*   Windows Script Host Object Model (wshom.ocx)

Private Declare Function CreateEventWithoutSec Lib "kernel32" Alias "CreateEventA" (lpEventAttributes As Long, _
                    ByVal bManualReset As Long, ByVal bInitialState As Long, ByVal lpName As String) As Long
Private Declare Function CloseHandle Lib "kernel32" (ByVal hObject As Long) As Long
Private Declare Function SetEvent Lib "kernel32.dll" (ByVal hEvent As Long) As Long
Private Declare Function ResetEvent Lib "kernel32.dll" (ByVal hEvent As Long) As Long

Private hEvent As Long



Sub TestStoppableFolderReplicator()

    Dim oWshShell As IWshRuntimeLibrary.WshShell
    Set oWshShell = New IWshRuntimeLibrary.WshShell

    Const csPythonScriptFilePath As String = "C:\Users\Simon\source\repos\StoppableFolderReplicator\StoppableFolderReplicator\StoppableFolderReplicator.py "
    Const csSourceFolder As String = """N:\Folder Replication Sandbox\Source"""
    Const csDestinationFolder As String = """N:\Folder Replication Sandbox\Destination"""
    Const csStopEvent As String = "stopReplicator"
    
    Dim sCmdLine As String
    sCmdLine = "python " & csPythonScriptFilePath & " " & csSourceFolder & " " & csDestinationFolder & " " & csStopEvent
    
    hEvent = CreateEventWithoutSec(0, 1, 0, csStopEvent)
    If hEvent = 0 Then
        Debug.Print GetSystemErrorMessageText(Err.LastDllError)
        GoTo SingleExit
    End If
    
    '*
    '* launch the python script
    '*
    Dim oWshExec As IWshRuntimeLibrary.WshExec
    Set oWshExec = oWshShell.Exec(sCmdLine)
    Debug.Assert oWshExec.Status = WshRunning

    '* stop here for a while, pop off to source folder and create some files
    '* and watch them replicate to destination folder
    '* once you've got bored press F5 to send the stop signal
    Stop
    
    '* Ill-advised to use oWshExec.Terminate() because of resources not being tidied up cleanly
    '* so instead we set a signal which the shelled process looks for a shutdowns cleanly and voluntarily
    Call SetEvent(hEvent)

    '* allow the Python script a split second to respond the stop signal
    '* in the meantime we'll loop until the Python process quits
    While oWshExec.Status = WshRunning
        DoEvents '* stay responsive
    Wend

    '* very important to reset event if you want subsequent runs to work as well!
    Call ResetEvent(hEvent)
    Debug.Assert oWshExec.Status = WshFinished
    
    '* we can access what would have been printed to the console
    Debug.Print oWshExec.StdOut.ReadAll
    
    '* tidy up our handle, very important!
SingleExit:
    If hEvent <> 0 Then
        Call CloseHandle(hEvent)
        hEvent = 0
    End If

End Sub

Private Sub DebugStopSignal()
    '* I used this procedure for debugging the stop signal
    hEvent = CreateEventWithoutSec(0, 1, 0, "stopReplicator")
    If hEvent = 0 Then
        Debug.Print GetSystemErrorMessageText(Err.LastDllError)
        GoTo SingleExit
    Else
        Stop
        Call SetEvent(hEvent)
    
        Stop
        Call ResetEvent(hEvent)
    End If
    
SingleExit:
    If hEvent <> 0 Then
        Call CloseHandle(hEvent)
        hEvent = 0
    End If
End Sub

Here is some sample output copied from my Immediate window

thread #2:waiting for directory changes
thread #1:waiting for stop event
thread #2: N:\Folder Replication Sandbox\Source\New Text Document.txt Created
thread #2: N:\Folder Replication Sandbox\Source\New Text Document.txt Created
thread #2: N:\Folder Replication Sandbox\Source\New Rich Text Document.rtf Created
thread #1:stop signal received...
thread #2:tidying up
thread #2:stopped
all stopped

Final thoughts

The Python script replicated a folder but it could do anything that you wanted to offload from VBA because of VBA's lack of multi-threading. Signalling a stop is a vital IPC mechanism. This is much better than just calling the Terminate() method. I really like Python as a means of extending VBA's power. Enjoy!

Links