In this post I give a multi-threaded Python script which replicates one folder to another. It uses the Windows API ReadDirectoryChangesW function to watch for directory changes. It is start-able and stoppable from Excel VBA using the Windows Synchronization Events which I explored in the previous couple of posts.
Background
So I wanted some code to replicate a folder. Of course I could have written some VBA code to scan both a source and a destination folder every few seconds and figure what needed copying but this is a bit lame because it has needless disk activity and other techniques must be available. It is possible to use the Windows API, and specifically the ReadDirectoryChangesW function to watch a folder. .NET also has a folder watcher feature but I am happy to stick to Windows API.
ReadDirectoryChangesW blocks (in synchronous case)
But the problem with ReadDirectoryChangesW is that is blocks in the synchronous case. It is not like other waitable API function such as WaitForSingleObject(handle,interval) in that one can wait for a fixed time interval before breaking off to do some other work and give the impression of multi-threading.
I have seen on some VB6 forums some programmers attempt to use CreateThread() to give themselves another thread but they report crashes and this is because threads in VB6 and VBA need to obey COM apartment threading rules. I have yet to see successfully robust multi-threaded code for VB6 or VBA. Sure, I've seen code that looks like it runs multiple threads but the moment one puts in a break point and starts debugging Excel crashes. Perhaps one day I might solve that problem. In the meantime, I strongly recommend *not* creating threads in VBA. If you want concurrency then you should shell another process.
However, shelling another process raises the issue on how to communicate between the processes. In the previous two posts I demonstrated the use of Windows Synchronization Events as an inter process communication (IPC) mechanism and I will use this to signal the folder replicator to stop. To shell a child process on the VBA side, I am using Windows Script Host Object Model (wshom.ocx) and at some point the code could call Terminate() to kill the shelled process but this does not give the shelled process a chance to tidy up its resources. Instead, I am going to give VBA code which signals an Windows Synchronization Event object to politely ask the shelled process to tidy up and terminate itself.
ReadDirectoryChangesW is waitable in the asynchronous case
So upon further researching it turns out ReadDirectoryChangesW has a asynchronous use case as well which is helpful. Even more helpful is the fact I discovered Python code to call ReadDirectoryChangesW asynchronously in the test scripts of Mark Hammond's Win32 library, test_win32file.py. I am truly grateful for Mark Hammond's Windows Python library. For an explanation as to how the asynchronous use case works see the remarks section of ReadDirectoryChangesW docs page. So in the Python code it shows we can call WaitForSingleObject(handle,interval) in the asynchronous case.
But we still need two threads, one will watch the directory for changes and the other will wait for the stop signal to be issued by the Excel VBA code. So the following Python script is an example of how to launch a second thread in Python.
StoppableFolderReplicator.py
The core of the Python script below is based upon a unit test file for Mark Hammond's Windows Python library, test_win32file.py. The main() function runs first and validates the command line arguments, if valid it creates the class StoppableFolderReplicator where the real action takes place.
In the constructor of StoppableFolderReplicator class a handle to the directory to watch is obtained and then a second Thread is created to run the class's _watcherThreadOverlapped() method. Then the main thread runs the class's waitForEvent() method where it sits waiting for the Windows Synchronization Event (the stop event) to be signalled (by VBA).
The second thread runs the class's _watcherThreadOverlapped() method and loops around watching the directory by calling the ReadDirectoryChangesW() function. ReadDirectoryChangesW() returns either on a timeout or if there is a change in the directory it is watching; if the latter then any new file is copied across. Each iteration of the loop this thread also checks to the class's tearDownInProgress boolean property which is signalled by the main thread (in waitForEvent()) when the stop event is signalled; if signalled this second thread releases its handle and then quits.
A key line of code in the class's waitForEvent() method is self.watcher_thread.join(5) where the first thread pauses and waits for the second thread to acknowledge the stop, tidy up and then quit. Without this line, the main thread would quit earlier than the second thread with unexpected results.
Most messages printed are prefixed by a thread number to help folks understand how it works.
import win32api
import win32file
import pywintypes
import winerror
import win32event
import win32con
import ntsecuritycon
import sys
import os
import tempfile
import threading
import time
import shutil
from shutil import copyfile
class StoppableFolderReplicator():
def __init__(self, srcFolder, destFolder, hEvent):
self.srcFolder = srcFolder
self.destFolder = destFolder
self.hEvent = hEvent
self.tearDownInProgress = False
self.ACTIONS = {
1 : "Created",
2 : "Deleted",
3 : "Updated",
4 : "Renamed from something",
5 : "Renamed to something"
}
self.dir_handle = win32file.CreateFile(srcFolder,
ntsecuritycon.FILE_LIST_DIRECTORY,
win32con.FILE_SHARE_READ,
None, # security desc
win32con.OPEN_EXISTING,
win32con.FILE_FLAG_BACKUP_SEMANTICS | win32con.FILE_FLAG_OVERLAPPED,
None)
self.watcher_thread = threading.Thread(target=self._watcherThreadOverlapped,
args=(srcFolder, destFolder, self.dir_handle))
self.watcher_thread.start()
def _watcherThreadOverlapped(self, srcFolder, destFolder, dir_handle):
print("thread #2:waiting for directory changes")
changes = []
flags = win32con.FILE_NOTIFY_CHANGE_FILE_NAME
buf = win32file.AllocateReadBuffer(8192)
overlapped = pywintypes.OVERLAPPED()
overlapped.hEvent = win32event.CreateEvent(None, 0, 0, None)
while 1:
if self.tearDownInProgress:
print("thread #2:tidying up")
if self.dir_handle != 0:
win32api.CloseHandle(self.dir_handle)
self.dir_handle = 0
break
try:
win32file.ReadDirectoryChangesW(self.dir_handle,
buf,
False, #sub-tree
flags,
overlapped)
except Exception as e:
print("thread #2: Exception whilst ReadDirectoryChangesW:" + str(e) + "\n")
break
# Wait for our event or for a short time.
rc = win32event.WaitForSingleObject(overlapped.hEvent, 1000)
if rc == win32event.WAIT_OBJECT_0:
# got some data! Must use GetOverlappedResult to find out
# how much is valid! 0 generally means the handle has
# been closed. Blocking is OK here, as the event has
# already been set.
nbytes = win32file.GetOverlappedResult(dir_handle, overlapped, True)
if nbytes:
bits = win32file.FILE_NOTIFY_INFORMATION(buf, nbytes)
changes.extend(bits)
for action, file in changes:
full_filename = os.path.join(srcFolder, file)
print("thread #2:" , full_filename, self.ACTIONS.get(action, "Unknown"))
if action == 1 or action == 3:
#
# perhaps put some filtering or file renaming logic
# in here
#
if os.path.isfile(full_filename):
sDestFilename = os.path.join(destFolder, file)
try:
copyfile(full_filename,sDestFilename)
except Exception as e:
print("thread #2: Exception whilst copying file '" + full_filename + "':" + str(e) + "\n")
changes = [] # bugfix, need to clear this other it accumulates
else:
# This is "normal" exit - our 'tearDown' closes the
# handle.
# print "looks like dir handle was closed!"
break
print ("thread #2:stopped")
def waitForEvent(self):
try:
print("thread #1:waiting for stop event")
while 1:
dwWaitResult = win32event.WaitForSingleObject(self.hEvent.handle, 500)
if dwWaitResult == win32event.WAIT_OBJECT_0:
print("thread #1:stop signal received...")
self.tearDownInProgress = True
self.watcher_thread.join(5)
return
pass
except Exception as e:
print("thread #1:Exception whilst polling for event:'" + self.stopEventName + "':" + str(e) + "\n")
if __name__ == '__main__':
if len(sys.argv) < 4:
print("Usage:" + os.path.basename(__file__) + " <sourceFolder> <destinationFolder> <stopEventName>")
else:
srcFolder = sys.argv[1]
destFolder = sys.argv[2]
stopEventName = sys.argv[3]
ok = True
if not os.path.isdir(srcFolder):
print("Error, srcFolder '" + srcFolder + "' does not exist! Abandoning.")
ok = False
if not os.path.isdir(destFolder):
print("Error, destFolder '" + destFolder + "' does not exist! Abandoning.")
ok = False
if ok == True:
try:
stoppable = False
hEvent = win32event.OpenEvent(ntsecuritycon.SYNCHRONIZE, 0, stopEventName)
stoppable = True
except Exception as e:
print("Exception whilst opening event:'" + stopEventName + "':" + str(e) + "\n")
if stoppable:
foo = StoppableFolderReplicator(srcFolder, destFolder, hEvent)
foo.waitForEvent()
win32api.CloseHandle(hEvent)
print("all stopped")
else:
print("Not stoppable, abandoning!")
Calling Excel VBA code
So two lines of the VBA code depend upon Chip Pearson's FormatMessage() function is a freely downloadable module from his website; alternatively you could comment them out.
In the previous two posts, I wrote about Windows Synchronization Event objects and how to use them in VBA so I won't duplicate explanation of that.
You should run the TestStoppableFolderReplicator() procedure. The other procedure is there for debugging against a Visual Studio launched Python process and is useful for development.
The stop event is created before the Python script is shelled because the Python script is expecting to open the event object; we create the event object in an initial state of being unset.
Next, the code uses the Windows Script Host Object Model (wshom.ocx) type library to shell the Python script, we supply three extra arguments after the script name (i) source folder, (ii) destination folder and (iii) the stop event name.
The code then reaches a Stop statement at which point you are invited to pop off to the source folder and create some files to test the replication. For the quickest way to create a file, I right click in a folder's explorer window and in the context menu I select the menu option New->Text Document. When a file is created it is replicated to the destination folder specified.
Once you've got bored of creating files and seeing them replicated then press F5 to move off the Stop statement and send the stop signal, the code then loops waiting for the Python process to quit voluntarily, this should be for only a split second.
When running the code below the shelled process appears as a blank window but once finished the output to that window is displayed in the VBA IDE's Immediate window. Underneath the code listing is a sample of what appeared for me in the Immediate window.
Option Explicit
'* Tools->References
'* Windows Script Host Object Model (wshom.ocx)
Private Declare Function CreateEventWithoutSec Lib "kernel32" Alias "CreateEventA" (lpEventAttributes As Long, _
ByVal bManualReset As Long, ByVal bInitialState As Long, ByVal lpName As String) As Long
Private Declare Function CloseHandle Lib "kernel32" (ByVal hObject As Long) As Long
Private Declare Function SetEvent Lib "kernel32.dll" (ByVal hEvent As Long) As Long
Private Declare Function ResetEvent Lib "kernel32.dll" (ByVal hEvent As Long) As Long
Private hEvent As Long
Sub TestStoppableFolderReplicator()
Dim oWshShell As IWshRuntimeLibrary.WshShell
Set oWshShell = New IWshRuntimeLibrary.WshShell
Const csPythonScriptFilePath As String = "C:\Users\Simon\source\repos\StoppableFolderReplicator\StoppableFolderReplicator\StoppableFolderReplicator.py "
Const csSourceFolder As String = """N:\Folder Replication Sandbox\Source"""
Const csDestinationFolder As String = """N:\Folder Replication Sandbox\Destination"""
Const csStopEvent As String = "stopReplicator"
Dim sCmdLine As String
sCmdLine = "python " & csPythonScriptFilePath & " " & csSourceFolder & " " & csDestinationFolder & " " & csStopEvent
hEvent = CreateEventWithoutSec(0, 1, 0, csStopEvent)
If hEvent = 0 Then
Debug.Print GetSystemErrorMessageText(Err.LastDllError)
GoTo SingleExit
End If
'*
'* launch the python script
'*
Dim oWshExec As IWshRuntimeLibrary.WshExec
Set oWshExec = oWshShell.Exec(sCmdLine)
Debug.Assert oWshExec.Status = WshRunning
'* stop here for a while, pop off to source folder and create some files
'* and watch them replicate to destination folder
'* once you've got bored press F5 to send the stop signal
Stop
'* Ill-advised to use oWshExec.Terminate() because of resources not being tidied up cleanly
'* so instead we set a signal which the shelled process looks for a shutdowns cleanly and voluntarily
Call SetEvent(hEvent)
'* allow the Python script a split second to respond the stop signal
'* in the meantime we'll loop until the Python process quits
While oWshExec.Status = WshRunning
DoEvents '* stay responsive
Wend
'* very important to reset event if you want subsequent runs to work as well!
Call ResetEvent(hEvent)
Debug.Assert oWshExec.Status = WshFinished
'* we can access what would have been printed to the console
Debug.Print oWshExec.StdOut.ReadAll
'* tidy up our handle, very important!
SingleExit:
If hEvent <> 0 Then
Call CloseHandle(hEvent)
hEvent = 0
End If
End Sub
Private Sub DebugStopSignal()
'* I used this procedure for debugging the stop signal
hEvent = CreateEventWithoutSec(0, 1, 0, "stopReplicator")
If hEvent = 0 Then
Debug.Print GetSystemErrorMessageText(Err.LastDllError)
GoTo SingleExit
Else
Stop
Call SetEvent(hEvent)
Stop
Call ResetEvent(hEvent)
End If
SingleExit:
If hEvent <> 0 Then
Call CloseHandle(hEvent)
hEvent = 0
End If
End Sub
Here is some sample output copied from my Immediate window
thread #2:waiting for directory changes
thread #1:waiting for stop event
thread #2: N:\Folder Replication Sandbox\Source\New Text Document.txt Created
thread #2: N:\Folder Replication Sandbox\Source\New Text Document.txt Created
thread #2: N:\Folder Replication Sandbox\Source\New Rich Text Document.rtf Created
thread #1:stop signal received...
thread #2:tidying up
thread #2:stopped
all stopped
Final thoughts
The Python script replicated a folder but it could do anything that you wanted to offload from VBA because of VBA's lack of multi-threading. Signalling a stop is a vital IPC mechanism. This is much better than just calling the Terminate() method. I really like Python as a means of extending VBA's power. Enjoy!
No comments:
Post a Comment