Friday, 29 June 2018

Python - Excel - Scripting Excel Objects and Files

So it's Python month on this Excel Development Platform blog where I showcase interesting Python technologies that might be of interest to Excel (VBA) Developers. Python does COM interop so it can script against the Excel Object Model. So what are the options?

Python to replace VBA?

Your first option could be to wait because VBA is old. VBA is as dated as Visual Basic 6.0 (they are essentially the same language). At some point VBA will be replaced. We do not know if VBA will be replaced with C# or VB.NET. YCombinator reports Excel team considering Python as scripting language, asking for feedback. Here is the feedback forum.

If Microsoft does replace VBA with Python I should imagine Microsoft would do a pretty good job of it.

Python COM Interop and Excel, win32com

So VBA scripts against the Excel object model. In fact the Excel object model is exported and exposed to any COM enabled language. Python has COM interop packages and is thus COM enabled. The key package is win32com. Below is some sample code which should be easy to follow for VBA programmers familiar with the Excel object model. The code is taken from Tim Golden's excellent website. Tim Golden has all sorts of excellent Python for windows tips here. I'd like to particularly highlight folder watching which I could use as part of a message queue solution. They leverage the pywin32 library written by Mark Hammond (twitter github ), another superb technologist!

import os, sys  #Written by Tim Golden
import csv
import tempfile
import win32com.client

#
# Cheating slightly because I know a .csv
# file will register as an Excel object
#

filename = tempfile.mktemp (".csv")
f = open (filename, "wb")
writer = csv.writer (f)
writer.writerows (range (i, i+5) for i in range (10))
f.close ()

#
# NB filename must (in effect) be absolute
#
os.startfile (filename)

wb1 = win32com.client.GetObject (filename)
wb1_c11 = wb1.ActiveSheet.Cells (1, 1)
print wb1_c11.Value
wb1_c11.Value += 1

xl = win32com.client.GetObject (Class="Excel.Application")
# could also use:
# xl = win32com.client.GetActiveObject ("Excel.Application")
xl_c11 = xl.ActiveSheet.Cells (1, 1)
print xl_c11.Value
xl_c11.Value += 1

print wb1_c11.Value
print xl_c11.Value

wb1.Close ()

os.remove (filename)

Python Packages Without Excel.exe

The following packages do not need a running copy of Excel. The authors have very kindly wrestled with the Excel file format specifications and put together code which can either create, read or manipulate Excel files without needing to run a copy of Excel. Economical for Excel licences and also ensures code can run on Linux (where Excel.exe cannot run) for further cost savings.

openpyxl

The documentation states

Openpyxl is a Python library for reading and writing Excel 2010 xlsx/xlsm/xltx/xltm files. It was born from lack of existing library to read/write natively from Python the Office Open XML format.

So this refers the face that Office 2010 introduced new file type options, xlsx and xlsm which are Xml based (and then zipped). To be fair the Xml formats are hard to follow and they have done us a great favour in writing this library. Here is some sample code to get a flavour.

from openpyxl import Workbook
wb = Workbook()

# grab the active worksheet
ws = wb.active

# Data can be assigned directly to cells
ws['A1'] = 42

# Rows can also be appended
ws.append([1, 2, 3])

# Python types will automatically be converted
import datetime
ws['A2'] = datetime.datetime.now()

# Save the file
wb.save("sample.xlsx")

xlsxwriter

Another package written for xlsx (Xml) file types this time written by Jon McNamara (github, twitter). John McNamara is also on cpan.org where has prolifically written many packages in different languages including Lua, Python, Ruby, C and Perl. Here is some sample code.

##############################################################################
#
# A simple example of some of the features of the XlsxWriter Python module.
#
# Copyright 2013-2018, John McNamara, jmcnamara@cpan.org
#
import xlsxwriter


# Create an new Excel file and add a worksheet.
workbook = xlsxwriter.Workbook('demo.xlsx')
worksheet = workbook.add_worksheet()

# Widen the first column to make the text clearer.
worksheet.set_column('A:A', 20)

# Add a bold format to use to highlight cells.
bold = workbook.add_format({'bold': True})

# Write some simple text.
worksheet.write('A1', 'Hello')

# Text with formatting.
worksheet.write('A2', 'World', bold)

# Write some numbers, with row/column notation.
worksheet.write(2, 0, 123)
worksheet.write(3, 0, 123.456)

# Insert an image.
worksheet.insert_image('B5', 'logo.png')

workbook.close()

xlrd

The documentation says

xlrd is a library for reading data and formatting information from Excel files, whether they are .xls or .xlsx files.

So this package can read older formats as well as the newer .xlsx .xlsm (Xml) formats.

xlutils

The documentation describes this as

This package provides a collection of utilities for working with Excel files.

Without delving through all of the documentation I think this library is more file stream based, a bit like SAX for Xml. Here is some sample code defining a filter that allows even rows through but odd rows not ...

>>> from xlutils.filter import BaseFilter
>>> class EvenFilter(BaseFilter):
...
...     def row(self,rdrowx,wtrowx):
...         if not rdrowx%2:
...             self.next.row(rdrowx,wtrowx)
...
...     def cell(self,rdrowx,rdcolx,wtrowx,wtcolx):
...         if not rdrowx%2:
...             self.next.cell(rdrowx,rdcolx,wtrowx,wtcolx)

And here is run through using the filter give above...

>>> from xlutils.filter import process
>>> process(
...     MyReader('test.xls'),
...     MyFilter('before'),
...     EvenFilter(),
...     MyFilter('after'),
...     MyWriter()
...     )
before start
after start
before workbook  test.xls
after workbook  test.xls
before sheet  Sheet1
after sheet  Sheet1
before row 0 0
after row 0 0
before cell 0 0 0 0
after cell 0 0 0 0
before cell 0 1 0 1
after cell 0 1 0 1
before row 1 1
before cell 1 0 1 0
before cell 1 1 1 1
before sheet  Sheet2
after sheet  Sheet2
before row 0 0
after row 0 0
before cell 0 0 0 0
after cell 0 0 0 0
before cell 0 1 0 1
after cell 0 1 0 1
before row 1 1
before cell 1 0 1 0
before cell 1 1 1 1
before finish
after finish

No comments:

Post a Comment