Friday, 29 June 2018

Python - Pandas - VBA - Return a Pandas Pivot Table to VBA (using COM)

So it's Python month on this Excel Development Platform blog where I highlight some Python technologies of interest to Excel (VBA) Developers.

A few days prior I blogged a post about using Pandas to process some very large Eurostat Population Density data set, once I got the Pandas pivot table I saved it to a csv file which is unsatisfactory; the truth is I was having difficulty returning the Pandas pivot table to VBA as a 2d OLE Variant that can be pasted onto a worksheet. In this post I fix that.

Python code to return an OLE Variant from a Pandas Pivot

First the Python code. So the upper half of this code is the same as in the previous pandas article. So this task involves extracting portions of the pivot table, converting them to numpy arrays and then glueing them into one large 2d array.

import pandas as pd
import numpy as np

class PopulationDensity(object):
    #"{C50910CC-F88F-4EA5-86D4-1E5D6AF1F4AE}"
    _reg_clsid_ = "{C50910CC-F88F-4EA5-86D4-1E5D6AF1F4AE}"
    _reg_progid_= 'PandasInVBA.PopulationDensity'
    _public_methods_ = ['getPivotTable']

    def getPivotTable(self):
        ## csv file in zip http://ec.europa.eu/eurostat/cache/GISCO/geodatafiles/GEOSTAT-grid-POP-1K-2011-V2-0-1.zip

        url="C:/Users/Simon/Downloads/GEOSTAT-grid-POP-1K-2011-V2-0-1/Version 2_0_1/GEOSTAT_grid_POP_1K_2011_V2_0_1.csv"
        whole=pd.read_csv(url, low_memory=False)

        ## only want two columns
        populationDensity=whole[['TOT_P','CNTR_CODE']]

        ## trying to replicate graph here http://www.centreforcities.org/wp-content/uploads/2018/04/18-04-16-Square-kilometre-units-of-land-by-population.png
        ## which aggregates the records by brackets

        # https://stackoverflow.com/questions/25010215/pandas-groupby-how-to-compute-counts-in-ranges#answer-25010952
        # also relevant https://stackoverflow.com/questions/21441259/pandas-groupby-range-of-values#answer-21441621
        ranges = [0,10000,15000,20000,25000,30000,35000,40000,45000,1000000]

        ## Yes groupby but we want to work with a DataFrame afterwards
        # https://stackoverflow.com/questions/10373660/converting-a-pandas-groupby-object-to-dataframe
        groupedByBracketandCountry = populationDensity.groupby(['CNTR_CODE',pd.cut(populationDensity['TOT_P'],ranges).astype(str)]).size().to_frame(name = 'count').reset_index()

        # https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.pivot_table.html#pandas.DataFrame.pivot_table
        pivottable2=pd.pivot_table(groupedByBracketandCountry, fill_value=0, values='count', index='TOT_P',columns=['CNTR_CODE'] , aggfunc=np.sum)

        # new code here that extracts portions of pivot table into arrays and then glues then together into one returnable array

        pivottable2Values = pivottable2.values             # this gives the data elements of the pivot table
        pivottable2Columns = pivottable2.columns.values    # this gives the column headers of the pivot table

        # the next line glues one 2d array on top of another (i.e. vertically)
        pivottable2ColumnsAndValues = np.vstack((pivottable2Columns,pivottable2Values))

        # this will be our origin cell, in the top left
        pythontopLeft=np.zeros((1,1))

        # this gets our rows headers in a 2d array, np.reshape upgrades the 1d array to 2d
        indexValues =  np.reshape(pivottable2.index.values, (-1, 1))

        # this next line glues one 2d array on top of another (i.e. vertically)
        pivottable2Index = np.vstack((pythontopLeft,indexValues))

        # this next line glues one 2d array onto the side of another (i.e. horizontally)
        vRet = np.hstack((pivottable2Index,pivottable2ColumnsAndValues))

        # this final line helps the conversion to an OLE Variant pastable onto an Excel worksheet
        return vRet.tolist()

if __name__=='__main__':
    print ("Registering COM server...")
    import win32com.server.register
    win32com.server.register.UseCommandLine(PopulationDensity)

np.vstack, np.hstack and np.reshape

There are some interesting numpy array functions introduced here np.vstack, np.hstack and np.reshape; the first two glue two array either vertically or horizontally; np.reshape can be used to redimension an array.

Without tolist() you get this error

The final requirement to pass back a numpy array to Excel VBA is to use tolist() because numpy array are not convertible to an OLE Variant but lists are. If you forget to add to tolist() then you get the following error.

Unexpected Python Error: ValueError: ndarray is not C-contiguous

VBA Client Code

So now the VBA is quite simple. The Python returns an OLE Variant array of two dimensions, we calculate the dimensions and then paste onto the sheet.

Sub Test()

    Dim obj As Object
    Set obj = VBA.CreateObject("PandasInVBA.PopulationDensity")
    
    Dim v
    v = obj.getPivotTable
    
    Dim lRows As Long
    lRows = UBound(v, 1) - LBound(v, 1)
    
    Dim lColumns As Long
    lColumns = UBound(v, 2) - LBound(v, 2)
    
    Dim rng As Excel.Range
    Set rng = Sheet3.Range(Sheet3.Cells(1, 1), Sheet3.Cells(lRows, lColumns))
    rng.Value = v
End Sub

No comments:

Post a Comment