So it's Python month on this Excel Development Platform blog where I highlight some Python technologies of interest to Excel (VBA) Developers.
A few days prior I blogged a post about using Pandas to process some very large Eurostat Population Density data set, once I got the Pandas pivot table I saved it to a csv file which is unsatisfactory; the truth is I was having difficulty returning the Pandas pivot table to VBA as a 2d OLE Variant that can be pasted onto a worksheet. In this post I fix that.
Python code to return an OLE Variant from a Pandas Pivot
First the Python code. So the upper half of this code is the same as in the previous pandas article. So this task involves extracting portions of the pivot table, converting them to numpy arrays and then glueing them into one large 2d array.
import pandas as pd
import numpy as np
class PopulationDensity(object):
#"{C50910CC-F88F-4EA5-86D4-1E5D6AF1F4AE}"
_reg_clsid_ = "{C50910CC-F88F-4EA5-86D4-1E5D6AF1F4AE}"
_reg_progid_= 'PandasInVBA.PopulationDensity'
_public_methods_ = ['getPivotTable']
def getPivotTable(self):
## csv file in zip http://ec.europa.eu/eurostat/cache/GISCO/geodatafiles/GEOSTAT-grid-POP-1K-2011-V2-0-1.zip
url="C:/Users/Simon/Downloads/GEOSTAT-grid-POP-1K-2011-V2-0-1/Version 2_0_1/GEOSTAT_grid_POP_1K_2011_V2_0_1.csv"
whole=pd.read_csv(url, low_memory=False)
## only want two columns
populationDensity=whole[['TOT_P','CNTR_CODE']]
## trying to replicate graph here http://www.centreforcities.org/wp-content/uploads/2018/04/18-04-16-Square-kilometre-units-of-land-by-population.png
## which aggregates the records by brackets
# https://stackoverflow.com/questions/25010215/pandas-groupby-how-to-compute-counts-in-ranges#answer-25010952
# also relevant https://stackoverflow.com/questions/21441259/pandas-groupby-range-of-values#answer-21441621
ranges = [0,10000,15000,20000,25000,30000,35000,40000,45000,1000000]
## Yes groupby but we want to work with a DataFrame afterwards
# https://stackoverflow.com/questions/10373660/converting-a-pandas-groupby-object-to-dataframe
groupedByBracketandCountry = populationDensity.groupby(['CNTR_CODE',pd.cut(populationDensity['TOT_P'],ranges).astype(str)]).size().to_frame(name = 'count').reset_index()
# https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.pivot_table.html#pandas.DataFrame.pivot_table
pivottable2=pd.pivot_table(groupedByBracketandCountry, fill_value=0, values='count', index='TOT_P',columns=['CNTR_CODE'] , aggfunc=np.sum)
# new code here that extracts portions of pivot table into arrays and then glues then together into one returnable array
pivottable2Values = pivottable2.values # this gives the data elements of the pivot table
pivottable2Columns = pivottable2.columns.values # this gives the column headers of the pivot table
# the next line glues one 2d array on top of another (i.e. vertically)
pivottable2ColumnsAndValues = np.vstack((pivottable2Columns,pivottable2Values))
# this will be our origin cell, in the top left
pythontopLeft=np.zeros((1,1))
# this gets our rows headers in a 2d array, np.reshape upgrades the 1d array to 2d
indexValues = np.reshape(pivottable2.index.values, (-1, 1))
# this next line glues one 2d array on top of another (i.e. vertically)
pivottable2Index = np.vstack((pythontopLeft,indexValues))
# this next line glues one 2d array onto the side of another (i.e. horizontally)
vRet = np.hstack((pivottable2Index,pivottable2ColumnsAndValues))
# this final line helps the conversion to an OLE Variant pastable onto an Excel worksheet
return vRet.tolist()
if __name__=='__main__':
print ("Registering COM server...")
import win32com.server.register
win32com.server.register.UseCommandLine(PopulationDensity)
np.vstack, np.hstack and np.reshape
There are some interesting numpy array functions introduced here np.vstack, np.hstack and np.reshape; the first two glue two array either vertically or horizontally; np.reshape can be used to redimension an array.
Without tolist() you get this error
The final requirement to pass back a numpy array to Excel VBA is to use tolist() because numpy array are not convertible to an OLE Variant but lists are. If you forget to add to tolist() then you get the following error.
Unexpected Python Error: ValueError: ndarray is not C-contiguous
VBA Client Code
So now the VBA is quite simple. The Python returns an OLE Variant array of two dimensions, we calculate the dimensions and then paste onto the sheet.
Sub Test()
Dim obj As Object
Set obj = VBA.CreateObject("PandasInVBA.PopulationDensity")
Dim v
v = obj.getPivotTable
Dim lRows As Long
lRows = UBound(v, 1) - LBound(v, 1)
Dim lColumns As Long
lColumns = UBound(v, 2) - LBound(v, 2)
Dim rng As Excel.Range
Set rng = Sheet3.Range(Sheet3.Cells(1, 1), Sheet3.Cells(lRows, lColumns))
rng.Value = v
End Sub
No comments:
Post a Comment