Saturday, 3 November 2018

Python - VBA - Grouping Data From a Skewed Normal Distribution

Use Python to get a population from a skewed normal distribution, group them by bins (supplied as an argument) and return as a matrix back to VBA.

So I hadn't used Python since Python month on this blog this summer. A StackOverflow VBA bounty question came up which though did not ask for Python/Pandas looked eminently doable in Python/Pandas. So I wrote some code and got help on the Python StackOverflowers. Here is the result, the Python class looks like this

import numpy as np
import pandas as pd
from scipy.stats import skewnorm


class PythonSkewedNormal(object):
    _reg_clsid_ = "{1583241D-27EA-4A01-ACFB-4905810F6B98}"
    _reg_progid_ = 'SciPyInVBA.PythonSkewedNormal'
    _public_methods_ = ['GeneratePopulation', 'BinnedSkewedNormal']

    def GeneratePopulation(self, a, sz):
        # https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.seed.html
        np.random.seed(10)
        # https://docs.scipy.org/doc/scipy-0.19.1/reference/generated/scipy.stats.skewnorm.html
        return skewnorm.rvs(a, size=sz).tolist()

    def BinnedSkewedNormal(self, a, sz, bins):
        # https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.seed.html
        np.random.seed(10)
        # https://docs.scipy.org/doc/scipy-0.19.1/reference/generated/scipy.stats.skewnorm.html
        pop = skewnorm.rvs(a, size=sz)
        bins2 = np.array(bins)
        bins3 = pd.cut(pop, bins2)

        table = pd.value_counts(bins3, sort=False)

        table.index = table.index.astype(str)

        return table.reset_index().values.tolist()

if __name__ == '__main__':
    print("Registering COM server...")
    import win32com.server.register
    win32com.server.register.UseCommandLine(PythonSkewedNormal)

and the calling VBA code looks like this...

Option Explicit

Sub TestPythonSkewedNormal()

    Dim skewedNormal As Object
    Set skewedNormal = CreateObject("SciPyInVBA.PythonSkewedNormal")

    Dim lSize As Long
    lSize = 100

    Dim shtData As Excel.Worksheet
    Set shtData = ThisWorkbook.Worksheets.Item("Sheet3") '<--- change sheet to your circumstances
    shtData.Cells.Clear

    Dim vBins
    vBins = Array(-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5)

    'Stop
    Dim vBinnedData
    vBinnedData = skewedNormal.BinnedSkewedNormal(-5, lSize, vBins)

    Dim rngData As Excel.Range
    Set rngData = shtData.Cells(2, 1).Resize(UBound(vBins) - LBound(vBins), 2)

    rngData.Value2 = vBinnedData

    'Stop

End Sub

And the output looks like this

(-5, -4]         0
(-4, -3]         0
(-3, -2]         4
(-2, -1]        32
(-1, 0]         57
(0, 1]           7
(1, 2]           0
(2, 3]           0
(3, 4]           0
(4, 5]           0

There is more to the question but this should illustrate that the Python libraries have an advantage over Excel worksheet.

3 comments:

  1. thanks! it works fine for Excel 32bit. But why it can not work for Excel x64?
    It generates '-2147024770 Automation error' at line 'Set skewedNormal = CreateObject("SciPyInVBA.PythonSkewedNormal")'
    Possibly, some tricks needed to run under Excel VBA x64?

    ReplyDelete
  2. How do you generate keys to _reg_clsid_?

    ReplyDelete
  3. I have posted How to create guid in Visual Studio https://exceldevelopmentplatform.blogspot.com/2019/11/how-to-create-guid-in-visual-studio.html

    ReplyDelete