Tuesday, 19 June 2018

SVG - VBA - Extracting Path Data

Over the last few posts I have been working towards a solution that lets code scrape data from a Bank of England PDF. So far I have broken the PDF up into separate SVG files, one per page. SVG files are easier to work with because they are XML-based, so they can be queried with XPath.

XPath in VBA

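My first language is VBA, so I can quickly give some test code to demonstrate the XPath logic before I delve into a Python solution. The code registers a prefix for the SVG namespace, loads one page, and then selects the path elements for each colour by matching the whole style attribute.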

Sub TestXml()
    '*Tools->References->Microsoft XML, v6.0
    Dim xml As MSXML2.DOMDocument60
    Set xml = New MSXML2.DOMDocument60
    
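    '* load synchronously so the document is fully parsed before it is queried
    xml.async = False
    '* map a prefix to the SVG default namespace so XPath can address the elements
    xml.setProperty "SelectionNamespaces", "xmlns:svg='http://www.w3.org/2000/svg'"
    xml.Load "C:\Users\Simon\Downloads\pdf_skunkworks\inflation-report-may-2018-page6.svg"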
    
    Debug.Assert xml.parseError.ErrorCode = 0
    
    Dim xmlBluePaths As MSXML2.IXMLDOMNodeList
    Set xmlBluePaths = xml.SelectNodes("//svg:path[@style='fill:#19518b;fill-opacity:1;fill-rule:nonzero;stroke:none']")
    
    Debug.Assert xmlBluePaths.Length = 28
    
    Dim xmlRedPaths As MSXML2.IXMLDOMNodeList
    Set xmlRedPaths = xml.SelectNodes("//svg:path[@style='fill:#a80c3d;fill-opacity:1;fill-rule:nonzero;stroke:none']")
    
    Debug.Assert xmlRedPaths.Length = 28
    
    Dim xmlGreyPaths As MSXML2.IXMLDOMNodeList
    Set xmlGreyPaths = xml.SelectNodes("//svg:path[@style='fill:#a98b6e;fill-opacity:1;fill-rule:nonzero;stroke:none']")
    
    Debug.Assert xmlGreyPaths.Length = 28

    Dim xmlElement As MSXML2.IXMLDOMElement
    Set xmlElement = xmlBluePaths.Item(0)
    
    Debug.Print xmlElement.xml
    Debug.Print xmlElement.getAttribute("d")

End Sub
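
For comparison, the same namespace-qualified selection can be sketched in Python using only the standard library's xml.etree.ElementTree. This is a minimal sketch, not the solution from the following posts. The inline SVG string is a made-up stand-in for the real page file (it holds one blue path and one invented red path, not 28 of each), and the names SVG_NS, BLUE_STYLE, SAMPLE_SVG and find_paths_by_style are my own.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace map, playing the role of SelectionNamespaces in the VBA code.
SVG_NS = {"svg": "http://www.w3.org/2000/svg"}

BLUE_STYLE = "fill:#19518b;fill-opacity:1;fill-rule:nonzero;stroke:none"

# Hypothetical stand-in for the real page file: one blue path taken from the post
# and one invented red path, just to show that the filter discriminates.
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg">
  <g>
    <path id="path670"
          style="fill:#19518b;fill-opacity:1;fill-rule:nonzero;stroke:none"
          d="m 241.666,133.557 h 2.364 v -25.886 h -2.364 z"/>
    <path id="made-up-red"
          style="fill:#a80c3d;fill-opacity:1;fill-rule:nonzero;stroke:none"
          d="m 0,0 h 1 v 1 h -1 z"/>
  </g>
</svg>"""


def find_paths_by_style(root, style):
    """Return every svg:path below root whose style attribute equals style exactly."""
    return root.findall(f".//svg:path[@style='{style}']", SVG_NS)


root = ET.fromstring(SAMPLE_SVG)
blue_paths = find_paths_by_style(root, BLUE_STYLE)

print(len(blue_paths))
print(blue_paths[0].get("d"))
```

As in the VBA version, the match is on the whole style string, so any change in the order or spacing of the style properties would make the selection miss.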

The next problem, however, is how to parse the path data, which is found in the d attribute of a path element. Here is an example of an element...

<path xmlns="http://www.w3.org/2000/svg" id="path670" style="fill:#19518b;fill-opacity:1;fill-rule:nonzero;stroke:none" d="m 241.666,133.557 h 2.364 v -25.886 h -2.364 z"/>

Within that element one can see the path data packed into the d attribute...

m 241.666,133.557 h 2.364 v -25.886 h -2.364 z

So we need code to parse this path data. I am not going to give that code in VBA; instead I have a Python library to show you in the next post.
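
In the meantime, to make the example path data concrete, here is a minimal hand-rolled sketch in Python of what those commands mean. It is not the library covered in the next post, and it assumes the path uses only the relative commands seen in the example: m (move), h (horizontal line), v (vertical line) and z (close). The function name rect_path_to_points is my own, and anything outside that small subset is rejected rather than guessed at.

```python
def rect_path_to_points(d):
    """Convert path data made of relative m/h/v/z commands into absolute points.

    Assumption: only the subset seen in the example is supported, with exactly
    one coordinate pair after m and one number after each h or v.
    """
    tokens = d.replace(",", " ").split()
    x = y = 0.0          # current pen position; the first m starts from the origin
    points = []
    i = 0
    while i < len(tokens):
        cmd = tokens[i]
        if cmd == "m":   # move by (dx, dy)
            x += float(tokens[i + 1])
            y += float(tokens[i + 2])
            i += 3
            points.append((round(x, 3), round(y, 3)))
        elif cmd == "h": # horizontal line by dx
            x += float(tokens[i + 1])
            i += 2
            points.append((round(x, 3), round(y, 3)))
        elif cmd == "v": # vertical line by dy
            y += float(tokens[i + 1])
            i += 2
            points.append((round(x, 3), round(y, 3)))
        elif cmd == "z": # close the shape; adds no new point
            i += 1
        else:
            raise ValueError(f"unsupported path command: {cmd!r}")
    return points


corners = rect_path_to_points("m 241.666,133.557 h 2.364 v -25.886 h -2.364 z")
xs = [p[0] for p in corners]
ys = [p[1] for p in corners]
width = round(max(xs) - min(xs), 3)
height = round(max(ys) - min(ys), 3)
print(corners)
print(width, height)
```

Read this way, the example path is a rectangle: it starts at (241.666, 133.557), runs 2.364 units across, 25.886 units along the vertical axis, and 2.364 units back before closing. That is one bar of the chart, and the bar's length is what we ultimately want to scrape.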
