Monday 25 June 2018

Python - SVG - Extract and Parse Path Data from d attribute

Introduction

SVG draws shapes using a path language with commands such as moveto x1,y1; lineto x2,y2; lineto x3,y3; lineto x4,y4; then closepath. All of this is packed into an SVG path element's d attribute. Python has a library, svg.path, to parse these commands.

Background

OK, so in the previous post I gave VBA code to extract some shapes from an SVG file converted from a PDF file (in the name of extracting the underlying data points), but whilst VBA has an XML library it does not have a library to parse the d attribute. So we'll switch to Python. Besides, it's Python month on this blog and so I'm meant to be reviewing and introducing useful and interesting Python libraries.

Demonstration of parsing d attribute with svg.path

So install the library (from an admin rights command console) with...

pip install svg.path

Run python.exe to get into the Python environment and enter the following statements (responses are also shown, with continuation lines indented).

C:\Users\Simon>python
Python 3.6.3 (v3.6.3:2c5fed8, Oct  3 2017, 18:11:49) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from svg.path import Path,Line,Arc
>>> from svg.path import parse_path
>>> parse_path('m 241.666,133.557 h 2.364 v -25.886 h -2.364 z')
Path(Line(start=(241.666+133.557j), end=(244.03+133.557j)), 
     Line(start=(244.03+133.557j), end=(244.03+107.671j)), 
     Line(start=(244.03+107.671j), end=(241.666+107.671j)), 
     Line(start=(241.666+107.671j), end=(241.666+133.557j)), closed=True)
>>>

So we can see the path being parsed into a sequence of Line objects, each with its own start and end co-ordinate pairs. Each co-ordinate pair is held as a complex number: x is the real part and y is the imaginary part.
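To see where those absolute co-ordinates come from, here is a minimal sketch that resolves the relative m, h, v and z commands by hand. It is not how svg.path is implemented; resolve_simple_path is a hypothetical helper written for this post, and it assumes the path uses only those four lowercase commands, each with a single set of arguments.

```python
def resolve_simple_path(d):
    """Turn a path made of relative m/h/v/z commands into (start, end) pairs.

    Points are complex numbers (x + yj), mirroring the parsed output above.
    Hypothetical helper for illustration only; not part of svg.path.
    """
    tokens = d.replace(',', ' ').split()
    segments = []
    current = None   # the pen position
    start = None     # where the current subpath began (target of 'z')
    i = 0
    while i < len(tokens):
        cmd = tokens[i]
        if cmd == 'm':
            # A leading relative moveto is measured from the origin
            base = 0j if current is None else current
            current = start = base + complex(float(tokens[i + 1]), float(tokens[i + 2]))
            i += 3
        elif cmd == 'h':
            # Horizontal line: only x (the real part) changes
            nxt = current + float(tokens[i + 1])
            segments.append((current, nxt))
            current = nxt
            i += 2
        elif cmd == 'v':
            # Vertical line: only y (the imaginary part) changes
            nxt = current + complex(0, float(tokens[i + 1]))
            segments.append((current, nxt))
            current = nxt
            i += 2
        elif cmd == 'z':
            # closepath: straight line back to the start of the subpath
            segments.append((current, start))
            current = start
            i += 1
        else:
            raise ValueError('unsupported command: ' + cmd)
    return segments


segments = resolve_simple_path('m 241.666,133.557 h 2.364 v -25.886 h -2.364 z')
for seg_start, seg_end in segments:
    print(seg_start, '->', seg_end)
```

The four pairs correspond to the four Line objects in the parsed output, give or take floating point rounding.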

Python program to process paths

So now we can write some code to extract the height of the rectangle (which represents the underlying data point).

from lxml import etree
from svg.path import parse_path

sFileName = 'C:/Users/Simon/Downloads/pdf_skunkworks/inflation-report-may-2018-page6.svg'

tree=etree.parse(sFileName)

xpath = r"//svg:path[@style='fill:#19518b;fill-opacity:1;fill-rule:nonzero;stroke:none']"

#print (xpath)
bluePaths = tree.xpath(xpath,namespaces={   'svg': "http://www.w3.org/2000/svg"  })

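for bluePath in bluePaths:
    parsed = parse_path(bluePath.attrib['d'])
    # In the parsed output shown earlier, element 1 is the vertical edge of the rectangle
    secondLine = parsed[1]
    # y is the imaginary part; SVG y grows downwards, so an edge drawn upwards gives a negative value
    print(secondLine.end.imag - secondLine.start.imag)  # outputs the (signed) height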
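If lxml or svg.path is not to hand, the same idea can be sketched with the standard library alone. This is a swapped-in technique, not the program above: it uses xml.etree.ElementTree for the style lookup and a regular expression in place of parse_path. It assumes every matching path is a rectangle drawn as m/h/v/h/z with one relative v command, and SAMPLE_SVG is a made-up two-path document standing in for the real page.

```python
import re
import xml.etree.ElementTree as ET

SVG_NS = {'svg': 'http://www.w3.org/2000/svg'}
BLUE_STYLE = 'fill:#19518b;fill-opacity:1;fill-rule:nonzero;stroke:none'

# Made-up sample: one blue bar and one path with a different style
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg">
  <path style="fill:#19518b;fill-opacity:1;fill-rule:nonzero;stroke:none"
        d="m 241.666,133.557 h 2.364 v -25.886 h -2.364 z"/>
  <path style="fill:#ffffff;stroke:none" d="m 0,0 h 1 v 1 h -1 z"/>
</svg>"""


def signed_heights(svg_text):
    """Return the argument of the relative 'v' command for each blue path."""
    root = ET.fromstring(svg_text)
    # ElementTree supports this limited XPath predicate on attribute values
    xpath = ".//svg:path[@style='%s']" % BLUE_STYLE
    heights = []
    for path in root.findall(xpath, SVG_NS):
        # Assumption: a single relative vertical-line command per path
        match = re.search(r'\bv\s*(-?\d+(?:\.\d+)?)', path.get('d', ''))
        if match:
            heights.append(float(match.group(1)))
    return heights


heights = signed_heights(SAMPLE_SVG)
print(heights)
```

For anything beyond simple rectangles (curves, absolute commands, repeated arguments) the regular expression will not cope, which is exactly why a proper parser such as svg.path is the better tool.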
