Introduction
SVG draws shapes using a path language with commands such as moveto x1,y1; lineto x2,y2; lineto x3,y3; lineto x4,y4; then closepath. All of this is packed into an SVG path element's d attribute. Python has a library, svg.path, to parse these commands.
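To make the path language concrete, here is a minimal hand-rolled sketch that walks a d string and lists the absolute corner points it visits. It is not how svg.path works internally; the function name expand_simple_path is made up for illustration, and it assumes the path uses only the relative commands m (moveto), h and v (horizontal and vertical lineto) and z (closepath), with each command followed by exactly one set of numbers.

```python
def expand_simple_path(d):
    """Return the absolute (x, y) points visited by a path that uses
    only the relative commands m, h, v and z (an illustrative subset)."""
    tokens = d.replace(',', ' ').split()
    points = []
    x = y = 0.0
    i = 0
    while i < len(tokens):
        cmd = tokens[i]
        if cmd == 'm':
            # relative moveto; at the start of a path the offset is from (0, 0)
            x += float(tokens[i + 1])
            y += float(tokens[i + 2])
            i += 3
        elif cmd == 'h':
            # horizontal lineto: only x changes
            x += float(tokens[i + 1])
            i += 2
        elif cmd == 'v':
            # vertical lineto: only y changes
            y += float(tokens[i + 1])
            i += 2
        elif cmd == 'z':
            # closepath: the shape returns to its first point, nothing new to record
            i += 1
            continue
        else:
            raise ValueError('command not covered by this sketch: ' + cmd)
        points.append((x, y))
    return points


corners = expand_simple_path('m 241.666,133.557 h 2.364 v -25.886 h -2.364 z')
for cx, cy in corners:
    print(round(cx, 3), round(cy, 3))
```

A real parser has to cope with absolute commands, curves, arcs and repeated arguments, which is why a library is the better choice for anything beyond a toy example.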
Background
OK, so in the previous post I gave VBA code to extract some shapes from an SVG file converted from a PDF file (with the aim of extracting the underlying data points). Whilst VBA has an XML library, it does not have a library to parse the d attribute, so we'll switch to Python. Besides, it's Python month on this blog, so I'm meant to be reviewing and introducing useful and interesting Python libraries.
Demonstration of parsing d attribute with svg.path
So install the library (from an admin-rights command console) with...
pip install svg.path
Run python.exe to get into the Python environment and enter the following statements (responses are also shown, and indented):
C:\Users\Simon>python
Python 3.6.3 (v3.6.3:2c5fed8, Oct 3 2017, 18:11:49) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from svg.path import Path,Line,Arc
>>> from svg.path import parse_path
>>> parse_path('m 241.666,133.557 h 2.364 v -25.886 h -2.364 z')
    Path(Line(start=(241.666+133.557j), end=(244.03+133.557j)),
         Line(start=(244.03+133.557j), end=(244.03+107.671j)),
         Line(start=(244.03+107.671j), end=(241.666+107.671j)),
         Line(start=(241.666+107.671j), end=(241.666+133.557j)), closed=True)
>>>
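So we can see the path being parsed into a sequence of Line objects, each with its own start and end co-ordinates. Each co-ordinate is held as a Python complex number, with x as the real part and y as the imaginary part.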
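Since the co-ordinates come back as complex numbers, a little plain Python is enough to pull them apart. The sketch below does not call svg.path at all; it simply copies the start and end values of the second Line from the output above into ordinary complex literals, so the variable names are illustrative only.

```python
# Values copied from the second Line in the parsed output above
start = 244.03 + 133.557j
end = 244.03 + 107.671j

# .real is the x co-ordinate, .imag is the y co-ordinate
x, y = start.real, start.imag

# Signed vertical change along the segment (negative here, as the line runs up the page)
delta_y = end.imag - start.imag

# abs() of the difference of two complex numbers is the straight-line distance
length = abs(end - start)

print(x, y)
print(round(delta_y, 3), round(length, 3))
```

For a vertical segment like this one, the length and the absolute value of the vertical change are the same number.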
Python program to process paths
So now we can write some code to extract the height of the rectangle (which represents the underlying data point).
from lxml import etree
from svg.path import Path,Line
from svg.path import parse_path
sFileName = 'C:/Users/Simon/Downloads/pdf_skunkworks/inflation-report-may-2018-page6.svg'
tree=etree.parse(sFileName)
xpath = r"//svg:path[@style='fill:#19518b;fill-opacity:1;fill-rule:nonzero;stroke:none']"
#print (xpath)
bluePaths = tree.xpath(xpath,namespaces={ 'svg': "http://www.w3.org/2000/svg" })
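for bluePath in bluePaths:
    # parse the d attribute into a Path of Line segments
    parsed = parse_path(bluePath.attrib['d'])
    # the second segment is the vertical side of the rectangle
    secondLine = parsed[1]
    # outputs the height (signed: negative when the side is drawn up the page)
    print(secondLine.end.imag - secondLine.start.imag)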
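If you want to try the selection step without the original SVG file, the following sketch may help. It swaps lxml for the standard library's xml.etree.ElementTree (which supports this simple attribute predicate) and uses a made-up inline SVG: the first path reuses the d string from the demonstration above, while the red path is purely hypothetical and is there only to show that it gets filtered out.

```python
import xml.etree.ElementTree as ET

SVG_NS = {'svg': 'http://www.w3.org/2000/svg'}
BLUE_STYLE = 'fill:#19518b;fill-opacity:1;fill-rule:nonzero;stroke:none'

# Made-up sample document; only the blue path's d value comes from the post
sample_svg = (
    '<svg xmlns="http://www.w3.org/2000/svg"><g>'
    '<path style="' + BLUE_STYLE + '" '
    'd="m 241.666,133.557 h 2.364 v -25.886 h -2.364 z"/>'
    '<path style="fill:#ff0000;stroke:none" d="m 10,10 h 5 v -5 h -5 z"/>'
    '</g></svg>'
)

root = ET.fromstring(sample_svg)

# Namespace-aware search for path elements whose style matches exactly
query = ".//svg:path[@style='" + BLUE_STYLE + "']"
blue_d_values = [p.get('d') for p in root.findall(query, SVG_NS)]

print(blue_d_values)
```

Each string in blue_d_values can then be handed to parse_path exactly as in the program above.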