Use case background
I'm worried about Corona Virus and its impact on stock markets, I very much need to query the London Stock Exchange database of securities to look for some safe government bonds to buy into. I have a list of government bonds from the branch of government that manages the issuance of such bonds but their unique identifier is an ISIN which is unique across the globe. Unfortunately, stockbrokers do not use ISIN numbers and I need to find alternative IDs and ticker codes. The London Stock Exchange (I am UK resident/citizen) has this information so I need to cross-reference the issuance authority's ISINs code with the LSE's database to get codes to present to my stock broker. For this I choose to write a web scraping program for I do not know of an official REST service.Say No to VBA Webscraping
There was a time on this blog when I shared code that allowed a VBA developer to create an instance of instance of Internet Explorer, navigate to a web page and then traverse the HTML DOM to extract the necessary information. Those days have past now for a number of reasons. Firstly, Internet Explorer is legacy as Microsoft have the Edge browser so it would be foolish to write code against Internet Explorer. Secondly, since I branched out into alternatives to VBA I have discovered the quality of other libaries such as the .NET ecosystem and the Python ecosystem which both have many-fold better libraries.I'd urge no new development in VBA for webscraping as there are three or four better options depending on your architecture and programming language preferences:
- Chrome Extension, embedded JavaScript is best placed to traverse an HTML Dom and even sink events (needs a web server to relay info to though)
- .NET Html Agility Pack, so C# developers would recommends this.
- Python's Beautiful Soup
- Webdriver interface, formerly known as Selenium
I do not wish to replicate the Beautiful Soup documentation so I will be brief. Initially, I felt frustrated because I like to use a CSS selector path such as the following but this didn't not work for me which is a shame as it is a nice one shot declarative way to access an element.
#contentIndex > div.search_results_list > table > tbody > tr > td:nth-child(1)
But, I then read the documentation and saw how easily it was to script against a DOM (and actually a DOM fragment as well!) that the resulting code was perfectly easy to write.All the web-scraping is packed into a class called LSEWebServices. But I also promised a COM wrapper class and that is LSEWebServicesCOMWrapper which simply delegates to an instance of LSEWebServices; I guess I could have inherited perhaps.
Important note: this code has been updated because London Stock Exchange changes their website
import urllib.request as urllib2
from bs4 import BeautifulSoup
import pythoncom
import logging
class LSEWebServices:
def ExtractLSESearchResults(self,searchTerm: str):
try:
req = urllib2.urlopen("https://www.londonstockexchange.com/search?searchtype=all&q=" + searchTerm)
html = req.read().decode("utf-8")
try:
soup = BeautifulSoup(html, 'html.parser')
searchResultList = soup.find_all('div','item-separator')
if (len(searchResultList)==1):
subText = str(searchResultList[0])
soup2 = BeautifulSoup(subText, 'html.parser')
instrumentTidm = soup2.find('span','instrument-tidm').text
instrumentDesc = soup2.find_all('span','instrument-uppercase') [1].text
instrumentLink = soup2.find('a','tidm-and-description').attrs['href']
return (instrumentTidm, instrumentDesc,instrumentLink)
pass
except Exception as ex:
print(ex)
except Exception as ex:
print(ex)
def ExtractLSESecurityInformation(self, link:str):
try:
req = urllib2.urlopen(link)
html = req.read().decode("utf-8")
soup = BeautifulSoup(html, 'html.parser')
instrumentInformation = soup.find_all('div','chart-table-instrument-information')
if (len(instrumentInformation)==1):
subText = str(instrumentInformation[0])
soup2 = BeautifulSoup(subText, 'html.parser')
spanMarketSegmentParent = soup2.find('span',string=' Market segment ').parent()
marketSegment = spanMarketSegmentParent[1].text
spanSEDOLParent = soup2.find('span',string=' SEDOL ').parent()
SEDOL = spanSEDOLParent[1].text
spanMICParent = soup2.find('span',string=' Market identifier code (MIC) ').parent()
MIC = spanMICParent[1].text
spanISINParent = soup2.find('span',string=' ISIN ').parent()
ISIN = spanISINParent[1].text
return (marketSegment, MIC, SEDOL, ISIN)
except Exception as ex:
print(ex)
def TestLSEWebServices():
lse = LSEWebServices()
results = lse.ExtractLSESearchResults("gb0031790826")
print(results)
sec_info = lse.ExtractLSESecurityInformation(results[2])
print(sec_info)
dummy = 1 # a line of code upon which I can place a breakpoint
pass
class LSEWebServicesCOMWrapper(object):
_reg_clsid_ = "{81F3D23E-83E5-42DF-96E8-5042933379CF}"
_reg_progid_ = 'PythonInVBA.LSEWebServicesCOMWrapper'
_reg_clsctx_ = pythoncom.CLSCTX_LOCAL_SERVER
_public_methods_ = ['ExtractLSESearchResults','ExtractLSESecurityInformation']
def ExtractLSESearchResults(self,searchTerm: str):
try:
lse = LSEWebServices()
results = lse.ExtractLSESearchResults(searchTerm)
if (results is not None):
return list(results)
else:
return None
except Exception as ex:
print(ex)
def ExtractLSESecurityInformation(self, link:str):
try:
lse = LSEWebServices()
results = lse.ExtractLSESecurityInformation(link)
if (results is not None):
return list(results) #.tolist()
else:
return None
except Exception as ex:
print(ex)
def TestLSEWebServicesCOMWrapper():
lse = LSEWebServicesCOMWrapper()
results = lse.ExtractLSESearchResults("gb0031790826")
print(results)
sec_info = lse.ExtractLSESecurityInformation(results[2])
print(sec_info)
dummy = 1
pass
def RegisterThis():
print("Registering COM servers...")
import win32com.server.register
win32com.server.register.UseCommandLine(LSEWebServicesCOMWrapper)
if __name__ == '__main__':
#TestLSEWebServicesCOMWrapper()
RegisterThis()
Client VBA Code
So the Python code needs to be run once to register the COM servers and for this you need Administrator rights; once run the following VBA code can run.All the hard work has been done for us in the Python code. In VBA, we create the COM wrapper class and call a couple of methods, the Python returns Python lists which are converted into COM/VBA variant arrays and we just pull out the relevant item. Simple.
Option Explicit
Sub TestLSEWebServicesCOMWrapper()
Dim obj As Object
Set obj = VBA.CreateObject("PythonInVBA.LSEWebServicesCOMWrapper")
Dim vResult As Variant
vResult = obj.ExtractLSESearchResults("gb0031790826")
'Stop
Dim vSecInfo As Variant
vSecInfo = obj.ExtractLSESecurityInformation(vResult(2))
Debug.Print "ISIN:" & vSecInfo(3), vResult(1), "TIDM:" & vResult(0), "SEDOL:" & vSecInfo(2), ""
'Stop
End Sub
And the above code gives the outputISIN:GB0031790826 UNITED KINGDOM 2% IL TREASURY 35 TIDM:T2IL SEDOL:3179082
So now referring back to the use case I can verify that my stock broker recognizes T2IL and 3179082 as identifiers. So we have converted the ISIN into something useful.
Very nice code!!
ReplyDeletequestion:
1. Dim obj As Object
Set obj = VBA.CreateObject("PythonInVBA.LSEWebServicesCOMWrapper")
i am getting error while setting this object. Where to find proper library?