HTML looks a lot like XML, but XML insists that every opening tag has a matching closing tag and that every attribute value is enclosed in quotes. Real-world HTML routinely breaks these rules, so it can rarely be loaded with an XML parser. In short, XML is fussy and HTML is not.
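To make the contrast concrete, here is a minimal sketch that feeds the same sloppy markup to both parsers. The procedure name and the sample markup are made up for illustration, and it assumes references to "Microsoft XML, v6.0" and "Microsoft HTML Object Library" are set.

```vba
Sub CompareXmlAndHtmlParsing()
    '* hypothetical sample: unquoted attribute value and an unclosed <br> tag
    Const sMarkup As String = "<p class=greeting>Hello<br>world</p>"

    '* LoadXML returns False when the text is not well-formed XML
    Dim oDom As MSXML2.DOMDocument60
    Set oDom = New MSXML2.DOMDocument60
    Debug.Print "XML parser accepted the markup: " & oDom.LoadXML(sMarkup)

    '* the HTML parser is forgiving and builds a document from the same text
    Dim oHtmlDoc As MSHTML.HTMLDocument
    Set oHtmlDoc = New MSHTML.HTMLDocument
    oHtmlDoc.body.innerHTML = sMarkup
    Debug.Print "HTML parser kept the text: " & oHtmlDoc.body.innerText
End Sub
```

Run it from the VBA editor and watch the Immediate window; the XML parser should refuse the markup while the HTML document still exposes the text.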
However, take a look at the following code. It uses the XMLHTTP request (XHR) object, but note that the response text is never parsed as XML unless you write code to do so (an example is given in a separate function). I think this is a nice use of XHR. The code goes on to insert the response text as HTML into an MSHTML.HTMLDocument, and from there you can scrape whatever you need.
'* Requires references to "Microsoft XML, v6.0" and "Microsoft HTML Object Library"
Sub DoNotParseXml()
    Dim oXHR As MSXML2.XMLHTTP60
    Set oXHR = New MSXML2.XMLHTTP60

    Dim oHtmlDoc As MSHTML.HTMLDocument
    Set oHtmlDoc = New MSHTML.HTMLDocument

    '* the random query string defeats caching; False makes the request synchronous
    oXHR.Open "GET", "https://coinmarketcap.com/all/views/all/" & "?Random=" & Rnd() * 100, False
    oXHR.setRequestHeader "Content-Type", "text/XML"
    oXHR.send

    '* Status is numeric, so compare with a number rather than a string
    If oXHR.Status = 200 Then
        '* no parsing of the (not well-formed) response as XML takes place
        oHtmlDoc.body.innerHTML = oXHR.responseText

        '** do some web scraping with MSHTML.HTMLDocument
        '... oHtmlDoc.getElementsByClassName("price")

        '* but if we try to parse the response text as XML, the assertion in ParseXml fails
        ParseXml oXHR.responseText
    End If
End Sub
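The scraping step is only hinted at in the comments above, so here is a rough sketch of what it might look like. The procedure name is hypothetical, and whether the page really marks prices with a "price" class is an assumption carried over from the comment, not something verified here. The variables are declared As Object so the calls are late-bound.

```vba
'* hypothetical helper: prints the text of every element with the given class
Sub ListElementsByClass(ByVal oHtmlDoc As MSHTML.HTMLDocument, ByVal sClassName As String)
    Dim oMatches As Object
    Set oMatches = oHtmlDoc.getElementsByClassName(sClassName)

    Dim oElement As Object
    For Each oElement In oMatches
        Debug.Print oElement.innerText
    Next oElement
End Sub
```

Inside DoNotParseXml it could be called as ListElementsByClass oHtmlDoc, "price" once innerHTML has been assigned.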
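Private Function ParseXml(ByVal sText As String) As MSXML2.DOMDocument60
    Dim oDom As MSXML2.DOMDocument60
    Set oDom = New MSXML2.DOMDocument60

    '* LoadXML returns False, rather than raising an error, when the text is not well-formed
    oDom.LoadXML sText

    '* for a typical HTML page this assertion fails and execution breaks here
    Debug.Assert oDom.parseError.errorCode = 0

    '* return the document so that callers can use it
    Set ParseXml = oDom
End Function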
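Debug.Assert halts execution in the editor, which is fine for a demonstration but awkward in real code. As an alternative, here is a sketch that reports why the parse failed instead of stopping. TryParseXml is a name I have made up; the parseError properties it reads (errorCode, line, linepos, reason) are the standard MSXML ones.

```vba
'* hypothetical variant of ParseXml: returns True on success and hands back the document
Private Function TryParseXml(ByVal sText As String, ByRef oDom As MSXML2.DOMDocument60) As Boolean
    Set oDom = New MSXML2.DOMDocument60
    oDom.async = False

    TryParseXml = oDom.LoadXML(sText)

    If Not TryParseXml Then
        '* describe the first well-formedness error the parser met
        With oDom.parseError
            Debug.Print "XML parse failed (code " & .errorCode & ") at line " & .Line & _
                        ", column " & .linepos & ": " & .reason
        End With
    End If
End Function
```

Feeding it oXHR.responseText from the page above should print the reason in the Immediate window and return False, leaving the caller free to carry on with the HTML route.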