In the previous post I gave code to detect changes in a web page and POST those changes to a Python-based web server, which in turn writes each payload to a folder, i.e. an HTTP-based message queue. But that code was embedded in the very page that was changing, which is unrealistic. It is much better to write a Chrome extension that sits in the browser and observes changes on other web pages. That is what we do in this post.
Legal Note
Do please get permission before you start web scraping, otherwise you may fall foul of the law. There are legitimate use cases for this technology: imagine you are at a large company with plenty of IT teams, you've asked for a data feed, and the other team says you are not a priority but to feel free to web scrape.
Minimal Chrome Extension
The bare minimum to get a Chrome Extension working is one folder containing two files, that's all. The folder can be named anything. The two files are (i) content.js and (ii) manifest.json.
manifest.json
Here is an example manifest.json file:
{
  "name": "Clock Watch",
  "version": "0.1",
  "description": "example of a Mutation Observer",
  "permissions": [],
  "content_scripts": [
    {
      "js": [ "content.js" ],
      "matches": [ "http://exceldevelopmentplatform.blogspot.com/2018/06/javascript-dynamic-blog-clock.html" ]
    }
  ],
  "manifest_version": 2
}
Much of the manifest is boilerplate, but one thing of interest is the matches array, which tells Chrome which pages to run the extension's content script on. I have published the clock code to a separate blog post, http://exceldevelopmentplatform.blogspot.com/2018/06/javascript-dynamic-blog-clock.html, and we'll use that as a laboratory test page. The matches array can list several pages; here we have only one.
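To give a feel for how an entry in the matches array selects pages, here is a rough sketch in plain JavaScript. It is a simplified illustration only, not Chrome's actual matching code: the function name matchesPattern is hypothetical, and it assumes the scheme and host are written out literally, with "*" used only as a wildcard in the path.

```javascript
// Simplified illustration only: NOT Chrome's real match-pattern implementation.
// Assumption: scheme and host are literal, and "*" appears only in the path.
function matchesPattern(pattern, url) {
  // Escape regex metacharacters (except "*"), then turn each "*" into ".*".
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  const regex = new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
  return regex.test(url);
}

const clockPage = "http://exceldevelopmentplatform.blogspot.com/2018/06/javascript-dynamic-blog-clock.html";

// The exact URL, as used in the manifest above.
console.log(matchesPattern(clockPage, clockPage)); // true
// A wildcard covering every post from June 2018.
console.log(matchesPattern("http://exceldevelopmentplatform.blogspot.com/2018/06/*", clockPage)); // true
// A wildcard for a different year does not match.
console.log(matchesPattern("http://exceldevelopmentplatform.blogspot.com/2017/*", clockPage)); // false
```

Chrome's real match patterns also allow wildcards in the scheme and host, which this sketch deliberately leaves out.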
content.js
This is the content.js file with the code pretty much unchanged from the previous post; all that is added is an IIFE (Immediately Invoked Function Expression) which serves as an entry point, i.e. code that runs first. Also, we have a try catch block around our MutationObserver code to help debugging.
(function () {
    'use strict';
    console.log("clock watch iife running");
    // Give the page a second to build the clock element before observing it
    setTimeout(startObserving, 1000);
}());

function startObserving() {
    'use strict';
    try {
        console.log("entering startObserving");
        var MutationObserver = window.MutationObserver || window.WebKitMutationObserver || window.MozMutationObserver;
        if (MutationObserver == null) {
            console.log("MutationObserver not available");
            return;
        }
        // mutation observer code from https://developer.mozilla.org/en-US/docs/Web/API/MutationObserver
        var targetNode = document.getElementById('clock');
        if (targetNode == null) {
            console.log("element with id 'clock' not found");
            return;
        }
        // Options for the observer (which mutations to observe)
        var config = { attributes: true, childList: true };
        // Callback function to execute when mutations are observed
        var callback = function (mutationsList) {
            for (var mutation of mutationsList) {
                //debugger;
                //console.log(mutation); //uncomment to see the full MutationRecord
                // Attribute changes and removals add no nodes, so skip them
                if (mutation.addedNodes.length === 0) {
                    continue;
                }
                var shorterMutationRecord = "{ target: div#clock, newData: " + mutation.addedNodes[0].data + " }";
                console.log(shorterMutationRecord);
                var xhr = new XMLHttpRequest();
                xhr.open("POST", "http://127.0.0.1:8000");
                //xhr.setRequestHeader("Content-type", "application/x-www-form-urlencoded");
                xhr.send(shorterMutationRecord);
            }
        };
        // Create an observer instance linked to the callback function
        var observer = new MutationObserver(callback);
        // Start observing the target node for configured mutations
        observer.observe(targetNode, config);
        // Later, you can stop observing
        //observer.disconnect();
    }
    catch (err) {
        console.log("err.message: " + err.message);
    }
}
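A MutationRecord's addedNodes list can be empty, for example for an attribute change, which the config above also observes. One way to keep the payload-building step safe and easy to test is to pull it out into a small helper. The sketch below is only a suggestion: describeMutation is a hypothetical name that is not part of the original content.js, and the plain objects at the bottom merely stand in for real MutationRecords so the helper can be tried outside the browser.

```javascript
// Hypothetical helper (not in the original content.js): turns one MutationRecord
// into the short string to POST, without assuming addedNodes is non-empty.
function describeMutation(mutation) {
  if (mutation.type === "childList" && mutation.addedNodes.length > 0) {
    var node = mutation.addedNodes[0];
    // Text nodes carry their text in .data; fall back to textContent for elements.
    var newData = node.data !== undefined ? node.data : node.textContent;
    return "{ target: div#clock, newData: " + newData + " }";
  }
  if (mutation.type === "attributes") {
    return "{ target: div#clock, attribute: " + mutation.attributeName + " }";
  }
  // Nothing worth reporting, e.g. a childList change that only removed nodes.
  return null;
}

// Plain objects standing in for MutationRecords (an assumption for this demo).
console.log(describeMutation({ type: "childList", addedNodes: [{ data: "12:34:56" }] }));
// { target: div#clock, newData: 12:34:56 }
console.log(describeMutation({ type: "attributes", attributeName: "class", addedNodes: [] }));
// { target: div#clock, attribute: class }
console.log(describeMutation({ type: "childList", addedNodes: [] }));
// null
```

Inside the observer callback one would then call describeMutation(mutation) and only send the result when it is not null.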
Installing the Extension
In the Chrome browser, navigate to chrome://extensions/. Click on "Load unpacked" (enable Developer mode first if the button is not visible) and navigate to the folder containing your two files, in my case N:\Clock Watch Chrome Extension\. Your extension is then loaded and should appear in the list.
You can go and look at the details page if you want, but we are pretty much done. All you need do now is navigate to the clock page. You'll know your extension is loaded because a new icon appears on the toolbar in the top right of the Chrome browser window. In the absence of a given icon, Chrome takes the first letter of your extension's name and uses that as the icon, so here it is the "C" icon, ringed in green in the screenshot; hover over it and it will read "Clock Watch". Click on the icon and you can remove the extension from the menu if you want.
Screenshots- The Clock and the Message Queue
As described in the previous post, we have some code which runs a Python web server, taking HTTP POST calls and writing the payloads to a folder. Here is a screenshot to show that working.
Final Thoughts
So what have we achieved here? We have a Chrome extension which observes a page and reports the changes to a message queue by calling out with XMLHttpRequest to a Python web server. Cool, but do please use responsibly.