So last post I wrote a Python class to decompile a *.chm compiled help file. Found within is what looks like HTML 3.2, which should be upgraded to either XHTML or HTML5. I had written some VBA code to do this, but since it is Python month on this blog I am keen to find out what a Python developer would do. They would (I should imagine) use the library https://pypi.org/project/pytidylib/, which wraps the venerable HTML Tidy.
pip install pytidylib does not install HTML Tidy
So one installs pytidylib from a command window with admin rights using
pip install pytidylib
C:\Users\Simon\source\repos\foo\bar>pip install pytidylib
Collecting pytidylib
Downloading https://files.pythonhosted.org/packages/2d/5e/4d2b5e2d443d56f444e2a3618eb6d044c97d14bf47cab0028872c0a468e0/pytidylib-0.3.2.tar.gz (87kB)
100% |████████████████████████████████| 92kB 1.4MB/s
Installing collected packages: pytidylib
Running setup.py install for pytidylib ... done
Successfully installed pytidylib-0.3.2
Using Visual Studio, I ran a small example program to test the install:
from tidylib import tidy_document
document, errors = tidy_document('''<p>fõo <img src="bar.jpg">''',
                                 options={'numeric-entities': 1})
print(document)
print(errors)
Unfortunately, it complains that it cannot find libtidy, which indicates that HTML Tidy itself is not installed. pytidylib is only a wrapper, and pip does not install the underlying library for you.
Here is the stack trace
OSError
Message=Could not load libtidy using any of these names: libtidy,libtidy.so,libtidy-0.99.so.0,cygtidy-0-99-0,tidylib,libtidy.dylib,tidy
StackTrace:
C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\tidylib\tidy.py:99 in Tidy.__init__
C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\tidylib\tidy.py:234 in get_module_tidy
C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\tidylib\tidy.py:222 in tidy_document
C:\Users\Simon\source\repos\CompiledHelpToEbookPythonApp\CompiledHelpToEbookPythonApp\HtmlTidy.py:3 in
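To see which of those names, if any, can be loaded on a given machine, a quick probe along the following lines may help. This is only a sketch: the list of names is copied from the error message above, the helper name first_loadable is my own invention, and I am assuming (not asserting) that this is roughly the kind of lookup the wrapper performs.

```python
import ctypes

# Candidate names copied from the OSError message above.
CANDIDATE_NAMES = [
    "libtidy", "libtidy.so", "libtidy-0.99.so.0", "cygtidy-0-99-0",
    "tidylib", "libtidy.dylib", "tidy",
]


def first_loadable(names):
    """Return the first name that ctypes can load, or None if none load.

    first_loadable is a hypothetical helper for illustration only;
    it is not part of pytidylib.
    """
    for name in names:
        try:
            ctypes.CDLL(name)
        except OSError:
            # Raised when the library is missing or cannot be loaded.
            continue
        return name
    return None


if __name__ == "__main__":
    found = first_loadable(CANDIDATE_NAMES)
    if found is None:
        print("No HTML Tidy library could be loaded under any candidate name")
    else:
        print("Loaded HTML Tidy library as:", found)
```

If this reports that nothing could be loaded, the problem lies with the HTML Tidy installation rather than with the pytidylib package.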
Install HTML Tidy Binaries
The HTML Tidy binaries must be installed separately. I got mine from http://binaries.html-tidy.org/. Initially I took the 32-bit edition, which was a mistake: the error persisted, because a 64-bit Python (note Python36_64 in the stack trace above) cannot load a 32-bit DLL. So I downloaded the 64-bit edition, tidy-5.6.0-vc14-64b.zip, extracted it, and added the extracted bin folder to my PATH. Don't forget to restart any running processes so that they pick up the environment variable change.
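Before choosing a download, it may be worth checking the bitness of the Python interpreter and, afterwards, whether the extracted bin folder really is visible on the PATH. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and the file name tidy.dll is my assumption about what the extracted bin folder contains, so adjust it if yours differs.

```python
import os
import struct


def python_bits():
    """Return the pointer size of the running interpreter in bits (32 or 64)."""
    return struct.calcsize("P") * 8


def path_dirs_containing(filename, path_value=None):
    """List the directories in a PATH-style string that contain `filename`.

    Hypothetical helper for illustration; defaults to the current PATH.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    return [
        d for d in path_value.split(os.pathsep)
        if d and os.path.isfile(os.path.join(d, filename))
    ]


if __name__ == "__main__":
    print("This Python is", python_bits(), "bit")
    # "tidy.dll" is an assumed file name, not confirmed by the post.
    print("tidy.dll found in:", path_dirs_containing("tidy.dll"))
```

Run it from a freshly opened command window, since an already open one will still have the old PATH.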
After Successful Install
After a successful install, this is the output from the sample program above.
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>
<head>
<title></title>
</head>
<body>
<p>fõo <img src="bar.jpg">
</body>
</html>
line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 1 column 1 - Warning: plain text isn't allowed in <head> elements
line 1 column 1 - Info: <head> previously mentioned
line 1 column 1 - Warning: inserting implicit <body>
line 1 column 1 - Warning: inserting missing 'title' element
Press any key to continue . . .