Friday 20 July 2018

Python - HTML - pytidylib does not install HTML Tidy

So in the last post I wrote a Python class to decompile a *.chm compiled help file. Found within is what looks like HTML 3.2 that should be upgraded to either XHTML or HTML5. I had written some VBA code to do this, but since it is Python month on this blog I am keen to find out what a Python developer would do. They would (I should imagine) use the library https://pypi.org/project/pytidylib/, which wraps the venerable HTML Tidy.

pip install pytidylib does not install HTML Tidy

So one installs pytidylib from a command window with admin rights using

pip install pytidylib
C:\Users\Simon\source\repos\foo\bar>pip install pytidylib
Collecting pytidylib
  Downloading https://files.pythonhosted.org/packages/2d/5e/4d2b5e2d443d56f444e2a3618eb6d044c97d14bf47cab0028872c0a468e0/pytidylib-0.3.2.tar.gz (87kB)
    100% |████████████████████████████████| 92kB 1.4MB/s
Installing collected packages: pytidylib
  Running setup.py install for pytidylib ... done
Successfully installed pytidylib-0.3.2

Then, using Visual Studio, I ran a small example program to test the install:

from tidylib import tidy_document

document, errors = tidy_document('''<p>fõo <img src="bar.jpg">''',
                                 options={'numeric-entities': 1})
print(document)
print(errors)

Unfortunately it complains that it cannot find libtidy: pip install pytidylib installs only the Python wrapper, not the HTML Tidy library itself.

Here is the stack trace

OSError
  Message=Could not load libtidy using any of these names: libtidy,libtidy.so,libtidy-0.99.so.0,cygtidy-0-99-0,tidylib,libtidy.dylib,tidy
  StackTrace:
C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\tidylib\tidy.py:99 in Tidy.__init__
C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\tidylib\tidy.py:234 in get_module_tidy
C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\tidylib\tidy.py:222 in tidy_document
C:\Users\Simon\source\repos\CompiledHelpToEbookPythonApp\CompiledHelpToEbookPythonApp\HtmlTidy.py:3 in 
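Before calling tidy_document it can be handy to check whether any Tidy shared library is discoverable at all. The following is only a minimal sketch using the standard library's ctypes.util.find_library. The helper name locate_tidy is hypothetical, the candidate names are simply the ones listed in the error message, and this is not necessarily the same search pytidylib performs internally.

```python
import ctypes.util

# Candidate names copied from the OSError message above.
CANDIDATE_NAMES = ("libtidy", "libtidy.so", "libtidy-0.99.so.0",
                   "cygtidy-0-99-0", "tidylib", "libtidy.dylib", "tidy")


def locate_tidy(names=CANDIDATE_NAMES):
    """Return the first library name or path the OS can resolve, or None."""
    for name in names:
        found = ctypes.util.find_library(name)
        if found:
            return found
    return None


location = locate_tidy()
if location is None:
    print("No HTML Tidy library found; install the binaries and check PATH.")
else:
    print("HTML Tidy library found:", location)
```

If this prints the "not found" message, the tidy_document call will most likely fail in the same way as above.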

Install HTML Tidy Binaries

The HTML Tidy binaries have to be installed separately. I got mine from http://binaries.html-tidy.org/. Initially I took the 32-bit edition, which was a mistake: the error persisted, because a 64-bit Python (note Python36_64 in the stack trace above) cannot load a 32-bit DLL. So I downloaded the 64-bit edition, tidy-5.6.0-vc14-64b.zip, extracted it and then added the extracted bin folder to my PATH. Don't forget to restart any running processes so that the environment variable change is picked up.
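To avoid picking the wrong edition, one can compare the bitness of the running Python with that of the downloaded DLL. This is a rough sketch, not part of pytidylib: python_bits and dll_bits are hypothetical helper names, and dll_bits only recognises the two PE machine types for 32-bit x86 (0x014C) and x86-64 (0x8664).

```python
import struct

# PE "Machine" field values for the two common Windows architectures.
PE_MACHINE_BITS = {0x014C: 32, 0x8664: 64}


def python_bits():
    """Return 32 or 64 according to the running interpreter's pointer size."""
    return struct.calcsize("P") * 8


def dll_bits(path):
    """Read a DLL's PE header and return 32, 64, or None if unrecognised."""
    with open(path, "rb") as f:
        dos_header = f.read(0x40)
        if len(dos_header) < 0x40 or dos_header[:2] != b"MZ":
            return None
        # Offset 0x3C of the DOS header holds the position of the PE header.
        (pe_offset,) = struct.unpack_from("<I", dos_header, 0x3C)
        f.seek(pe_offset)
        pe_start = f.read(6)
    if len(pe_start) < 6 or pe_start[:4] != b"PE\0\0":
        return None
    # The 2-byte Machine field follows the 4-byte "PE\0\0" signature.
    (machine,) = struct.unpack_from("<H", pe_start, 4)
    return PE_MACHINE_BITS.get(machine)


print("Python is", python_bits(), "bit")
# Hypothetical path; point it at the DLL inside the extracted bin folder:
# print("DLL is", dll_bits(r"C:\path\to\tidy\bin\tidy.dll"), "bit")
```

If the two numbers differ, the library will not load into that interpreter, whatever PATH says.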

After Successful Install

After a successful install, this is the output from the sample program above:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>
  <head>
    <title></title>
  </head>
  <body>
    <p>fõo <img src="bar.jpg">
  </body>
</html>

line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 1 column 1 - Warning: plain text isn't allowed in <head> elements
line 1 column 1 - Info: <head> previously mentioned
line 1 column 1 - Warning: inserting implicit <body>
line 1 column 1 - Warning: inserting missing 'title' element

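The second value returned by tidy_document is one block of text with one diagnostic per line. If the diagnostics need to be filtered or counted, they can be split into fields. The following is a small sketch that assumes every line has the "line N column M - Level: message" shape seen in the output above; parse_tidy_errors is a hypothetical helper, not part of pytidylib.

```python
import re

# Matches lines such as "line 1 column 1 - Warning: inserting implicit <body>".
ERROR_LINE = re.compile(r"line (\d+) column (\d+) - (\w+): (.*)")


def parse_tidy_errors(errors):
    """Split Tidy's report into (line, column, level, message) tuples."""
    parsed = []
    for raw in errors.splitlines():
        match = ERROR_LINE.match(raw.strip())
        if match:  # silently skip blank or unrecognised lines
            line, column, level, message = match.groups()
            parsed.append((int(line), int(column), level, message))
    return parsed


sample = """line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 1 column 1 - Info: <head> previously mentioned"""

for entry in parse_tidy_errors(sample):
    print(entry)
```

With the tuples in hand it is easy, for example, to keep only the entries whose level is "Warning".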
