PhantomJS alternative: Write short PyQt scripts instead (phantom.py)

For a project of mine, I needed a ‘headless’ script that renders dynamically generated HTML (via JavaScript) to a PDF file. In looking for existing headless browsers, I found PhantomJS, CasperJS and similar projects.

PhantomJS looked most promising, but it had bugs related to the CSS @media type ‘print’ which made the project useless for generating properly formatted PDFs.[1][2] These issues are still open after three years. As of 2017, the official PhantomJS binary download is still at version 2.1 which includes an older downstream QtWebKit version 5.5.1 [3].

In trying to fix the bugs myself, or upgrade QtWebKit, I spent two full days trying to build PhantomJS, but eventually had to give up because the build system seems to be reworked right now. It seems that instead of patching the custom QtWebKit, the development team is moving towards making “PhantomJS a normal Qt application with system-installed Qt and QtWebKit”. [4]

So, I started questioning PhantomJS. What magic does it really do? By inspecting the source code, I found out that its entire concept hinges on just two Qt functions: QWebFrame::addToJavaScriptWindowObject() and QWebFrame::evaluateJavascript(). The former allows exposing Qt objects to the JavaScript space of WebKit. For example, PhantomJS’s JavaScript API method injectJs is simply a wrapper for an underlying evaluateJavascript call inside of a C++ function of a C++ object which has been previously attached via addToJavaScriptWindowObject.

In short, PhantomJS’s custom (and in my opinion rather sparse) JavaScript API simply calls Qt’s C++ functions.

The obvious disadvantage of this concept is that an extension or fixing of the API requires a recompilation of the C++ code.

It is obviously much simpler to directly script Qt, for example via PyQt.

So, I started writing a short PyQt application, and after just 90 lines of Python code, I had what I needed: a headless browser using an up-to-date version of WebKit, which did not have the shortcomings of the version in PhantomJS.

The script is published on my blog and as a Github gist. I don’t expect getting 20,000 Github stars like others, but honestly, following (perhaps too simple) logic, my phantom.py script should get more. 😉

phantom.py: A lean replacement for bulky headless browser frameworks

 

References:

[1]: https://github.com/ariya/phantomjs/issues/12685

[2]: https://github.com/ariya/phantomjs/issues/13524

[3]: https://github.com/ariya/phantomjs/tree/2.1.1/src/qt

[4]: https://github.com/ariya/phantomjs/issues/14458#issuecomment-242391757

,

No comments yet.

Leave a Reply

Powered by WordPress. Designed by Woo Themes