Tag Archives | PDF

phantom.py: A lean replacement for bulky headless browser frameworks

This is a simple but fully scriptable headless QtWebKit browser using PyQt5 in Python 3, specialized in executing external JavaScript and generating PDF files. It is a lean replacement for other bulky headless browser frameworks. (Source code is at the end of this post as well as in this GitHub gist.)

Usage

If you have a display attached:

If […]

Digitizing books 
(International Dunhuang Project, CC BY-SA 3.0)

Digitize books: Searchable OCR PDF with text overlay from scanned or photographed books on Linux

Here is my method to digitize books. It is a tutorial about how to produce searchable, OCR (Optical Character Recognition) PDFs from a hardcopy book using free software tools on Linux distributions. You can probably find more convenient proprietary software, but that's not the objective of this post. Important: I should not need to mention that depending […]

