Thursday, March 28, 2013

Ubuntu OCR Solution.

Since I was transferred to the company's NOIDA office, my work has been mostly office work, apart from an occasional visit to a power station.

I was looking for an OCR solution to convert scanned PDF documents to text files. Initially I tried the pdfocr and tesseract command-line tools, but without much success.

Then I converted one page online at ABBYY FineReader. The site allowed me to convert 3 pages for free; after that I had to pay.

I discovered this page about a Linux OCR solution. I downloaded the .deb file and installed it on Ubuntu 12.04. It installed without any dependency problems, since I already had tesseract installed.

Lios is actually a GUI that uses the cuneiform or tesseract engine in the background. I had already tried pdfocr, which uses cuneiform and tesseract through the command line, so I was not hoping for good results, but Lios worked much better.

I used the cuneiform engine for normal scanned pages and the tesseract engine if there was a table on the page. Tesseract takes more time when there is a table, but it extracts the text correctly.
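For anyone who prefers to script the same choice instead of clicking through the GUI, here is a rough Python sketch of the rule above. It is not how Lios works internally. It assumes each scanned page has already been exported from the PDF as an image file, that the `tesseract` and `cuneiform` command-line tools are installed, and that you decide yourself whether a page has a table. The function names `build_ocr_command` and `run_ocr` are made up for this illustration.

```python
from pathlib import Path
import subprocess


def build_ocr_command(image_path, has_table):
    """Return the command line for one page image (hypothetical helper).

    Pages with a table go to tesseract, plain pages go to cuneiform,
    following the rule of thumb described in the post.
    """
    image = Path(image_path)
    if has_table:
        # tesseract takes an output base name and adds ".txt" itself
        return ["tesseract", str(image), str(image.with_suffix(""))]
    # cuneiform takes the full output file name after -o
    return ["cuneiform", "-o", str(image.with_suffix(".txt")), str(image)]


def run_ocr(image_path, has_table):
    """Run the chosen engine; raises CalledProcessError if it fails."""
    subprocess.run(build_ocr_command(image_path, has_table), check=True)
```

Calling `run_ocr("page1.png", has_table=True)` should leave the recognized text in `page1.txt` next to the image, provided the engine is installed.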

Air India direct flight to San Francisco has flown through China today.

My sister-in-law left for San Francisco on flight AI 173, which flies over the North Pole. I tracked that flight on flightstats.com till it land...