New npTess NeoPlugin for NeoBook

Posted: Sat Sep 01, 2012 4:10 am
by dec
Hi to all!

This plugin allows you to use the Tesseract OCR (Optical Character Recognition) engine in your publications, with several page segmentation modes and other options to help you get the best possible results from BMP, JPG, PNG, GIF and TIFF images.

You can instantiate one or more Tess objects to do the work, and specify several events to control the text recognition process. The plugin includes all the files that need to be deployed along with your publications.

You can download npTess from here.

Posted: Sat Sep 01, 2012 11:52 am
by Luiz Alfredo
Great work David.

I have downloaded the Portuguese Data File and it is working very well in a new project.

Posted: Sat Sep 01, 2012 11:56 am
by dec
Hello Luiz,

Thank you Luiz. I will try to download and test the Portuguese file, and add it to the distribution if everything is fine.

I don't know if it is a good idea to include all the available languages, but I will certainly add the Portuguese language *.

On the other hand, the idea is to test every language included in the distribution for possible errors.

Thank you again Luiz. ;)

* Update: The file is now included with the plugin.

Posted: Sun Sep 02, 2012 12:13 am
by HPW
Hello David,

Do you work with the original DLL?

With Techmedias version there was a problem with Windows XP and the DLL. ... 7&start=45

Is npTess tested on XP?


Posted: Sun Sep 02, 2012 1:08 am
by dec

Hans, I decided (after several searches and attempts) to write my own Tesseract wrapper, but not for any Tesseract DLL, since there are no official DLLs from version 2.0. It may be possible to build a DLL, but that is not what npTess does. For your information, I published my Tesseract wrapper (the one used by the plugin) on ClubDelphi (Spanish), and you can download it directly from here.

On the other hand, I tested the plugin on Windows 7 and Windows 8, but I cannot think of a reason for it not to work on Windows XP; in principle, the plugin should work on Windows XP as well. As the author said, the Tesseract binaries "[...] are built with static linking, so they stand more chance of working out of the box on more windows systems." Thanks for your comments!

Posted: Sun Sep 02, 2012 7:12 am
by HPW
Hello David,

Thanks for the link to your component.
I tested a bit with the included Sample.exe.
It seems to work on Windows XP.
Also, your DLL shows no unresolved DLL functions in Dependency Walker.

When I run the sample a.gif, the letters are recognized quite well.
But the row with numbers and special characters is not recognized well.
Where does the tesseract.dll come from?
Is it part of the Google Tesseract distribution?


Posted: Sun Sep 02, 2012 7:19 am
by dec

As I said before, the DLL is really a portable binary executable (renamed as a DLL) coming from the latest release by the Tesseract OCR team under the Apache License 2.0. They did not publish any Tesseract DLL for version 2.0. My Delphi component drives that executable transparently, as a command-line tool. Glad to know that it runs on Windows XP. ;)
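
To illustrate the general idea of driving the Tesseract executable as a command-line tool, here is a minimal sketch in Python. It is not the Delphi component itself, and the function names are hypothetical. The only thing assumed about Tesseract is its usual command-line form: the image file, an output base name (Tesseract appends .txt), and an optional -l language code.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(exe, image, output_base, lang="eng"):
    """Return the argument list for one Tesseract command-line run.

    `exe` is the path to the Tesseract executable (hypothetical here).
    """
    # Assumed CLI form: <exe> <image> <output_base> -l <lang>
    return [str(exe), str(image), str(output_base), "-l", lang]


def recognize(exe, image, output_base, lang="eng"):
    """Run the executable and return the recognized text."""
    cmd = build_tesseract_command(exe, image, output_base, lang)
    # check=True raises CalledProcessError if the tool exits with an error.
    subprocess.run(cmd, check=True, capture_output=True)
    # Tesseract writes its result to "<output_base>.txt".
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")
```

Calling recognize("tesseract", "a.gif", "out", "por") would then run one Portuguese recognition pass and read the text back from out.txt, assuming the executable and the language data are available.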

On the other hand, the results from Tesseract are sometimes not perfect. Sometimes the images are recognized with absolute fidelity, and other times Tesseract fails to recognize some characters or gets them wrong. The plugin (and Tesseract) lets you set some Page Segmentation options, and also some advanced options, which can help with this.

Also, the Tesseract Windows distribution offers tools to train the Tesseract engine. I think that if you train Tesseract and obtain the appropriate language file, you can use it with the plugin, but I have not tested this possibility, because in principle it is outside the plugin's scope. The plugin does, however, include some "trained" language files which can be used.
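
As a rough illustration of what a page segmentation option looks like at the command-line level (the plugin's own option names are not shown here), the Python sketch below appends a page segmentation mode to a Tesseract argument list. It assumes the -psm flag used by Tesseract 3.x; newer releases spell it --psm. The helper name and the file names are hypothetical.

```python
def with_page_segmentation(cmd, mode, flag="-psm"):
    """Return a copy of `cmd` with a page segmentation mode appended.

    Common modes: 3 = fully automatic segmentation (the default),
    6 = a single uniform block of text, 7 = a single text line.
    Pass flag="--psm" for Tesseract 4 and later (assumption: the
    caller knows which version is installed).
    """
    if not isinstance(mode, int) or mode < 0:
        raise ValueError("page segmentation mode must be a non-negative integer")
    # Build a new list so the caller's command is left untouched.
    return list(cmd) + [flag, str(mode)]


# Hypothetical base command: image "a.gif", output base name "out".
base = ["tesseract", "a.gif", "out"]
single_line = with_page_segmentation(base, 7)
```

Trying a different mode on the same image, for example treating it as a single line or a single block, is one way to check whether segmentation is what hurts the result.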

Thanks again!

Posted: Sun Sep 02, 2012 9:25 am
by dec

If you like, download the Delphi component again, since I have made some enhancements to it and also fixed a few things. The same goes for the plugin, which is now updated too.

Thanks again!