OCR using Tesseract and ImageMagick as pre-processing task

Many applications today still rely on direct data entry via keyboard, but more and more of them are turning to automated data entry. The reasons include the growing incidence of wrist problems among operators from constant keying and the potential hazards of video display terminal emissions. As a result, almost any application that takes in printed text is a candidate for OCR.

What are its Applications?

  • Automatic number plate recognition – used by various police forces, as a method of electronic toll collection on pay-per-use roads, in parking lots, car washes and so on, and for cataloging the movements of traffic or individuals (quite popular in Central London).
  • Book scanning – digital books can be easily distributed, reproduced and read on-screen. Projects such as Project Gutenberg and Google Book Search scan books on a large scale.
  • CAPTCHA – a type of challenge-response test used in computing in an attempt to ensure that the response is not generated by a computer. It stands for Completely Automated Public Turing test to tell Computers and Humans Apart.
  • Computational linguistics – machine translation.
  • Digital pens and digital paper.
  • Digital mail room – automation of incoming mail processes for the classification and distribution of mail.
  • Handwriting – a person’s particular and individual style of writing with pen or pencil. Every literate person has their own manner of writing. Graphology is the controversial study and analysis of handwriting, especially in relation to human psychology. It is sometimes part of the hiring process: the candidate is asked to write by hand about a familiar topic, and the sample is then sent to specialists for psychological analysis.
  • Music OCR – intended to interpret sheet music or printed scores into an editable and playable form.
  • Optical Mark Recognition – the process of capturing human-marked data from document forms such as surveys and tests.
  • Kurzweil – text-to-speech software that enables a computer to read electronic and scanned text aloud to visually impaired people.

Principles of OCR Technology

Optical Character Recognition (OCR) systems recognize machine print. Using pattern-matching technology, OCR translates the shapes and patterns of machine-made characters into corresponding computer codes. Although the most advanced systems are able to recognize multiple fonts, they generally handle only standard fonts such as Times Roman and Arial well. Once all characters in a given word are recognized, the word is compared against a vocabulary of potential answers to produce the final result.
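The vocabulary comparison described above can be pictured with a small sketch. It is only an illustration under stated assumptions, not the way any particular OCR engine does it: the class name VocabularyMatcher, the use of Levenshtein edit distance as the similarity measure, and the sample vocabulary are all made up for this example.

```java
import java.util.Arrays;
import java.util.List;

public class VocabularyMatcher {

    // Levenshtein edit distance computed with two rolling rows.
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the vocabulary entry closest to the raw OCR word.
    // On ties the first entry wins; an empty vocabulary returns the word unchanged.
    public static String closest(String ocrWord, List<String> vocabulary) {
        String best = ocrWord;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : vocabulary) {
            int distance = editDistance(ocrWord, candidate);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical vocabulary; the digit 1 stands in for a misread letter i.
        List<String> vocabulary = Arrays.asList("tesseract", "character", "recognition");
        System.out.println(closest("recogn1tion", vocabulary));
    }
}
```

A real engine would typically weigh the classifier’s confidence for each character as well, so treat the plain edit distance here as the simplest possible stand-in.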

Character recognition segments lines of text or words into separate characters, which are recognized by the makeup of their component shapes. Machine-printed letters are evenly spaced across and down a given page, allowing the OCR system to read the text one character at a time. Segmentation into single characters is a critical failure point for forms processing organizations, because OCR technology requires high-quality images with excellent contrast and clarity. With degraded images, even the most sophisticated OCR systems show a significant drop in accuracy.
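To make the segmentation step concrete, here is a minimal sketch of one textbook approach: splitting a binarized text line wherever a completely empty pixel column appears. This is an assumption chosen for illustration, not a description of how Tesseract segments text; the class name ColumnSegmenter and the boolean[][] image representation are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnSegmenter {

    // ink[y][x] is true where the pixel is dark.
    // Returns {startColumn, endColumnExclusive} for every run of columns that contains ink.
    public static List<int[]> segment(boolean[][] ink) {
        List<int[]> spans = new ArrayList<int[]>();
        if (ink.length == 0) {
            return spans;
        }
        int width = ink[0].length;
        int start = -1;
        for (int x = 0; x < width; x++) {
            boolean hasInk = false;
            for (boolean[] row : ink) {
                if (row[x]) {
                    hasInk = true;
                    break;
                }
            }
            if (hasInk && start < 0) {
                start = x;                       // a character begins
            } else if (!hasInk && start >= 0) {
                spans.add(new int[] {start, x}); // an empty column ends it
                start = -1;
            }
        }
        if (start >= 0) {
            spans.add(new int[] {start, width}); // character touching the right edge
        }
        return spans;
    }

    public static void main(String[] args) {
        boolean[][] line = {
            {true,  true, false, true, false},
            {false, true, false, true, false}
        };
        for (int[] span : segment(line)) {
            System.out.println(span[0] + ".." + span[1]);
        }
    }
}
```

The sketch also shows why degraded scans hurt: a single speck of noise in the gap between two letters merges them into one span, and a broken stroke splits one letter into two.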

How to choose an optimal product?

When deciding which OCR product to choose, a number of criteria should be considered. What price are you ready to pay? What is the quality of the product? How well is it supported? And so on. Fortunately for us, such a product exists. It’s open source, of very good quality, pretty well supported and still alive. It is called tesseract-ocr. Why tesseract? Because it’s open source and licensed under the Apache License 2.0, it’s one of the best, it is well supported via a mailing list, it runs on multiple platforms, it has a wide range of built-in languages, and it is stable and integrates easily with other systems.

This tutorial is divided into:

Introduction to tesseract-ocr

Installation of tesseract 3.0.1 for Windows.

Extracting the text

Writing simple tesseract function using baseapi

Writing Java function that extracts text from given image using ProcessBuilder and tesseract.exe

Introduction to ImageMagick

Installation of ImageMagick 6.6.9-8 for Windows

Checking the installation

A brief description of what’s under the hood and useful command line utilities

Java API to ImageMagick (http://im4java.sourceforge.net/)

Introduction to MSL

Writing simple MSL script




Introduction to tesseract-ocr

According to Wikipedia, in geometry the tesseract, also called an 8-cell or regular octachoron or cubic prism, is the four-dimensional analog of the cube. The tesseract is to the cube as the cube is to the square. Just as the surface of the cube consists of 6 square faces, the hypersurface of the tesseract consists of 8 cubical cells. The tesseract is one of the six convex regular 4-polytopes.

In our case, Tesseract is an OCR engine that was one of the top 3 engines in the 1995 UNLV Accuracy test. It was originally developed at Hewlett Packard Laboratories Bristol and at Hewlett Packard Co, Greeley Colorado. Between 1995 and 2006 it had little work done on it, but it is probably one of the most accurate open source OCR engines available. It reads a binary, grey or color image and outputs text. It is now maintained by Google.

Tesseract Installation

Throughout this tutorial we will use a Windows box with Microsoft Visual Studio 2008 Express installed.

The installation is very simple and takes about 1 hour. You can use the Ant script provided to run particular tasks, or do them yourself.

Let’s meet the tutorial requirements.

  1. Install Microsoft Visual Studio 2008 Express (http://msdn.microsoft.com/en-us/express/future/bb421473)
  2. Add vcbuild.exe to the PATH
  3. Install Ant (http://ant.apache.org/bindownload.cgi)
  4. Install an SVN client (http://subversion.apache.org/packages.html)
  5. Check the Java SE 1.5/6 installation

Now we are ready to step into the world of image processing.

Step 1.

Download the tesseract source files and data. You have two options: 1. using svn, or 2. using the Ant script provided.

If you have chosen to use Ant, check the following properties first.

  • tesseract.dir – a path where the tesseract sources will be downloaded
  • tesseract.dir.name – a folder name, i.e. ${tesseract.dir}/${tesseract.dir.name}

Just make sure the directory exists, or create it yourself with mkdir ….


ant svn

OK, time to go drink a coffee or read the news.

Then, continuing with the Ant script, type:

ant build

If all went well, you will be notified that all 60 projects were built successfully.

Tesseract ships with the following list of trained languages:

  • Arabic
  • Bulgarian
  • Catalan
  • Czech
  • Chinese simplified
  • Chinese traditional
  • Danish
  • German
  • Greek
  • English
  • Finnish
  • French
  • Hebrew
  • Hindi
  • Croatian
  • Hungarian
  • Indonesian
  • Italian
  • Japanese
  • Korean
  • Latvian
  • Lithuanian
  • Dutch
  • Norwegian
  • And more

Let’s see what we have inside.

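  1. tesseract – extracts text or characters from the image.
    Usage: tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfile]
    -l, -psm and configfile are optional. outputbase is the output file name without extension; tesseract appends .txt to it. -l sets the language as an ISO 639-3 code (eng, rus, ell etc). -psm sets the page segmentation mode (pagesegmode); the following modes are available: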
    psm mode Description
    0 Orientation and script detection (OSD) only
    1 Automatic page segmentation with OSD
    2 Automatic page segmentation, but no OSD, or OCR
    3 Fully automatic page segmentation, but no OSD. (Default)
    4 Assume a single column of text of variable sizes
    5 Assume a single uniform block of vertically aligned text
    6 Assume a single uniform block of text
    7 Treat the image as a single text line
    8 Treat the image as a single word
    9 Treat the image as a single word in a circle
    10 Treat the image as a single character
  2. cntraining – generates a normproto and pffmtable. Reads in a text file consisting of feature samples from a training page in the following format: FontName CharName NumberOfFeatureTypes(N). It then appends these samples into a separate file for each character. The name of file is: DirectoryName/FontName/CharName.FeatureTypeName. The DirectoryName can be specified via a command line argument. If not specified, it defaults to the current directory.
  3. combine_tessdata – creates a unified traineddata file from the different files produced by the training process.
    Usage Description
    language_data_path_prefix (e.g. tessdata/eng.) Combines all individual tessdata components (unicharset, DAWGs, classifier templates, ambiguities, language configs). The result will be a combined tessdata file lang_code.traineddata
    -e Extracts individual components from a combined trained data file. For instance, combine_tessdata -e tessdata/ell.traineddata
    -o Overwrites individual components of the given lang_code.traineddata file. Example:

combine_tessdata -o tessdata/ell.traineddata

    -u Unpacks all the components to the specified path. For instance,

combine_tessdata -u tessdata/ell.traineddata /home/$USER/temp/ell

  4. mftraining – separates training pages into files for each character. Strips from the files only the features and their parameters of the feature type mf. Reads in a text file consisting of feature samples from a training page in the following format: FontName CharName NumberOfFeatureTypes(N). The result is a binary file used by the OCR engine.
  5. unicharset_extractor – extracts a character/ligature set. Given a list of box files on the command line, generates a file containing a unicharset, a list of all the characters. The file contains the size of the set on the first line, and then one unichar per line.
    Usage: unicharset_extractor [-D DIRECTORY] FILE…
  6. wordlist2dawg – generates a DAWG from a word list file. Given a file that contains a list of words (one word per line), it generates the corresponding squished DAWG file.
    Usage: wordlist2dawg [-t | -l min_len max_len] word_list_file dawg_file unicharset_file
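As a small illustration of the tesseract usage line and the psm modes listed above, the sketch below assembles the argument list for a tesseract 3.x call. The class TesseractCommand and its build method are hypothetical helpers written for this example; only the -l and -psm flags and the 0 to 10 mode range come from the text above.

```java
import java.util.ArrayList;
import java.util.List;

public class TesseractCommand {

    // Builds: <executable> <image> <outputBase> [-l lang] [-psm mode]
    // lang and psm may be null, in which case tesseract uses its defaults.
    public static List<String> build(String executable, String image,
                                     String outputBase, String lang, Integer psm) {
        List<String> command = new ArrayList<String>();
        command.add(executable);
        command.add(image);
        command.add(outputBase);
        if (lang != null) {
            command.add("-l");
            command.add(lang);
        }
        if (psm != null) {
            if (psm < 0 || psm > 10) {
                throw new IllegalArgumentException("psm must be between 0 and 10: " + psm);
            }
            command.add("-psm");
            command.add(String.valueOf(psm));
        }
        return command;
    }

    public static void main(String[] args) {
        // Mode 7 treats the image as a single text line.
        List<String> command = build("tesseract", "line.tif", "line", "eng", 7);
        System.out.println(command);
        // To actually run it (requires tesseract on the PATH):
        // new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}
```

Passing the list to ProcessBuilder, rather than concatenating one long string, avoids quoting problems when a path contains spaces.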

People often think that with OCR they can “crack” CAPTCHAs.


As an example, run the following:

tesseract.exe ..\kor_data\gotcha.tif gotchaOutput.txt -l eng

For a human being it’s easy to recognize what’s written (rondity describe.); however, look at the output:


It could not recognize the first word or the white space. Only the second word was recognized perfectly. You can train your OCR to handle words like the first one, but whoever produces such CAPTCHAs will change their algorithm and you will fail again. In this case, don’t try harder.

Another example:


tesseract.exe ..\kor_data\fra.arial.g4.tif ..\kor_data\fra_output.txt -l fra

Looking at the output, you will probably find that the extracted text is quite good but not perfect: some characters were misrecognized. To fix that you need to “add” these characters to the traineddata. This process is well described in the tesseract-ocr wiki (http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3).

In addition to batch processing, tesseract-ocr lets you integrate its capabilities into your program/product through a basic C++ API. It’s well documented and easy to use. All baseapi sources are located in the ../api folder.

Here is an example:

#include <cstddef>
#include "baseapi.h"

char* run_tesseract(const char* datapath, const char* language,
                    const unsigned char* imagedata,
                    int bytes_per_pixel, int bytes_per_line,
                    int left, int top, int width, int height) {
  tesseract::TessBaseAPI api;

  // Starts tesseract. Datapath must be the name of the parent dir
  // of tessdata and must end in '/'. Init returns 0 on success.
  if (api.Init(datapath, language) != 0) {
    return NULL;
  }

  // Recognizes a rectangle from an image and returns the result as a string.
  // The caller owns the string and must free it with delete [].
  char* text =
      api.TesseractRect(imagedata, bytes_per_pixel, bytes_per_line,
                        left, top, width, height);

  // Closes down tesseract and frees up all memory.
  api.End();

  return text;
}

Java code using ProcessBuilder looks like:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

/**
 * Returns the text extracted from an image.
 *
 * @param image File, might be tiff, png or jpeg
 * @param tesseractPath where the tesseract executable is located
 * @param iso639_3Lang three-character language code, for instance, fra
 * @return extracted text
 * @throws IOException
 * @throws InterruptedException
 */
public static String getExtractedText(File image, String tesseractPath,
        String iso639_3Lang)
        throws IOException, InterruptedException {
    // tesseract appends ".txt" to the output base name
    File outputFile = new File(image.getParentFile(), "output");
    StringBuffer buffer = new StringBuffer();

    ProcessBuilder pb = new ProcessBuilder(tesseractPath + File.separator +
            "tesseract", image.getCanonicalPath(),
            outputFile.getCanonicalPath(),
            "-l", iso639_3Lang);

    // Wait for tesseract to finish before reading its output file
    Process process = pb.start();
    process.waitFor();

    BufferedReader in = new BufferedReader(new InputStreamReader(
            new FileInputStream(outputFile.getAbsolutePath() +
            ".txt"), "UTF-8"));
    String str;
    while ((str = in.readLine()) != null) {
        buffer.append(str).append("\n");
    }
    in.close();

    // Remove the temporary output file
    new File(outputFile.getAbsolutePath() + ".txt").delete();
    return buffer.toString();
}



When working with OCR, you will often want to prepare your data (images) before passing it to the OCR engine. This could mean converting the image format, increasing or decreasing the image resolution, or reducing image noise. There are a lot of options for this; GIMP (http://www.gimp.org/) is free, mature and cute! If you are trying to automate the data preparation process, look at ImageMagick (http://www.imagemagick.technocozy.com/).


It can detect edges, add noise, capture the screen and much more. I cannot cover everything here, so I’m going to cover the parts relevant to our OCR processing:

  • Format conversion
  • Transformations
  • Composite – not sure …
  • Image identification
  • MSL – Magick Scripting Language – not sure

Installation of ImageMagick

IM supports a wide range of platforms, from *nix to Windows. I assume you have used Windows throughout this tutorial, so let’s stay with it.

If you use the Ant script provided, run:

ant im.http

This command will download the Windows installer. The Ant properties are in the build.properties file; change them according to your set-up.

Moreover, the MAGICK_HOME environment variable should be set to the path where you previously extracted the ImageMagick files.
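If you want to check the MAGICK_HOME setting from Java before launching any ImageMagick utility, a sketch along these lines may help. The class MagickHome and its resolve method are hypothetical; the only thing taken from the text above is the name of the environment variable.

```java
import java.io.File;
import java.util.Map;

public class MagickHome {

    // Returns the expected path of a utility under MAGICK_HOME,
    // or null when the variable is missing or blank.
    public static String resolve(Map<String, String> env, String utility) {
        String home = env.get("MAGICK_HOME");
        if (home == null || home.trim().isEmpty()) {
            return null;
        }
        return new File(home, utility).getPath();
    }

    public static void main(String[] args) {
        String convert = resolve(System.getenv(), "convert.exe");
        if (convert == null) {
            System.out.println("MAGICK_HOME is not set");
        } else {
            System.out.println("convert expected at " + convert
                    + ", exists: " + new File(convert).exists());
        }
    }
}
```

Taking the environment as a Map parameter instead of calling System.getenv() inside the method keeps the lookup easy to test.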

Verifying installation

convert logo: logo.miff
imdisplay logo.miff

ImageMagick core utilities

Utility name – Description

  • display – views an image and manages its functionality, including load, print, write to file, zoom, copy a region, paste a region, crop, show histogram and more.
  • convert – converts image formats. Can be used for making thumbnails, charcoal drawing, oil painting and morphing.
  • import – captures the screen and writes it to a file. A single window, the entire screen, or any portion of the screen can be specified.
  • animate – shows animated formats or a sequence of images. Can reduce colors to match the color resolution of the display.
  • composite – combines several separate images with the following schemes: Over, In, Out, Atop, Xor, Plus, Minus, Difference, Multiply and Bumpmap.
  • montage – arranges a group of images into a single image.
  • mogrify – applies transformations to images and, unlike the other utilities, overwrites the original image with the result.
  • conjure – interprets the Magick Scripting Language (MSL), an XML-based language, to perform any image processing activity without a Perl interpreter.
  • identify – reports information about an image, such as format, file size, width, height, mapped color and number of colors, and can detect whether an image is corrupted.

ImageMagick has a remarkable number of interfaces, so you can choose whichever you prefer. In this tutorial we will use the Java API, im4java (http://im4java.sourceforge.net/).

Convert usage, options and image operators

Usage: convert.exe [options …] file [ [options …] file …] [options …] file

Options – Image Settings:

-adjoin – joins images into a single multi-image file
-affine matrix – affine transform matrix
-alpha option – activates, deactivates, resets, or sets the alpha channel
-antialias – removes pixel-aliasing
-authenticate password – deciphers image with this password
-attenuate value – lessens (or intensifies) the effect when adding noise to an image
-background color – background color
-bias value – adds bias when convolving an image
-black-point-compensation – uses black point compensation
-blue-primary point – chromaticity blue primary point
-bordercolor color – border color
-caption string – assigns a caption to an image
-channel type – applies option to select image channels
-colors value – preferred number of colors in the image
-colorspace type – alternate image colorspace
-comment string – annotates image with comment
-compose operator – sets image composite operator
-compress type – type of pixel compression when writing the image
-define format:option – defines one or more image format options
-delay value – displays the next image after pausing
-density geometry – horizontal and vertical density of the image
-depth value – image depth
-direction type – renders text right-to-left or left-to-right
-display server – gets image or font from this X server
-dispose method – layer disposal method
-dither method – applies error diffusion to image
-encoding type – text encoding type
-endian type – endianness (MSB or LSB) of the image
-family name – renders text with this font family
-fill color – color to use when filling a graphic primitive
-filter type – uses this filter when resizing an image
-font name – renders text with this font
-format “string” – outputs formatted image characteristics
-fuzz distance – colors within this distance are considered equal
-gravity type – horizontal and vertical text placement
-green-primary point – chromaticity green primary point
-intent type – type of rendering intent when managing the image color
-interlace type – type of image interlacing scheme
-interline-spacing value – sets the space between two text lines
-interpolate method – pixel color interpolation method
-interword-spacing value – sets the space between two words
-kerning value – sets the space between two letters
-label string – assigns a label to an image
-limit type value – pixel cache resource limit
-loop iterations – adds Netscape loop extension to your GIF animation
-mask filename – associates a mask with the image
-mattecolor color – frame color
-monitor – monitors progress
-orient type – image orientation
-page geometry – size and location of an image canvas (setting)
-ping – efficiently determines image attributes
-pointsize value – font point size
-precision value – maximum number of significant digits to print
-preview type – image preview type
-quality value – JPEG/MIFF/PNG compression level
-quiet – suppresses all warning messages
-red-primary point – chromaticity red primary point
-regard-warnings – pays attention to warning messages
-remap filename – transforms image colors to match this set of colors
-respect-parentheses – settings remain in effect until parenthesis boundary
-sampling-factor geometry – horizontal and vertical sampling factor
-scene value – image scene number
-seed value – seeds a new sequence of pseudo-random numbers
-size geometry – width and height of image
-stretch type – renders text with this font stretch
-stroke color – graphic primitive stroke color
-strokewidth value – graphic primitive stroke width
-style type – renders text with this font style
-synchronize – synchronizes image to storage device
-taint – declares the image as modified
-texture filename – name of texture to tile onto the image background
-tile-offset geometry – tile offset
-treedepth value – color tree depth
-transparent-color color – transparent color
-undercolor color – annotation bounding box color
-units type – the units of image resolution
-verbose – prints detailed information about the image
-view – FlashPix viewing transforms
-virtual-pixel method – virtual pixel access method
-weight type – renders text with this font weight
-white-point point – chromaticity white point

Image Operators:

-adaptive-blur geometry – adaptively blurs pixels; decreases effect near edges
-adaptive-resize geometry – adaptively resizes image using ‘mesh’ interpolation
-alpha option – on, activate, off, deactivate, set, opaque, copy
-annotate geometry text – annotates the image with text
-auto-gamma – automagically adjusts gamma level of image
-auto-level – automagically adjusts color levels of image
-auto-orient – automagically orients (rotates) image
-bench iterations – measures performance
-black-threshold value – forces all pixels below the threshold into black
-blue-shift factor – simulates a scene at nighttime in the moonlight
-blur geometry – reduces image noise and detail levels
-border geometry – surrounds image with a border of color
-bordercolor color – border color
-brightness-contrast geometry – improves brightness / contrast of the image
-cdl filename – color corrects with a color decision list
-charcoal radius – simulates a charcoal drawing
-chop geometry – removes pixels from the image interior
-clamp – restricts pixel range from 0 to the quantum depth
-clip – clips along the first path from the 8BIM profile
-clip-mask filename – associates a clip mask with the image
-clip-path id – clips along a named path from the 8BIM profile
-colorize value – colorizes the image with the fill color
-color-matrix matrix – applies color correction to the image
-contrast – enhances or reduces the image contrast
-contrast-stretch geometry – improves contrast by ‘stretching’ the intensity range
-convolve coefficients – applies a convolution kernel to the image
-cycle amount – cycles the image colormap
-decipher filename – converts cipher pixels to plain pixels
-deskew threshold – straightens an image
-despeckle – reduces the speckles within an image
-distort method args – distorts images according to given method and args
-draw string – annotates the image with a graphic primitive
-edge radius – applies a filter to detect edges in the image
-encipher filename – converts plain pixels to cipher pixels
-emboss radius – embosses an image
-equalize – performs histogram equalization on an image
-evaluate operator value – evaluates an arithmetic, relational, or logical expression
-extent geometry – sets the image size
-extract geometry – extracts area from image
-fft – implements the discrete Fourier transform (DFT)
-flip – flips image vertically
-floodfill geometry color – floodfills the image with color
-flop – flops image horizontally
-frame geometry – surrounds image with an ornamental border
-function name parameters – applies function over image values
-gamma value – level of gamma correction
-gaussian-blur geometry – reduces image noise and detail levels
-geometry geometry – preferred size or location of the image
-identify – identifies the format and characteristics of the image
-ift – implements the inverse discrete Fourier transform (DFT)
-implode amount – implodes image pixels about the center
-lat geometry – local adaptive thresholding
-layers method – optimizes, merges, or compares image layers
-level value – adjusts the level of image contrast
-level-colors color,color – levels image with the given colors
-linear-stretch geometry – improves contrast by ‘stretching with saturation’
-liquid-rescale geometry – rescales image with seam-carving
-median geometry – applies a median filter to the image
-mode geometry – makes each pixel the ‘predominant color’ of the neighborhood
-modulate value – varies the brightness, saturation, and hue
-monochrome – transforms image to black and white
-morphology method kernel – applies a morphology method to the image
-motion-blur geometry – simulates motion blur
-negate – replaces every pixel with its complementary color
-noise geometry – adds or reduces noise in an image
-normalize – transforms image to span the full range of colors
-opaque color – changes this color to the fill color
-ordered-dither NxN – adds a noise pattern to the image with specific amplitudes
-paint radius – simulates an oil painting
-polaroid angle – simulates a Polaroid picture
-posterize levels – reduces the image to a limited number of color levels
-profile filename – adds, deletes, or applies an image profile
-quantize colorspace – reduces colors in this colorspace
-radial-blur angle – radially blurs the image
-raise value – lightens/darkens image edges to create a 3-D effect
-random-threshold low,high – randomly thresholds the image
-region geometry – applies options to a portion of the image
-render – renders vector graphics
-repage geometry – size and location of an image canvas
-resample geometry – changes the resolution of an image
-resize geometry – resizes the image
-roll geometry – rolls an image vertically or horizontally
-rotate degrees – applies Paeth rotation to the image
-sample geometry – scales image with pixel sampling
-scale geometry – scales the image
-segment values – segments an image
-selective-blur geometry – selectively blurs pixels within a contrast threshold
-sepia-tone threshold – simulates a sepia-toned photo
-set property value – sets an image property
-shade degrees – shades the image using a distant light source
-shadow geometry – simulates an image shadow
-sharpen geometry – sharpens the image
-shave geometry – shaves pixels from the image edges
-shear geometry – slides one edge of the image along the X or Y axis
-sigmoidal-contrast geometry – increases the contrast without saturating highlights or shadows
-sketch geometry – simulates a pencil sketch
-solarize threshold – negates all pixels above the threshold level
-sparse-color method args – fills in an image based on a few color points
-statistic type geometry – replaces each pixel with corresponding statistic from the neighborhood
-strip – strips image of all profiles and comments
-swirl degrees – swirls image pixels about the center
-threshold value – thresholds the image
-thumbnail geometry – creates a thumbnail of the image
-tile filename – tiles image when filling a graphic primitive
-tint value – tints the image with the fill color
-transform – affine transforms image
-transpose – flips image vertically and rotates 90 degrees
-transverse – flops image horizontally and rotates 270 degrees
-trim – trims image edges
-type type – image type
-unique-colors – discards all but one of any pixel color
-unsharp geometry – sharpens the image
-vignette geometry – softens the edges of the image in vignette style
-wave geometry – alters an image along a sine wave
-white-threshold value – forces all pixels above the threshold into white

Image Sequence Operators:

-append – appends an image sequence
-clut – applies a color lookup table to the image
-coalesce – merges a sequence of images
-combine – combines a sequence of images
-composite – composites image
-crop geometry – cuts out a rectangular region of the image
-deconstruct – breaks down an image sequence into constituent parts
-evaluate-sequence operator – evaluates an arithmetic, relational, or logical expression
-flatten – flattens a sequence of images
-fx expression – applies mathematical expression to an image channel(s)
-hald-clut – applies a Hald color lookup table to the image
-morph value – morphs an image sequence
-mosaic – creates a mosaic from an image sequence
-print string – interprets string and prints to console
-process arguments – processes the image with a custom image filter
-separate – separates an image channel into a grayscale image
-smush geometry – smushes an image sequence together
-write filename – writes images to this file

Image Stack Operators:

-clone indexes – clones an image
-delete indexes – deletes the image from the image sequence
-duplicate count,indexes – duplicates an image one or more times
-insert index – inserts last image into the image sequence
-reverse – reverses image sequence
-swap indexes – swaps two images in the image sequence

Miscellaneous Options:

-debug events – displays copious debugging information
-help – prints program options
-list type – prints a list of supported option arguments
-log format – format of debugging information
-version – prints version information
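To tie the options above back to OCR, here is a sketch of a pre-processing command that converts a scan to grayscale, removes speckles and straightens the page before it is handed to tesseract. The choice of these three operators, the 40% deskew threshold and the class name PreprocessCommand are assumptions for illustration; the right combination depends on your images.

```java
import java.util.Arrays;
import java.util.List;

public class PreprocessCommand {

    // Builds: <convertExe> <source> -colorspace Gray -despeckle -deskew 40% <dest>
    public static List<String> build(String convertExe, String source, String dest) {
        return Arrays.asList(
                convertExe,
                source,
                "-colorspace", "Gray", // drop color information
                "-despeckle",          // reduce speckle noise
                "-deskew", "40%",      // straighten a slightly rotated scan
                dest);
    }

    public static void main(String[] args) {
        // File names are placeholders.
        List<String> command = build("convert", "scan.jpg", "scan.tif");
        System.out.println(command);
        // To actually run it (requires ImageMagick on the PATH):
        // new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}
```

Because the destination name ends in .tif, convert also performs the format conversion in the same call.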

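Here is Java code that converts an image from JPEG to TIFF.

import java.io.IOException;

import org.im4java.core.ConvertCmd;
import org.im4java.core.IM4JavaException;
import org.im4java.core.IMOperation;
import org.im4java.core.IdentifyCmd;

public class IMConvertCmd {

    public static void main(String[] args) throws IOException,
            InterruptedException, IM4JavaException {
        String searchPath = "E:/image_magick";
        String sourceImage = "data/imade_art2.jpg";
        String destImage = "data/imade_art2.tiff";
        IMConvertCmd.tryExample(searchPath, sourceImage, destImage);
    }

    /**
     * Creates a ConvertCmd, sets the search path and runs the convert command,
     * then creates an IMOperation, adds the image to it and runs identify -verbose.
     *
     * @param searchPath where the ImageMagick executables are placed
     * @param sourceImage a source image
     * @param destImage a destination image to be written
     * @throws IOException
     * @throws InterruptedException
     * @throws IM4JavaException
     */
    public static void tryExample(String searchPath, String sourceImage,
            String destImage) throws IOException,
            InterruptedException, IM4JavaException {
        ConvertCmd convertCmd = new ConvertCmd();
        convertCmd.setSearchPath(searchPath);

        // Equivalent to: convert <sourceImage> <destImage>
        IMOperation convertOp = new IMOperation();
        convertOp.addImage(sourceImage);
        convertOp.addImage(destImage);
        convertCmd.run(convertOp);

        // Equivalent to: identify -verbose <destImage>
        // Assumption: the identify step is run through IdentifyCmd.
        IMOperation op = new IMOperation();
        op.verbose();
        op.addImage(destImage);
        IdentifyCmd identifyCmd = new IdentifyCmd();
        identifyCmd.setSearchPath(searchPath);
        identifyCmd.run(op);
    }
}
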



There is another cool thing called MSL. It stands for Magick Scripting Language, an XML-based language intended for those who want to accomplish custom image processing tasks without programming. The interpreter is called conjure. A script looks like a typical XML file with specialized tags in it and has the file extension msl.

An example of MSL:

    <?xml version="1.0" encoding="UTF-8"?>
    <image size="116x28" >
      <read filename="imade_art2.jpg" />
      <get width="base-width" height="base-height" />
      <resize geometry="%[dimensions]" />
      <get width="width" height="height" />
      <print output=
        "Image sized from %[base-width]x%[base-height]
         to %[width]x%[height].\n" />
      <write filename="imade_art2.png" />
    </image>

To invoke this script:

conjure -dimensions 116x28 firstMSL.msl

Magick Scripting Language (MSL) defines the following elements and their attributes:


Element – Attribute description/option(s)

<image> – Defines a new image object. </image> destroys it.
<group> – Defines a new group of image objects. By default, images are only valid for the life of their <image> element. However, in a group, all images in that group will stay around for the life of the group.
<read> – Reads a new image from the disk.
<write> – Writes the image(s) to disk, either as a single file or as multiple files if necessary.
<get> – Gets any recognized attribute and stores it as an image attribute for later use. Currently only width and height are supported.
<set> – Sets background, bordercolor, clip-mask, colorspace, density, magick, mattecolor and opacity.
<border> – Surrounds the image with a border of color. Options: fill, geometry, height, width
<blur> – Reduces image noise and reduces detail levels. Options: radius, sigma
<charcoal> – Simulates a charcoal drawing. Options: radius, sigma
<chop> – Removes pixels from the interior of an image. Options: geometry, height, width, x, y
<crop> – Cuts out one or more rectangular regions of the image. Options: geometry, height, width, x, y
<despeckle> – Removes “pepper” from an image.
<emboss> – Replaces each pixel of an image by a highlight or a shadow, depending on light/dark boundaries in the original image.
<enhance> – Removes blurring and noise, increases contrast and reveals details.
<equalize> – Applies a histogram equalization to the image.
<flip> – Creates a mirror image, reflecting the scanlines in the vertical direction.
<flop> – Creates a mirror image, reflecting the scanlines in the horizontal direction.
<frame> – Surrounds the image with a border or beveled frame. Options: fill, geometry, height, width, x, y, inner, outer
<get> – Options: height, width
<magnify> – Scales the image to twice its size.
<minify> – Scales the image to half its size.
<normalize> – Enhances the contrast of a color image.
<read> – Reads the input image.
<resize> – Resizes an image. Options: blur, filter, geometry, height, width
<roll> – Rolls an image vertically or horizontally. Options: geometry, x, y
<rotate> – Applies Paeth image rotation. Options: degrees
<sample> – Changes the image size simply by directly sampling the pixels of the original image. Options: geometry, height, width
<scale> – Changes the image size by averaging pixels together when minifying or replicating pixels when magnifying. Options: geometry, height, width
<sharpen> – Uses a Gaussian operator of the given radius and standard deviation (sigma). Options: radius, sigma
<shave> – Removes pixels from the image edges. Options: geometry, height, width
<solarize> – Negates all pixels above the threshold level. Options: threshold
<spread> – Displaces image pixels by a random amount. Options: radius
<stegano> – Hides a watermark within an image. Options: image
<stereo> – Generates a stereogram of two images (one for each eye). Options: image
<swirl> – Swirls image pixels about the center. Options: degrees
<texture> – Tiles a texture onto the image background. Options: image
<threshold> – Applies a simultaneous black/white threshold to the image. Options: threshold
<transparent> – Makes [this] color transparent within the image. Options: color
<trim> – Removes any edges that are exactly the same color as the corner pixels.

In this short tutorial I could not cover all the ImageMagick utilities, so you are welcome to check them out yourself.


  1. While using tesseract I made some wrong decisions. One of them was using Cygwin. I wasted about two days trying to compile the sources, adding more and more missing libraries and recompiling again. Finally, I got my executables, but some of the features still did not work. Removing the whole Cygwin “mess” and installing Microsoft Visual Studio 2008 Express solved my troubles. It took only about 2 hours, including installing MS VS and compiling the entire solution. It worked like a charm!

  2. The next challenge was installing ImageMagick. I thought that with MS VS 2008 installed I wouldn’t have problems compiling the sources. I was wrong again. ImageMagick depends on the MFC library, which is not included in MS VS 2008 Express, so it cannot be compiled with it. The other option was to install the ImageMagick binary distribution, and that worked. Alternatively, you can install Visual Studio 6. It’s up to you.

  3. Before using tesseract I encourage you to read its FAQ and Wiki. If you have questions, subscribe to the tesseract mailing list. There are excellent people there who can help you. Unlike opening support tickets and waiting days or even months, help here comes very quickly.


OCR usually consists of two stages. In the first stage we prepare our data for processing. Some images are noisy, others are poorly scanned or their format does not fit our purposes. ImageMagick helps us perform this kind of preparation and lets us create scripts that automate the process. In the second stage, we actually do the OCR. Tesseract has a baseapi that makes it easier to integrate its capabilities into your environment.

When building your system, keep OCR’s limitations in mind.

If you have comments or suggestions, please share them with me and other people.

Have fun!
