How Spring Integration can alleviate your life.

June 23, 2013

Some time ago I started a new job at a big corporation. My first task was to port their C# TCP client to Java. The existing converters were of no use, so I did it by hand. After a week or so, a fresh Java TCP client & a server simulator were written and waiting for further use. Once we went through the client’s requirements, we found that the Java implementation lacked important features such as fail-over & auto-reconnection. Adding them would have meant writing untested code and risking flaws in the business logic. One of our guys said: aha, what if we replace the hand-written Java implementation with another one, for instance Spring Integration? The rest of us smiled, thinking: what the heck? Anyway, my boss is a good champ who tries to use the best technologies available. We got a green light to do research & learn something exciting. To keep the requirements simple, I am going to show a simulator (aka server) & a client.

   Before delving deeper, let me explain what Spring Integration is intended for. As their site puts it: “it provides an extension of the Spring programming model to support the well-known Enterprise Integration Patterns”. In other words, to design a good enterprise application you can use messaging, more precisely asynchronous messaging, which lets diverse applications be integrated with each other without nightmare or pain. The famous book “Enterprise Integration Patterns” was written by Gregor Hohpe and Bobby Woolf and published in Martin Fowler’s signature series. The folks from Spring probably decided one day to turn that theory into practice. A very pragmatic approach, isn’t it? Later you will see how well it fits regular tasks. The main concepts of SI are: Endpoint, Channel & Message.

An endpoint is a component that actually does something with a message. A message is a container consisting of a header & a payload. The header contains data that’s relevant to the messaging system, whereas the payload contains the actual data. A channel connects two or more endpoints; it’s similar to Unix pipes. Two endpoints can exchange messages iff they’re connected through a channel. Pretty easy, isn’t it? The following diagram shows this.
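To make the three concepts tangible, here is a tiny plain-Java sketch. It is only an analogy and not the Spring Integration API: the class MiniMessaging and everything inside it are made-up names, and a real channel does far more (queuing, interceptors, error handling).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical, simplified stand-ins for Message, Channel and Endpoint. */
final class MiniMessaging {

    /** A message: headers for the messaging system, payload for the application. */
    static final class Message {
        final Map<String, Object> headers;
        final String payload;

        Message(Map<String, Object> headers, String payload) {
            this.headers = Collections.unmodifiableMap(new HashMap<>(headers));
            this.payload = payload;
        }
    }

    /** A channel: hands every message it is given to the endpoints subscribed to it. */
    static final class Channel {
        private final List<Consumer<Message>> endpoints = new ArrayList<>();

        void subscribe(Consumer<Message> endpoint) {
            endpoints.add(endpoint);
        }

        void send(Message message) {
            for (Consumer<Message> endpoint : endpoints) {
                endpoint.accept(message);
            }
        }
    }

    /** Connects a sending side and a receiving endpoint through one channel. */
    static List<String> demo() {
        List<String> received = new ArrayList<>();
        Channel channel = new Channel();
        // The receiving endpoint: the component that does something with each message.
        channel.subscribe(m -> received.add(m.headers.get("id") + ":" + m.payload));

        Map<String, Object> headers = new HashMap<>();
        headers.put("id", 1);
        channel.send(new Message(headers, "hello"));
        return received;
    }
}
```

Calling MiniMessaging.demo() returns the single entry the endpoint recorded, built from the header value and the payload.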

Image

    The next step in our crash course is defining the requirements. We need a TCP server & a TCP client. We will write a simple application in which the two exchange a couple of messages.

    An important part of using SI is the configuration file, which contains all the components we are going to use. Simplifying the model & the SI lifecycle: Spring creates the objects that are defined in the configuration xml. More generally, this concept is called declarative programming. You define a business object in the xml, and the framework instantiates it for you, injecting & initializing its dependencies. The mantra says: you should concentrate only on the business and not on the implementation.

    Let’s define the server part of the configuration xml.

http://pastebin.com/6AHQWPse


<int-ip:tcp-connection-factory id="tcpServerFactory"
type="server"
port="23234"
single-use="false"
serializer="byteArrayLenSerializer"
deserializer="byteArrayLenSerializer" />
<int-ip:tcp-inbound-channel-adapter channel="serverIn"
connection-factory="tcpServerFactory"/>

<int-ip:tcp-outbound-channel-adapter channel="serverOut"
connection-factory="tcpServerFactory"/>

The important things are: a factory (tcp-connection-factory), which creates a TCP server using a byte array length serializer. A serializer is needed for “packaging” (encoding) our message so that it can be transmitted over the wire. The deserializer, in turn, is needed for “unpackaging” (decoding) it. Spring Integration has two kinds of connection factory, one for the client & another for the server; the type attribute [server or client] selects between them. The port is the one the server listens on for incoming messages. No IP address is mentioned here because the server runs on localhost.
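Here is a rough sketch of what “packaging” with a length header means, assuming the byteArrayLenSerializer bean is a length-header serializer with a 4-byte, big-endian length field. The class LengthHeaderFraming is a made-up name and this is not Spring Integration’s own code, just the idea behind it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that frames payloads with a 4-byte length header. */
final class LengthHeaderFraming {

    /** "Packages" a payload: a 4-byte big-endian length followed by the bytes. */
    static byte[] frame(byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    /** "Unpackages" the next frame: reads the length, then exactly that many bytes. */
    static byte[] unframe(ByteBuffer in) {
        int length = in.getInt();
        byte[] payload = new byte[length];
        in.get(payload);
        return payload;
    }

    /** Frames two texts back to back and reads the second one out again. */
    static String secondOfTwo(String first, String second) {
        byte[] a = frame(first.getBytes(StandardCharsets.UTF_8));
        byte[] b = frame(second.getBytes(StandardCharsets.UTF_8));
        ByteBuffer wire = ByteBuffer.allocate(a.length + b.length).put(a).put(b);
        wire.flip();
        unframe(wire); // skip the first message
        return new String(unframe(wire), StandardCharsets.UTF_8);
    }
}
```

The length header is what lets the receiver know where one message ends and the next begins on a connection that stays open (single-use="false").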

   We also use two channels: serverIn (for incoming messages) & serverOut (for outgoing messages). For our server to send & receive messages we define an inbound & an outbound adapter, which are associated with the factory & the channels. In our case they define the endpoints. So, when a message comes in, something should take care of it. This responsibility belongs to a service, i.e. the file sender service. If it accepts a message, it then sends a file to the client in the background, line by line. Basically, when the server starts it listens for incoming messages; however, only a specific message is accepted, and once that message arrives the server sends the file line by line. If an error occurs, it’s routed to the error channel. This is done using an interceptor.

   I would say a couple of words about the lifecycle. The Spring framework has two “main” packages, org.springframework.beans & org.springframework.context, which build up the core of its dependency injection. The org.springframework.beans.factory.BeanFactory interface provides the basic container that manages bean initialization & destruction. The org.springframework.context.ApplicationContext builds on it and offers AOP integration, message resource handling and more.

Our server is ready, I mean, completely ready. To run the example, follow the steps below:

  • cd /tcpserver
  • mvn clean install
  • mvn dependency:copy-dependencies
  • mvn exec:java -Dexec.mainClass="org.example.tcpserver.ServerRunner" -Dexec.args="--file=/file_to_be_sent.txt"

Our main class looks as follows:


CommandLinePropertySource clps = processProperties(args);
/* Spring Integration context used to get desirable beans. */
AbstractApplicationContext context = new ClassPathXmlApplicationContext(new String[] {"server-config.xml"}, false);
context.getEnvironment().getPropertySources().addFirst(clps);
context.refresh();
context.registerShutdownHook();

 The source code can be found here http://pastebin.com/6PMpWTfX.

Also we define a file send service:


String key = new String(appropriateData, "UTF-8");
LOG.info("got.message" + " [" + key + "]");
/* If message accepted */
if (key.contains(SEARCH_KEY)) {
    LogReader lr = new LogReader(sender, msg);
    lr.setPath2File(getFile().getAbsolutePath());
    es.execute(lr);
}

http://pastebin.com/icHRdQS3
Next, define a business runner:


/* Creates an input stream to be read. */
fstream = new FileInputStream(getPath2File());
/* Wraps an input stream in order to be able reading of a whole line */
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
while ((line = br.readLine()) != null) {
    command = line;
    sendAndLog(timeToWait);
}

http://pastebin.com/LZRdZ3Tg
Finally, for the server write an error handler which logs the errors:


public void handleRequestMessage(byte[] payload) {
LOG.debug("Server got an error " + new String(payload));
}

http://pastebin.com/2EQvbVR8

With that, we’re done with our server :-).

Now, let’s define a TCP client which connects to the server, sends an accept message & receives the file sent by the server.

Our configuration file looks as follows:

http://pastebin.com/egquzq5q


<!-- Wraps a service with two reply-request channels. -->
<int:gateway     id="client"
service-interface="org.example.tcpclient.TcpClientService"
default-reply-channel="replyChannel"
default-request-channel="requestChannel"
default-reply-timeout="1000"
default-request-timeout="1000">
</int:gateway>
<!-- Request channel -->
<int:channel id="requestChannel">
<int:queue capacity="10" />
</int:channel>
<!-- Direct channel used for reply. -->
<int:channel id="replyChannel" />
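To get a feeling for what the queue capacity & the request timeout in the configuration above mean, here is a rough plain-Java analogy. It is not how Spring Integration implements its channels; BoundedRequestChannel and its methods are made-up names, and the assumption is simply that a full bounded queue makes the sender wait up to the timeout and then give up.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a request channel backed by a bounded queue. */
final class BoundedRequestChannel {
    private final BlockingQueue<String> queue;
    private final long sendTimeoutMillis;

    BoundedRequestChannel(int capacity, long sendTimeoutMillis) {
        this.queue = new LinkedBlockingQueue<>(capacity);
        this.sendTimeoutMillis = sendTimeoutMillis;
    }

    /** Returns false if no slot became free within the timeout. */
    boolean send(String message) {
        try {
            return queue.offer(message, sendTimeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Takes the oldest waiting message, or null if the queue is empty. */
    String receive() {
        return queue.poll();
    }
}
```

With new BoundedRequestChannel(10, 1000) the numbers match the capacity & default-request-timeout attributes of the xml; the consumer side has to keep draining the queue, otherwise the eleventh send waits a second and fails.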

Here is how to run the client:

  • Open a new terminal
  • cd /tcpclient
  • mvn clean install
  • mvn dependency:copy-dependencies
  • mvn exec:java -Dexec.mainClass="org.example.tcpclient.ClientTcp"

Almost the same logic applies here. Have a look.

A main class has the following lines:

/* Spring Integration context used to get desirable beans. */
AbstractApplicationContext context = new ClassPathXmlApplicationContext(
new String[] { "client-config.xml" }, false);
context.refresh();
context.registerShutdownHook();
TcpClientService service = context.getBean("client", TcpClientService.class);
service.send("GIMMY");

http://pastebin.com/9mjmRyNk
In addition, define a client service:


void send(String txt);

Next, a message handler:


public void handle(byte[] s) {
String ss = new String(s);
LOG.info("r:" + ss);
}

http://pastebin.com/Wg4mscvk
And the last one is an interceptor, which will inform your application about:

i. Message sent;

ii. A connection closed;

iii. A new connection added.


public void send(Message<?> message) throws Exception {
super.send(message);
LOG.debug("Sent message [" + new String((byte[]) message.getPayload()) + "]");
}
public void close() {
super.close();
LOG.debug("Closed connection");
}

public void addNewConnection(TcpConnection connection) {
    super.addNewConnection(connection);
    LOG.debug("Added new connection " + connection.getHostName() + ":" +
            connection.getPort());
}

http://pastebin.com/wiDm5zbH

That’s it !!! 🙂

To play with the code, have a look here: http://www.4shared.com/zip/eF4q7l0k/spring_integration_example.html.

Prerequisites:

  1. Java 1.6 or above;

  2. Maven 3 or above;

  3. Desire to learn something new & thrilling;

Pros:

  • A lot of features

  • Tested

  • Good & friendly community

  • If you have questions, people reply really quickly

  • There are tons of examples

  • API is easy & comprehensive

Cons:

  • Takes time to learn & understand how to work with it.

  • If you run into trouble, it is sometimes difficult to debug.

Peace be upon you.

Tika chm extractor – LGPL alternative

Tika chm extractor

I’m pleased to announce that the LGPL-licensed Tika chm extractor was released yesterday. Honestly, it’s not pure LGPL: only the libraries it depends on are; the rest of the code is under the Apache License, version 2.0.

All relevant information can be found here.
To download the sources, go to GitHub.

Why should it live?
Well, the “original” Tika extraction algorithm works pretty well in most cases, but has “difficulties” in rare ones. For unknown reasons the inventors of compressed HTML files never published a specification, so the algorithm for extracting content in the Tika chm parser is not perfect, though quite good.
A possible solution that crosses everybody’s mind is to use native libraries. Fair enough. The only question is how to make that work on multiple platforms. Aha! Having checked the available options, I found a stable Java library called sevenzipjbind.

The extractor is designed as a stand-alone program, i.e. a server based on Jetty which listens for HTTP requests. Currently it has three options: i. Extract a single file including its metadata; ii. Extract content & metadata from all files in the provided directory; iii. Extract only the metadata from a single chm.
In addition, it saves the extracted content & its metadata in a special folder following the pattern: ../extracted_files/folder_name_as_file_name/extracted html files. Metadata goes under ../extracted_files/file_name.json

Examples of how to use it can also be found on GitHub.

Please don’t hesitate to ask, either by replying to this post, contacting me, or sending a tweet!


OCR using Tesseract and ImageMagick as pre-processing task

December 19, 2012

While many applications today use direct data entry via keyboard, more and more of these will return to automated data entry. The reasons for this include the increased incidence of operator wrist problems from constant keying and the potential hazards of video display terminal emissions. Therefore any application imaginable is a candidate for OCR.

What are its Applications?

  • Automatic number plate recognition – used by various police forces, as a method of electronic toll collection on pay-per-use roads, at parking lots, car washing stations etc., and for cataloging the movements of traffic or individuals (quite popular in Central London).
  • Book scanning – digital books can be easily distributed, reproduced and read on-screen. Projects like Project Gutenberg, Google Book Search scan books on a large scale.
  • CAPTCHA – a type of challenge-response test used in computing in an attempt to ensure that the response is not generated by a computer. It stands for Completely Automated Public Turing test to tell Computers and Humans Apart.
  • Computational linguistics – machine translation
  • Digital pen as well as digital paper
  • Digital mail room is an automation of incoming mail processes for classification and distribution of mail.
  • Handwriting – a person’s particular and individual style of writing with pen or pencil. Every literate human has their own manner of writing. Graphology is the controversial study and analysis of handwriting, especially in relation to human psychology. Sometimes it’s part of the hiring process: the candidate is asked to write by hand about a familiar topic, and the text is then sent to the authorities for a psychological analysis of the person.
  • Music OCR – intended to interpret sheet music or printed scores into editable and playable form.
  • Optical Mark Recognition – is a process of capturing human marked data from document forms such as surveys and tests.
  • Kurzweil – text-to-speech converter software which enables a computer to read electronic and scanned text aloud to visually impaired people.

Principles of OCR Technology

Optical Character Recognition (OCR) systems may recognize machine print. Using pattern-matching technology, OCR translates the shapes and patterns of machine-made characters into corresponding computer codes. Though most advanced systems are able to recognize multiple fonts, they can process only standard fonts such as Times Roman and Arial. Once all characters in a given word are recognized, the word is compared against a vocabulary of potential answers for the final result.
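The last step, comparing a recognized word against a vocabulary, can be sketched roughly like this. This is an illustration under assumptions and not what any particular OCR engine does: here “closest” simply means the smallest Levenshtein edit distance, and VocabularyMatcher is a made-up class.

```java
import java.util.List;

/** Hypothetical post-processing step: snap a recognized word to a vocabulary entry. */
final class VocabularyMatcher {

    /** Levenshtein distance: minimum number of single-character edits between a and b. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the vocabulary word with the smallest distance, or null for an empty list. */
    static String closest(String recognized, List<String> vocabulary) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String word : vocabulary) {
            int distance = editDistance(recognized, word);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = word;
            }
        }
        return best;
    }
}
```

For example, a recognized “descrbe” is one edit away from “describe”, so with that word in the vocabulary it wins over more distant candidates.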

Character recognition then segments lines of text or words into separate characters that are recognized by the makeup of their component shapes. Machine-printed letters are evenly spaced across, and up-and-down, a given page, allowing the OCR system to read the text one character at a time. Segmentation into single characters represents a critical recognition failure point for forms processing organizations, because OCR recognition technology requires high-quality images with excellent contrast, character and clarity. Any text that is less than perfect will cause even the most sophisticated OCR systems to return significant reductions in accuracy when processing degraded images.
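Why segmentation is such a failure point becomes clearer with a naive sketch. Assume the image is already black & white and that characters are separated by at least one completely blank pixel column; real engines are much smarter than this. ColumnSegmenter is a made-up class for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, naive character segmentation by blank pixel columns. */
final class ColumnSegmenter {

    /**
     * ink[row][col] is true where the pixel is dark.
     * Returns {startColumn, endColumnExclusive} for every run of columns containing ink.
     */
    static List<int[]> segment(boolean[][] ink) {
        List<int[]> spans = new ArrayList<>();
        int width = ink.length == 0 ? 0 : ink[0].length;
        int start = -1;
        for (int col = 0; col < width; col++) {
            boolean hasInk = false;
            for (boolean[] row : ink) {
                if (row[col]) {
                    hasInk = true;
                    break;
                }
            }
            if (hasInk && start < 0) {
                start = col;                        // a character begins
            } else if (!hasInk && start >= 0) {
                spans.add(new int[] {start, col});  // a blank column ends it
                start = -1;
            }
        }
        if (start >= 0) {
            spans.add(new int[] {start, width});
        }
        return spans;
    }
}
```

A single speck of noise in the gap between two letters glues them into one span, which is exactly the kind of degradation the paragraph above talks about.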

How to choose an optimal product?

When choosing an OCR product, a number of criteria should be considered. What price are you ready to pay? What is the quality of the product? How well is it supported? And so on. Fortunately for us, such a product exists. It’s open source, of very good quality, pretty well supported and still alive. It’s called tesseract-ocr. Why tesseract? Because it’s open source under the Apache License 2.0, it’s one of the best, the support via the mailing list is pretty good, it runs on multiple platforms, has a wide range of built-in languages, is stable and integrates easily with other systems.

This tutorial is divided into:

Introduction to tesseract-ocr

Installation of tesseract 3.0.1 for Windows.

Extracting the text

Writing simple tesseract function using baseapi

Writing Java function that extracts text from given image using ProcessBuilder and tesseract.exe

Introduction to ImageMagick

Installation of ImageMagick 6.6.9-8 for Windows

Checking the installation

Brief description what’s under the hood, useful command line utilities.

Java API to ImageMagick (http://im4java.sourceforge.net/)

Introduction to MSL

Writing simple MSL script

Tips

Conclusion

Bibliography

Introduction to tesseract-ocr

As Wikipedia puts it, in geometry the tesseract, also called an 8-cell or regular octachoron or cubic prism, is the four-dimensional analog of the cube. The tesseract is to the cube as the cube is to the square. Just as the surface of the cube consists of 6 square faces, the hypersurface of the tesseract consists of 8 cubical cells. The tesseract is one of the six convex regular 4-polytopes.

In our case, the Tesseract OCR engine was one of the top 3 engines in the 1995 UNLV Accuracy test. It was originally developed at Hewlett Packard Laboratories Bristol and at Hewlett Packard Co, Greeley Colorado. Between 1995 and 2006 it had little work done on it, but it is probably one of the most accurate open source OCR engines available. It reads a binary, grey or color image and outputs text. Now Google takes care of it.

Tesseract Installation

During this tutorial we will use a Windows box with Microsoft Visual Studio 2008 Express installed.

The installation is very simple and takes about 1 hour. You can use the Ant script provided to run particular tasks, or do it yourself.

Let’s meet the tutorial requirements.

  1. Install Microsoft Visual Studio 2008 Express (http://msdn.microsoft.com/en-us/express/future/bb421473)
  2. Add vcbuild.exe to the PATH
  3. Install Ant (http://ant.apache.org/bindownload.cgi)
  4. Install SVN client (http://subversion.apache.org/packages.html)
  5. Check the Java2SE 1.5/6 installation

Now we are ready to step into the world of image processing.

Step 1.

Download the tesseract source files and data. You have two options: 1. using svn, or 2. using the Ant script provided.

If you chose to use Ant, check the following properties first.

  • tesseract.dir – the path the tesseract sources will be downloaded to
  • tesseract.dir.name – a folder name, i.e. ${tesseract.dir}/${tesseract.dir.name}

Just make sure it exists, or make it yourself. mkdir ….

Type:

ant svn

Ok, time to go drink a coffee or read the news.

Well, continuing with the Ant script, type:

ant build

If all went well, you will be notified that all 60 projects were built successfully.

Tesseract ships with the following list of trained languages:

  • Arabic
  • Bulgarian
  • Catalan
  • Czech
  • Chinese simplified
  • Chinese traditional
  • Danish
  • German
  • Greek
  • English
  • Finnish
  • French
  • Hebrew
  • Hindi
  • Croatian
  • Hungarian
  • Indonesian
  • Italian
  • Japanese
  • Korean
  • Latvian
  • Lithuanian
  • Dutch
  • Norwegian
  • And more

Let’s see what we have inside.

  1. tesseract – extracts text or characters from the image.
    Usage: tesseract imagename outputfile [-l lang] [-psm pagesegmode] [configfile]
    -l, -psm and configfile are optional. -l takes the language as an ISO 639-3 code (eng, rus, ell etc.). -psm takes the page segmentation mode; the following modes are available:
    psm mode  Description
    0         Orientation and script detection (OSD) only
    1         Automatic page segmentation with OSD
    2         Automatic page segmentation, but no OSD, or OCR
    3         Fully automatic page segmentation, but no OSD. (Default)
    4         Assume a single column of text of variable sizes
    5         Assume a single uniform block of vertically aligned text
    6         Assume a single uniform block of text
    7         Treat the image as a single text line
    8         Treat the image as a single word
    9         Treat the image as a single word in a circle
    10        Treat the image as a single character
  2. cntraining – generates a normproto and pffmtable. Reads in a text file consisting of feature samples from a training page in the following format: FontName CharName NumberOfFeatureTypes(N). It then appends these samples into a separate file for each character. The name of file is: DirectoryName/FontName/CharName.FeatureTypeName. The DirectoryName can be specified via a command line argument. If not specified, it defaults to the current directory.
  3. combine_tessdata – creates a unified traineddata file from the different files produced by the training process.
    Usage  Description
    language_data_path_prefix (e.g. tessdata/eng.)  Combines all individual tessdata components (unicharset, DAWGs, classifier templates, ambiguities, language configs). The result is a combined tessdata file lang_code.traineddata
    -e  Extracts individual components from a combined traineddata file. For instance: combine_tessdata -e tessdata/ell.traineddata
    -o  Overwrites individual components of the given lang_code.traineddata file. For instance: combine_tessdata -o tessdata/ell.traineddata
    -u  Unpacks all the components to the specified path. For instance: combine_tessdata -u tessdata/ell.traineddata /home/$USER/temp/ell

  4. mftraining – separates training pages into files for each character. Strips from the files only the features and their parameters of the feature type mf. Reads in a text file consisting of feature samples from a training page in the following format: FontName CharName NumberOfFeatureTypes(N). The result is a binary file used by the OCR engine.
  5. unicharset_extractor – extracts a character/ligature set. Given a list of box files on the command line, generates a file containing a unicharset, a list of all the characters. The file contains the size of the set on the first line, and then one unichar per line.
    Usage: unicharset_extractor [-D DIRECTORY] FILE…
  6. wordlist2dawg – generates a DAWG from a word list file. Given a file that contains a list of words (one word per line), generates the corresponding squished DAWG file.
    Usage: wordlist2dawg [-t | -l min_len max_len] word_list_file dawg_file unicharset_file
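If you call the tesseract executable from Java, it helps to assemble the command line from the usage shown for the tesseract tool in the list above. Below is a small sketch; TesseractCommand is a made-up helper, and it assumes the -l and -psm spellings used by the Tesseract version of this tutorial.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that builds a tesseract command line for ProcessBuilder. */
final class TesseractCommand {

    /**
     * @param language    ISO 639-3 code such as "eng", or null to use the default
     * @param pageSegMode 0..10 as in the psm table, or null to use the default
     */
    static List<String> build(String executable, String image, String outputBase,
                              String language, Integer pageSegMode) {
        List<String> command = new ArrayList<>();
        command.add(executable);
        command.add(image);
        command.add(outputBase);
        if (language != null) {
            command.add("-l");
            command.add(language);
        }
        if (pageSegMode != null) {
            if (pageSegMode < 0 || pageSegMode > 10) {
                throw new IllegalArgumentException("psm must be between 0 and 10");
            }
            command.add("-psm");
            command.add(String.valueOf(pageSegMode));
        }
        return command;
    }
}
```

The resulting list can be passed straight to new ProcessBuilder(command); for a scan that contains exactly one line of text, mode 7 would be the natural choice.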

Often, people think that with OCR they can “crack” CAPTCHAs.

captcha

As an example, run the following:

tesseract.exe ..\kor_data\gotcha.tif gotchaOutput.txt -l eng

For a human being it’s easy to recognize what’s written (rondity describe.); however, look at the output:

rmdwdescrbe.

It could not recognize the first word or the white space. Only the second word came out nearly right. You can train your OCR to handle words like the first one, but whoever produces such CAPTCHAs will change their algorithm and you will fail again. In this case, don’t try harder.

Another example:

train_data

tesseract.exe ..\kor_data\fra.arial.g4.tif ..\kor_data\fra_output.txt -l fra

Looking at the output, you will probably find that the extracted text is quite good but not perfect. Some characters are misrecognized. To fix that you need to “add” these characters to the traineddata. This process is well described in the tesseract-ocr wiki (http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3).

In addition to batch processing, tesseract-ocr makes it possible to integrate its capabilities into your program/product through a basic C++ API. It’s well documented and easy to use. All baseapi sources are located in the ../api folder.

Here is an example:

#include <cstddef>
#include "baseapi.h"

char* run_tesseract(const char* datapath, const char* language,
                    const unsigned char* imagedata,
                    int bytes_per_pixel, int bytes_per_line,
                    int left, int top, int width, int height) {
  // In tesseract 3.x TessBaseAPI is an ordinary class in the tesseract namespace.
  tesseract::TessBaseAPI api;

  // Starts tesseract. Datapath must be the name of the parent dir and must end in '/'.
  // Init returns 0 on success.
  if (api.Init(datapath, language) != 0) {
    return NULL;
  }

  // Recognizes a rectangle from an image and returns the result as a string.
  // The caller owns the returned buffer and must release it with delete [].
  char* text = api.TesseractRect(imagedata, bytes_per_pixel, bytes_per_line,
                                 left, top, width, height);

  // Closes down tesseract and frees up all memory.
  api.End();

  return text;
}

Java code using ProcessBuilder looks like:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

/**
 * Returns the text extracted from an image.
 *
 * @param image File, might be tiff, png or jpeg
 * @param tesseractPath where the tesseract executable is located
 * @param iso639_3Lang three-character language code, for instance, fra
 * @return extracted text
 *
 * @throws IOException
 * @throws InterruptedException
 */
public static String getExtractedText(File image, String tesseractPath,
                                      String iso639_3Lang)
        throws IOException, InterruptedException {
    File outputFile = new File(image.getParentFile(), "output");
    StringBuffer buffer = new StringBuffer();
    ProcessBuilder pb = new ProcessBuilder(tesseractPath + File.separator + "tesseract",
            image.getCanonicalPath(),
            outputFile.getAbsolutePath(),
            "-l", iso639_3Lang);
    pb.redirectErrorStream(true);
    Process process = pb.start();
    process.waitFor();

    // tesseract appends ".txt" to the output name it was given
    BufferedReader in = new BufferedReader(new InputStreamReader(
            new FileInputStream(outputFile.getAbsolutePath() + ".txt"), "UTF-8"));
    String str;
    while ((str = in.readLine()) != null) {
        buffer.append(str).append(System.getProperty("line.separator"));
    }
    in.close();

    // remove the temporary output file
    new File(outputFile.getAbsolutePath() + ".txt").delete();
    return buffer.toString();
}

When working with OCR, you will often want to prepare your data (images) before throwing it at the OCR. This could mean converting the image format, increasing/decreasing the image resolution, or reducing image noise. There are a lot of options to achieve that; GIMP (http://www.gimp.org/) is free, mature and cute! If you are trying to automate the data preparation process, look at ImageMagick (http://www.imagemagick.technocozy.com/).

ImageMagick

It can detect edges, add noise, capture a screen and much more. I cannot cover it all here, so I’m going to cover the part relevant to our OCR processing.
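To make “preparing the image” concrete, here is a minimal sketch of one common preparation step, binarization (every pixel becomes pure black or pure white), written in plain Java instead of GIMP or ImageMagick. The class Binarizer and the fixed threshold are assumptions for illustration only; a real pipeline would usually pick the threshold per image.

```java
import java.awt.image.BufferedImage;

/** Hypothetical pre-processing step: fixed-threshold binarization. */
final class Binarizer {

    /** Pixels darker than the threshold (0..255) become black, the rest white. */
    static BufferedImage binarize(BufferedImage source, int threshold) {
        BufferedImage result = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < source.getHeight(); y++) {
            for (int x = 0; x < source.getWidth(); x++) {
                int rgb = source.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Weighted sum approximating perceived brightness.
                int luminance = (r * 299 + g * 587 + b * 114) / 1000;
                result.setRGB(x, y, luminance < threshold ? 0x000000 : 0xFFFFFF);
            }
        }
        return result;
    }
}
```

Something like Binarizer.binarize(ImageIO.read(file), 128) gives the OCR the excellent contrast it asks for, at the price of losing faint strokes when the threshold is badly chosen.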

  • Format conversion;
  • Transformations;
  • Composite – not sure …
  • Image identification
  • MSL – Magick Scripting Language – not sure

Installation of ImageMagick

IM supports a wide range of platforms, from *nix to Windows. I suppose you have used Windows throughout this tutorial, so let it be so.

If you use the Ant script provided, run:

ant im.http

This command will download the Windows installer. The Ant properties are in the build.properties file; change them according to your set-up.

Moreover, the MAGICK_HOME environment variable should be set to the path where you previously extracted the ImageMagick files.

Verifying installation

convert logo: logo.miff
imdisplay logo.miff

ImageMagick core utilities

Utility name – Usage

  • Display – intended to view an image and manage it: load, print, write to file, zoom, copy a region, paste a region, crop, show histogram and more.
  • Convert – converts image formats. Can be used for making thumbnails, charcoal drawing, oil painting, morphing.
  • Import – captures the screen and writes it to a file. A single window, the entire screen, or any portion of the screen can be specified.
  • Animate – shows animated formats or a sequence of images. Has a capability for color reduction to match the color resolution of the display.
  • Composite – combines several separate images with the following schemes: Over, In, Out, Atop, Xor, Plus, Minus, Difference, Multiply and Bumpmap.
  • Montage – arranges a group of images into a single image.
  • Mogrify – applies transformations to images and, unlike the other utilities, overwrites the original image with the result.
  • Conjure – runs Magick Scripting Language (MSL), an XML-based language, to perform any image processing activity without a Perl interpreter.
  • Identify – reports information about an image, such as format, file size, width, height, mapped color and number of colors, and can detect if an image is corrupted.

ImageMagick has an unbelievable number of interfaces; you can choose whichever you want. In this tutorial we will use the Java API, im4java (http://im4java.sourceforge.net/).

Convert usage, options and image operators

Usage: convert.exe [options …] file [ [options …] file …] [options …] file
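Following the usage line, a convert call for OCR preparation can be put together from Java like this. The sketch only builds the argument list; ConvertCommand is a made-up helper, and the particular selection and order of options (-density, -colorspace, -despeckle, all of which appear in the option lists of this post) is an assumption for illustration, not a recommended recipe.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that builds a convert command line for ProcessBuilder. */
final class ConvertCommand {

    static List<String> forOcr(String convertExecutable, String input, String output, int dpi) {
        List<String> command = new ArrayList<>();
        command.add(convertExecutable);
        // Image setting given before the input file it applies to.
        command.add("-density");
        command.add(String.valueOf(dpi));
        command.add(input);
        // Image operators applied to the loaded image.
        command.add("-colorspace");
        command.add("Gray");
        command.add("-despeckle");
        command.add(output);
        return command;
    }
}
```

As with tesseract, the list goes into new ProcessBuilder(command); the output file extension (for instance .tif) tells convert which format to write.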

Options – Image Settings:

  • -adjoin – joins images into a single multi-image file
  • -affine matrix – affine transform matrix
  • -alpha option – activates, deactivates, resets, or sets the alpha channel
  • -antialias – removes pixel-aliasing
  • -authenticate password – deciphers image with this password
  • -attenuate value – lessens (or intensifies) when adding noise to an image
  • -background color – background color
  • -bias value – adds bias when convolving an image
  • -black-point-compensation – uses black point compensation
  • -blue-primary point – chromaticity blue primary point
  • -bordercolor color – border color
  • -caption string – assigns a caption to an image
  • -channel type – applies option to select image channels
  • -colors value – preferred number of colors in the image
  • -colorspace type – alternates image colorspace
  • -comment string – annotates image with comment
  • -compose operator – sets image composite operator
  • -compress type – type of pixel compression when writing the image
  • -define format:option – defines one or more image format options
  • -delay value – displays the next image after pausing
  • -density geometry – horizontal and vertical density of the image
  • -depth value – image depth
  • -direction type – renders text right-to-left or left-to-right
  • -display server – gets image or font from this X server
  • -dispose method – layers disposal method
  • -dither method – applies error diffusion to image
  • -encoding type – text encoding type
  • -endian type – endianness (MSB or LSB) of the image
  • -family name – renders text with this font family
  • -fill color – color to use when filling a graphic primitive
  • -filter type – uses this filter when resizing an image
  • -font name – renders text with this font
  • -format "string" – output formatted image characteristics
  • -fuzz distance – colors within this distance are considered equal
  • -gravity type – horizontal and vertical text placement
  • -green-primary point – chromaticity green primary point
  • -intent type – type of rendering intent when managing the image color
  • -interlace type – type of image interlacing scheme
  • -interline-spacing value – sets the space between two text lines
  • -interpolate method – pixel color interpolation method
  • -interword-spacing value – sets the space between two words
  • -kerning value – sets the space between two letters
  • -label string – assigns a label to an image
  • -limit type value – pixel cache resource limit
  • -loop iterations – adds Netscape loop extension to your GIF animation
  • -mask filename – associates a mask with the image
  • -mattecolor color – frame color
  • -monitor – monitors progress
  • -orient type – image orientation
  • -page geometry – size and location of an image canvas (setting)
  • -ping – efficiently determines image attributes
  • -pointsize value – font point size
  • -precision value – maximum number of significant digits to print
  • -preview type – image preview type
  • -quality value – JPEG/MIFF/PNG compression level
  • -quiet – suppresses all warning messages
  • -red-primary point – chromaticity red primary point
  • -regard-warnings – pays attention to warning messages
  • -remap filename – transforms image colors to match this set of colors
  • -respect-parentheses – settings remain in effect until parenthesis boundary
  • -sampling-factor geometry – horizontal and vertical sampling factor
  • -scene value – image scene number
  • -seed value – seeds a new sequence of pseudo-random numbers
  • -size geometry – width and height of image
  • -stretch type – renders text with this font stretch
  • -stroke color – graphic primitive stroke color
  • -strokewidth value – graphic primitive stroke width
  • -style type – renders text with this font style
  • -synchronize – synchronizes image to storage device
  • -taint – declares the image as modified
  • -texture filename – name of texture to tile onto the image background
  • -tile-offset geometry – tile offset
  • -treedepth value – color tree depth
  • -transparent-color color – transparent color
  • -undercolor color – annotation bounding box color
  • -units type – the units of image resolution
  • -verbose – prints detailed information about the image
  • -view – FlashPix viewing transforms
  • -virtual-pixel method – virtual pixel access method
  • -weight type – renders text with this font weight
  • -white-point point – chromaticity white point

Image Operators:

-adaptive-blur geometry

adaptively blur pixels; decrease effect near edges

-adaptive-resize geometry

adaptively resizes image using ‘mesh’ interpolation

-alpha option

on, activate, off, deactivate, set, opaque, copy

-annotate geometry text

annotate the image with text

-auto-gamma

automagically adjusts gamma level of image

-auto-level

automagically adjusts color levels of image

-auto-orient

automagically orients (rotates) image

-bench iterations

Measures performance

-black-threshold value

forces all pixels below the threshold into black

-blue-shift factor

Simulates a scene at nighttime in the moonlight

-blur geometry

Reduces image noise and detail levels

-border geometry

Surrounds image with a border of color

-bordercolor color

border color

-brightness-contrast geometry

improves brightness / contrast of the image

-cdl filename

color correct with a color decision list

-charcoal radius

Simulates a charcoal drawing

-chop geometry

Removes pixels from the image interior

-clamp

Restricts pixel range from 0 to the quantum depth

-clip

Clips along the first path from the 8BIM profile

-clip-mask filename

Associates a clip mask with the image

-clip-path id

Clips along a named path from the 8BIM profile

-colorize value

Colorizes the image with the fill color

-color-matrix matrix

Applies color correction to the image

-contrast

Enhances or reduces the image contrast

-contrast-stretch geometry

Improves contrast by ‘stretching’ the intensity range

-convolve coefficients

Applies a convolution kernel to the image

-cycle amount

Cycles the image colormap

-decipher filename

converts cipher pixels to plain pixels

-deskew threshold

straightens an image

-despeckle

Reduces the speckles within an image

-distort method args

distorts images according to the given method and args

-draw string

Annotates the image with a graphic primitive

-edge radius

Applies a filter to detect edges in the image

-encipher filename

Converts plain pixels to cipher pixels

-emboss radius

Embosses an image

-equalize

Performs histogram equalization to an image

-evaluate operator value

evaluates an arithmetic, relational, or logical expression

-extent geometry

Sets the image size

-extract geometry

Extracts area from image

-fft

implements the discrete Fourier transform (DFT)

-flip

Flips image vertically

-floodfill geometry color

Floodfills the image with color

-flop

Flops image horizontally

-frame geometry

Surrounds image with an ornamental border

-function name parameters

Applies function over image values

-gamma value

level of gamma correction

-gaussian-blur geometry

Reduces image noise and detail levels

-geometry geometry

preferred size or location of the image

-identify

Identifies the format and characteristics of the image

-ift

implements the inverse discrete Fourier transform (DFT)

-implode amount

Implodes image pixels about the center

-lat geometry

local adaptive thresholding

-layers method

optimizes, merges, or compares image layers

-level value

Adjusts the level of image contrast

-level-colors color,color

Levels image with the given colors

-linear-stretch geometry

Improves contrast by ‘stretching with saturation’

-liquid-rescale geometry

Rescales image with seam-carving

-median geometry

Applies a median filter to the image

-mode geometry

Makes each pixel the ‘predominate color’ of the neighborhood

-modulate value

Varies the brightness, saturation, and hue

-monochrome

transforms image to black and white

-morphology method kernel

Applies a morphology method to the image

-motion-blur geometry

Simulates motion blur

-negate

Replaces every pixel with its complementary color

-noise geometry

adds or reduces noise in an image

-normalize

Transforms image to span the full range of colors

-opaque color

Changes this color to the fill color

-ordered-dither NxN

Adds a noise pattern to the image with specific amplitudes

-paint radius

Simulates an oil painting

-polaroid angle

Simulates a Polaroid picture

-posterize levels

Reduces the image to a limited number of color levels

-profile filename

adds, deletes, or applies an image profile

-quantize colorspace

Reduces colors in this colorspace

-radial-blur angle

radially blurs the image

-raise value

Lightens/darkens image edges to create a 3-D effect

-random-threshold low,high

randomly thresholds the image

-region geometry

Applies options to a portion of the image

-render

Renders vector graphics

-repage geometry

size and location of an image canvas

-resample geometry

Changes the resolution of an image

-resize geometry

Resizes the image

-roll geometry

Rolls an image vertically or horizontally

-rotate degrees

Applies Paeth rotation to the image

-sample geometry

Scales image with pixel sampling

-scale geometry

Scales the image

-segment values

Segments an image

-selective-blur geometry

selectively blurs pixels within a contrast threshold

-sepia-tone threshold

simulates a sepia-toned photo

-set property value

Sets an image property

-shade degrees

Shades the image using a distant light source

-shadow geometry

Simulates an image shadow

-sharpen geometry

Sharpens the image

-shave geometry

Shaves pixels from the image edges

-shear geometry

Slides one edge of the image along the X or Y axis

-sigmoidal-contrast geometry

Increases the contrast without saturating highlights or shadows

-sketch geometry

Simulates a pencil sketch

-solarize threshold

Negates all pixels above the threshold level

-sparse-color method args

fills in an image based on a few color points

-statistic type geometry

Replaces each pixel with corresponding statistic from the neighborhood

-strip

Strips image of all profiles and comments

-swirl degrees

Swirls image pixels about the center

-threshold value

Thresholds the image

-thumbnail geometry

Creates a thumbnail of the image

-tile filename

Tiles image when filling a graphic primitive

-tint value

Tints the image with the fill color

-transform

affine transforms image

-transpose

Flips image vertically and rotates 90 degrees

-transverse

Flops image horizontally and rotates 270 degrees

-trim

Trims image edges

-type type

image type

-unique-colors

Discards all but one of any pixel color

-unsharp geometry

Sharpens the image

-vignette geometry

Softens the edges of the image in vignette style

-wave geometry

Alters an image along a sine wave

-white-threshold value

force all pixels above the threshold into white

Image Sequence Operators:

-append

Appends an image sequence

-clut

Applies a color lookup table to the image

-coalesce

Merges a sequence of images

-combine

Combines a sequence of images

-composite

Composites image

-crop geometry

Cuts out a rectangular region of the image

-deconstruct

Breaks down an image sequence into constituent parts

-evaluate-sequence operator

Evaluates an arithmetic, relational, or logical expression

-flatten

Flattens a sequence of images

-fx expression

Applies mathematical expression to an image channel(s)

-hald-clut

Applies a Hald color lookup table to the image

-morph value

Morphs an image sequence

-mosaic

Creates a mosaic from an image sequence

-print string

Interprets string and prints to console

-process arguments

Processes the image with a custom image filter

-separate

Separates an image channel into a grayscale image

-smush geometry

Smashes an image sequence together

-write filename

Writes images to this file

Image Stack Operators:

-clone indexes

Clones an image

-delete indexes

Deletes the image from the image sequence

-duplicate count,indexes

Duplicates an image one or more times

-insert index

Inserts last image into the image sequence

-reverse

Reverses image sequence

-swap indexes

Swaps two images in the image sequence

Miscellaneous Options:

-debug events

Displays copious debugging information

-help

Prints program options

-list type

Prints a list of supported option arguments

-log format

Formats of debugging information

-version

Prints version information

Here is Java code (using im4java) that converts an image from JPEG to TIFF.

    import java.io.IOException;

    import org.im4java.core.ConvertCmd;
    import org.im4java.core.IM4JavaException;
    import org.im4java.core.IMOperation;
    import org.im4java.core.IdentifyCmd;

    public class IMConvertCmd {

        public static void main(String[] args) throws IOException,
                InterruptedException, IM4JavaException {
            String searchPath = "E:/image_magick";
            String sourceImage = "data/imade_art2.jpg";
            String destImage = "data/imade_art2.tiff";
            IMConvertCmd.tryExample(searchPath, sourceImage, destImage);
        }

        /**
         * Converts the source image to the destination image with ConvertCmd
         * (the output format follows the destination file extension), then
         * prints verbose information about the result with IdentifyCmd.
         *
         * @param searchPath  directory where the ImageMagick executables are placed
         * @param sourceImage the source image
         * @param destImage   the destination image to be created
         * @throws IOException
         * @throws InterruptedException
         * @throws IM4JavaException
         */
        public static void tryExample(String searchPath, String sourceImage,
                String destImage) throws IOException, InterruptedException,
                IM4JavaException {
            // Equivalent to: convert <sourceImage> <destImage>
            ConvertCmd convertCmd = new ConvertCmd();
            convertCmd.setSearchPath(searchPath);
            IMOperation convertOp = new IMOperation();
            convertOp.addImage(sourceImage);
            convertOp.addImage(destImage);
            convertCmd.run(convertOp);

            // Equivalent to: identify -verbose <destImage>
            IdentifyCmd identifyCmd = new IdentifyCmd();
            identifyCmd.setSearchPath(searchPath);
            IMOperation identifyOp = new IMOperation();
            identifyOp.verbose();
            identifyOp.addImage(destImage);
            identifyCmd.run(identifyOp);
        }
    }
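
The same conversion can also be sketched without im4java, by assembling the command line from the options listed earlier and handing it to ProcessBuilder. This is only a minimal sketch: the class and method names (ConvertCommandSketch, buildCommand) are made up for illustration, the chosen options (-resize, -quality) are just examples, and it assumes the convert executable is on the PATH. The command is printed rather than executed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of im4java): assembles an ImageMagick convert call. */
final class ConvertCommandSketch {

    /**
     * Builds: convert <source> -resize <geometry> -quality <quality> <dest>
     * Assumption: the "convert" executable can be found on the PATH.
     */
    static List<String> buildCommand(String source, String dest,
                                     String geometry, int quality) {
        List<String> command = new ArrayList<>();
        command.add("convert");
        command.add(source);
        command.add("-resize");
        command.add(geometry);
        command.add("-quality");
        command.add(Integer.toString(quality));
        command.add(dest);
        return command;
    }

    public static void main(String[] args) throws Exception {
        List<String> command = buildCommand(
                "data/imade_art2.jpg", "data/imade_art2.tiff", "50%", 90);
        System.out.println(String.join(" ", command));
        // To run it for real (requires ImageMagick to be installed):
        // int exitCode = new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}
```

Passing the arguments as a list, instead of one concatenated string, avoids quoting problems with file names that contain spaces.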

There is another cool thing called MSL, which stands for Magick Scripting Language. It is basically an XML language intended for those who want to accomplish custom image processing tasks without programming. The interpreter is called conjure. A script looks like a typical XML file with specialized tags in it and has the file extension msl.

An example of MSL:

    <?xml version="1.0" encoding="UTF-8"?>
    <image size="116x28" >
      <read filename="imade_art2.jpg" />
      <get width="base-width" height="base-height" />
      <resize geometry="%[dimensions]" />
      <get width="width" height="height" />
      <print output=
        "Image sized from %[base-width]x%[base-height]
         to %[width]x%[height].\n" />
      <write filename="imade_art2.png" />
    </image>

To invoke this script:

conjure -dimensions 116x28 firstMSL.msl
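
To tie this in with the Java side, here is a rough sketch that generates a script like the one above and assembles the matching conjure call. The names MslSketch, resizeScript and conjureCommand are hypothetical, the script is a trimmed-down version of the example (read, resize, write), and conjure is assumed to be on the PATH. The command is only printed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of ImageMagick): writes an MSL script and builds the conjure call. */
final class MslSketch {

    /**
     * Returns an MSL script that reads source, resizes it to %[dimensions]
     * and writes dest. Assumes the file names contain no XML special characters.
     */
    static String resizeScript(String source, String dest) {
        return String.join("\n",
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
                "<image>",
                "  <read filename=\"" + source + "\" />",
                "  <resize geometry=\"%[dimensions]\" />",
                "  <write filename=\"" + dest + "\" />",
                "</image>",
                "");
    }

    /** Builds: conjure -dimensions <dimensions> <scriptFile> */
    static List<String> conjureCommand(String dimensions, String scriptFile) {
        return Arrays.asList("conjure", "-dimensions", dimensions, scriptFile);
    }

    public static void main(String[] args) throws IOException {
        Path script = Paths.get("firstMSL.msl");
        String body = resizeScript("imade_art2.jpg", "imade_art2.png");
        Files.write(script, body.getBytes(StandardCharsets.UTF_8));
        System.out.println(String.join(" ", conjureCommand("116x28", script.toString())));
        // To run it for real (requires ImageMagick to be installed):
        // new ProcessBuilder(conjureCommand("116x28", script.toString())).inheritIO().start().waitFor();
    }
}
```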

Magick Scripting Language (MSL) defines the following elements and their attributes:

tag/element

Attribute description/option(s)

<image>

Defines a new image object; the closing </image> destroys it.

<group>

Defines a new group of image objects. By default, images are only valid for the life of their <image> element. However, in a group, all images in that group will stay around for the life of the group.

<read>

Reads a new image from the disk.

<write>

Writes the image(s) to disk, either as single or multiple ones if necessary.

<get>

Gets any recognized attribute and stores it as an image attribute for later use. Currently only width and height are supported.

<set>

Sets background, bordercolor, clip-mask, colorspace, density, magick, mattecolor and opacity.

<border>

Surrounds the image with a border color. Options: fill, geometry, height, width

<blur>

Reduces image noise and reduces detail levels. Options: radius, sigma

<charcoal>

Simulates a charcoal drawing. Options: radius, sigma

<chop>

Removes pixels from the interior of an image. Options: geometry, height, width, x, y

<crop>

Cuts out one or more rectangular regions of the image. Options: geometry, height, width, x, y

<despeckle>

Removes “pepper” from an image

<emboss>

Replaces each pixel of an image by a highlight or a shadow, depending on light/dark boundaries on the original image.

<enhance>

Removes blurring and noise, increases contrast and reveals details.

<equalize>

Applies a histogram equalization to the image

<flip>

Creates a mirror image, reflecting the scanlines in the vertical direction.

<flop>

Creates a mirror image, reflecting the scanlines in the horizontal direction.

<frame>

Surrounds the image with a border or beveled frame. Options: fill, geometry, height, width, x, y, inner, outer

<get>

Options: height, width

<magnify>

Scales the image to twice its size

<minify>

Scales the image to half its size

<normalize>

Enhances the contrast of a color image

<read>

Reads the input image

<resize>

Resizes an image. Options: blur, filter, geometry, height, width

<roll>

Rolls an image vertically or horizontally. Options: geometry, x, y

<rotate>

Applies Paeth image rotation. Options: degrees

<sample>

Changes the image size simply by directly sampling the pixels of original image. Options: geometry, height, width

<scale>

Changes the image size by replacing pixels by averaging pixels together when minifying or replacing pixels when magnifying. Options: geometry, height, width

<sharpen>

Uses a Gaussian operator of the given radius and standard deviation (sigma). Options: radius, sigma

<shave>

Removes pixels from the image edges. Options: geometry, height, width

<solarize>

Negates all pixels above the threshold level. Options: threshold

<spread>

Displaces image pixels by a random amount. Options: radius

<stegano>

Hides watermark within an image. Options: image

<stereo>

Generates stereogram of two images (one for each eye). Options: image

<swirl>

Swirls image pixels about the center. Options: degrees

<texture>

Tiles texture onto the image background. Options: image

<threshold>

Applies simultaneous black/white threshold to the image. Options: threshold

<transparent>

Makes [this] color transparent within the image. Options: color

<trim>

Removes any edges that are exactly the same color as the corner pixels.

This short tutorial could not cover all of the ImageMagick utilities, so you are welcome to check them out yourself.

Tips

  1. While working with tesseract I made some wrong decisions. One of them was using Cygwin. I wasted about two days trying to compile the sources, adding more and more missing libraries and recompiling again. Finally, I got my executables. However, some of the features still did not work. Removing the whole Cygwin “mess” and installing Microsoft Visual Studio 2008 Express solved my troubles. It took only about 2 hours, including installing MS VS and compiling the entire solution. It worked like a charm!

  2. The next challenge was to install ImageMagick. I thought that with MS VS 2008 installed I would have no problems compiling the sources. I was wrong again. ImageMagick depends on the MFC library, which is not included in MS VS 2008 Express, so it cannot be compiled with that edition. The other option was to install the ImageMagick binary distribution, and that worked. Alternatively, you can install Visual Studio 6. It is up to you.

  3. Before using tesseract I encourage you to read its FAQ and Wiki. If you have questions, subscribe to the tesseract mailing list. There are excellent people there who can help you. Unlike opening support tickets and waiting days or even months, help there comes very quickly.

Conclusion

OCR usually consists of two stages. In the first stage we prepare the data to be processed. Some images are noisy, others are poorly scanned, or their format does not fit our purposes. ImageMagick helps with this kind of preparation, and its commands can be scripted to automate the process. In the second stage we actually do the OCR. Tesseract has a baseapi that makes it easier to integrate its capabilities into your environment.
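
The two stages can be sketched as a pair of external commands driven from Java. This is an illustration under assumptions, not the setup used above: it calls the tesseract command-line tool instead of the baseapi, picks -despeckle and -deskew as sample clean-up steps, uses made-up file names and the hypothetical class name OcrPipelineSketch, and assumes both convert and tesseract are on the PATH. The commands are only printed.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical two-stage OCR pipeline: ImageMagick clean-up, then Tesseract. */
final class OcrPipelineSketch {

    /** Stage 1: convert <scan> -despeckle -deskew 40% <cleaned> */
    static List<String> prepareCommand(String scan, String cleaned) {
        return Arrays.asList("convert", scan, "-despeckle", "-deskew", "40%", cleaned);
    }

    /** Stage 2: tesseract <cleaned> <outputBase> -l <language>; the text lands in <outputBase>.txt */
    static List<String> ocrCommand(String cleaned, String outputBase, String language) {
        return Arrays.asList("tesseract", cleaned, outputBase, "-l", language);
    }

    /** Runs one external command and fails loudly on a non-zero exit code. */
    static void run(List<String> command) throws Exception {
        int exitCode = new ProcessBuilder(command).inheritIO().start().waitFor();
        if (exitCode != 0) {
            throw new IllegalStateException("Command failed (" + exitCode + "): " + command);
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> prepare = prepareCommand("data/scan.jpg", "data/scan_clean.tiff");
        List<String> ocr = ocrCommand("data/scan_clean.tiff", "data/scan", "eng");
        System.out.println(String.join(" ", prepare));
        System.out.println(String.join(" ", ocr));
        // Uncomment once both tools are installed:
        // run(prepare);
        // run(ocr);
    }
}
```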

When building your system, keep OCR’s limitations in mind.

If you have comments or suggestions, please share them with me and other people.

Have fun!

Previous postings archive

December 12, 2012

Here is some stuff that is a little bit outdated. Just for the record.
