How Spring Integration can make your life easier.
Some time ago I started a new job at a big corporation. My first task was to re-implement (port) their C# TCP client in Java. The existing converters were poor, so I did it manually. After a week or so, a fresh Java TCP client & a server simulator were written & waiting for further use. When we went through the client’s requirements, we found that the Java implementation lacked important features such as fail-over & auto-reconnection. Adding that functionality would have required us to add untested code and might have introduced flaws into the business logic. One of our guys said: “Aha, what if we replace our Java implementation with another one, for instance Spring Integration?” The rest of us smiled, thinking: what the heck? Anyway, my boss is a good champ who tries to adopt the best technologies available. We got a green light to do research & learn something exciting. To simplify our requirements, I am going to show a simulator (aka server) & a client.
Before delving deeper, let me explain what Spring Integration is intended for. As its site puts it: “it provides an extension of the Spring programming model to support the well-known Enterprise Integration Patterns”. In other words, to design a good enterprise application you can use messaging, more precisely asynchronous messaging, which lets diverse applications be integrated with each other without nightmares or pain. Gregor Hohpe and Bobby Woolf wrote the famous book “Enterprise Integration Patterns” (published in the Martin Fowler Signature Series). The folks at Spring probably decided one day to turn that theory into practice. A very pragmatic approach, isn’t it? Later you will see how well it fits regular tasks. The main concepts of SI are: Endpoint, Channel & Message.
An endpoint is a component which actually does something with a message. A message is a container consisting of a header & a payload. The header contains data that’s relevant to the messaging system, whereas the payload contains the actual data. A channel connects two or more endpoints; it’s similar to Unix pipes. Two endpoints can exchange messages if and only if they’re connected through a channel. Pretty easy, isn’t it? The following diagram shows this.
The next step in our crash course is defining the requirements. We need a TCP server & a TCP client. We will write a simple application in which the two exchange a couple of messages with each other.
An important thing when using SI is the configuration file, which contains all the components that we are going to use. Simplifying the model & the SI lifecycle: Spring creates the objects that are defined in the configuration XML. More generally, this concept is called declarative programming. You define a business object in the XML, and the framework instantiates the appropriate classes for you, then injects and initializes their dependencies. The mantra says: concentrate on the business logic and not on the plumbing.
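To make the three concepts concrete, here is a minimal sketch in plain Java, without Spring. The class and method names (MessagingSketch, Message, Channel, Endpoint, demo) are made up for illustration and are not the Spring Integration API; they only mirror the idea that a channel delivers messages to the endpoints connected to it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MessagingSketch {

    /** A message: headers for the messaging system, payload for the application. */
    public static final class Message {
        final Map<String, Object> headers;
        final Object payload;

        Message(Map<String, Object> headers, Object payload) {
            this.headers = Collections.unmodifiableMap(new HashMap<String, Object>(headers));
            this.payload = payload;
        }
    }

    /** An endpoint does something with a message. */
    public interface Endpoint {
        void handle(Message message);
    }

    /** A channel connects endpoints: every message sent is handed to each subscriber. */
    public static final class Channel {
        private final List<Endpoint> subscribers = new ArrayList<Endpoint>();

        void subscribe(Endpoint endpoint) {
            subscribers.add(endpoint);
        }

        void send(Message message) {
            for (Endpoint endpoint : subscribers) {
                endpoint.handle(message);
            }
        }
    }

    /** Sends one message through a channel and returns what the receiving endpoint saw. */
    public static List<String> demo() {
        final List<String> received = new ArrayList<String>();
        Channel channel = new Channel();
        channel.subscribe(new Endpoint() {
            public void handle(Message message) {
                received.add(message.headers.get("id") + ":" + message.payload);
            }
        });
        Map<String, Object> headers = new HashMap<String, Object>();
        headers.put("id", 1);
        channel.send(new Message(headers, "GIMMY"));
        return received;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

An endpoint that is not subscribed to the channel never sees the message, which is the "connected through a channel" rule in miniature.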
Let’s define a part of the configuration xml, the server part.
<int-ip:tcp-connection-factory id="tcpServerFactory"
type="server"
port="23234"
single-use="false"
serializer="byteArrayLenSerializer"
deserializer="byteArrayLenSerializer" />
<int-ip:tcp-inbound-channel-adapter channel="serverIn"
connection-factory="tcpServerFactory"/>
<int-ip:tcp-outbound-channel-adapter channel="serverOut"
connection-factory="tcpServerFactory"/>
The important things are: i. A factory (tcp-connection-factory), which creates a TCP server using a byte array length serializer. A serializer is needed for “packaging” (encoding) our message in order to transmit it over the wire; a deserializer, on the other hand, is needed for “unpackaging” (decoding) it. Spring Integration has two kinds of factory, one for the client & another for the server; they are distinguished by the type attribute [server or client]. ii. A port, on which the server listens for incoming messages. No IP address is mentioned here because the server runs on localhost.
We also defined two channels: serverIn (for incoming messages) & serverOut (for outgoing messages). So that our server can send & receive messages, we define inbound & outbound adapters which are associated with the factory & the channels. In our case they define the endpoints. So, when a message comes in, something should take care of it. This responsibility is taken by a service, i.e. the file sender service. If it accepts a message, it then sends a file to the client in the background, line by line. Basically, when the server starts, it listens for incoming messages; however, only a specific message is accepted, and when that message arrives, the server sends the file line by line. If an error occurs, it’s routed to the error channel. This is done using an interceptor.
I would like to say a couple of words about the SI lifecycle. The Spring framework has two “main” packages, org.springframework.beans & org.springframework.context, which build up the core of its dependency injection. The org.springframework.beans.factory.BeanFactory interface provides the basic container that creates, configures and manages beans, including their initialization/destruction callbacks. The org.springframework.context.ApplicationContext builds on it and offers AOP integration, message resource handling and even more; the start & stop lifecycle methods come from the org.springframework.context.Lifecycle interface.
Our server is ready, I mean, completely ready. To run the example, follow the steps below:
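What a length-header serializer does can be sketched in plain Java: each message goes over the wire as a length prefix followed by the payload, so the receiver knows where one message ends and the next begins. The sketch below assumes a 4-byte big-endian header; the class LengthHeaderFraming and its methods are hypothetical and are not Spring Integration’s own serializer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LengthHeaderFraming {

    /** Encodes a payload as a 4-byte big-endian length followed by the payload bytes. */
    public static byte[] encode(byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    /** Splits a byte sequence made of consecutive frames back into payloads. */
    public static List<byte[]> decodeAll(byte[] wire) {
        ByteBuffer buffer = ByteBuffer.wrap(wire);
        List<byte[]> payloads = new ArrayList<byte[]>();
        while (buffer.remaining() >= 4) {
            int length = buffer.getInt();
            byte[] payload = new byte[length];
            buffer.get(payload); // throws BufferUnderflowException on a truncated frame
            payloads.add(payload);
        }
        return payloads;
    }

    public static void main(String[] args) {
        byte[] first = encode("GIMMY".getBytes(StandardCharsets.UTF_8));
        byte[] second = encode("a line of the file".getBytes(StandardCharsets.UTF_8));
        // Two frames back to back, as they would arrive on one TCP connection.
        byte[] wire = ByteBuffer.allocate(first.length + second.length)
                .put(first).put(second).array();
        for (byte[] payload : decodeAll(wire)) {
            System.out.println(new String(payload, StandardCharsets.UTF_8));
        }
    }
}
```

Both sides must agree on the framing, which is why the same serializer is configured for serializing and deserializing in the factory above.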
- cd /tcpserver
- mvn clean install
- mvn dependency:copy-dependencies
- mvn exec:java -Dexec.mainClass="org.example.tcpserver.ServerRunner" -Dexec.args="--file=/file_to_be_sent.txt"
Our main class looks as follows:
CommandLinePropertySource clps = processProperties(args);
/* Spring Integration context used to get desirable beans. */
AbstractApplicationContext context = new ClassPathXmlApplicationContext(new String[] {"server-config.xml"}, false);
context.getEnvironment().getPropertySources().addFirst(clps);
context.refresh();
context.registerShutdownHook();
The source code can be found here http://pastebin.com/6PMpWTfX.
Also we define a file send service:
String key = new String(appropriateData, "UTF-8");
LOG.info("got.message" + " [" + key + "]");
/* If message accepted */
if (key.contains(SEARCH_KEY)) {
LogReader lr = new LogReader(sender, msg);
lr.setPath2File(getFile().getAbsolutePath());
es.execute(lr);
}
http://pastebin.com/icHRdQS3
Next, define the business runner:
/* Creates an input stream to be read. */
fstream = new FileInputStream(getPath2File());
/* Wraps an input stream in order to be able reading of a whole line */
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
while ((line = br.readLine()) != null) {
command = line;
sendAndLog(timeToWait);
}
http://pastebin.com/LZRdZ3Tg
Finally, for the server write an error handler which logs the errors:
public void handleRequestMessage(byte[] payload) {
LOG.debug("Server got an error " + new String(payload));
}
Now we’re done with our server :-).
Now, let’s define a TCP client which will connect to the server, send an accept message & receive the file sent by the server.
Our configuration file looks as follows:
<!-- Wraps a service with two reply-request channels. -->
<int:gateway id="client"
service-interface="org.example.tcpclient.TcpClientService"
default-reply-channel="replyChannel"
default-request-channel="requestChannel"
default-reply-timeout="1000"
default-request-timeout="1000">
</int:gateway>
<!-- Request channel -->
<int:channel id="requestChannel">
<int:queue capacity="10" />
</int:channel>
<!-- Direct channel used for reply. -->
<int:channel id="replyChannel" />
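To get a feeling for what the queue capacity and the timeouts in this configuration mean, here is a rough plain-Java analogy built on a BlockingQueue. It is only a sketch under my own assumptions (the class BoundedChannelSketch is hypothetical, not Spring code): sending gives up when the queue stays full for the whole timeout, and receiving gives up when nothing arrives in time.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class BoundedChannelSketch {

    private final BlockingQueue<String> queue;
    private final long timeoutMillis;

    public BoundedChannelSketch(int capacity, long timeoutMillis) {
        this.queue = new ArrayBlockingQueue<String>(capacity);
        this.timeoutMillis = timeoutMillis;
    }

    /** Returns false if the queue stayed full for the whole timeout. */
    public boolean send(String message) {
        try {
            return queue.offer(message, timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Returns null if nothing arrived within the timeout. */
    public String receive() {
        try {
            return queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        // Same numbers as in the XML: capacity 10, timeouts of 1000 ms.
        BoundedChannelSketch channel = new BoundedChannelSketch(10, 1000);
        System.out.println("sent: " + channel.send("GIMMY"));
        System.out.println("received: " + channel.receive());
    }
}
```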
Here is how to run the client:
- Open a new terminal
- cd /tcpclient
- mvn clean install
- mvn dependency:copy-dependencies
- mvn exec:java -Dexec.mainClass="org.example.tcpclient.ClientTcp"
Almost the same logic applies here. Have a look.
The main class has the following lines:
/* Spring Integration context used to get desirable beans. */
AbstractApplicationContext context = new ClassPathXmlApplicationContext(
new String[] { "client-config.xml" }, false);
context.refresh();
context.registerShutdownHook();
TcpClientService service = context.getBean("client", TcpClientService.class);
service.send("GIMMY");
http://pastebin.com/9mjmRyNk
In addition, define a client service:
void send(String txt);
Next, a message handler:
public void handle(byte[] s) {
String ss = new String(s);
LOG.info("r:" + ss);
}
http://pastebin.com/Wg4mscvk
And the last one is an interceptor, which will inform your application about:
i. Message sent;
ii. A connection closed;
iii. A new connection added.
public void send(Message<?> message) throws Exception {
super.send(message);
LOG.debug("Sent message [" + new String((byte[]) message.getPayload()) + "]");
}
public void close() {
super.close();
LOG.debug("Closed connection");
}
public void addNewConnection(TcpConnection connection) {
super.addNewConnection(connection);
LOG.debug("Added new connection" + connection.getHostName() + ":" +
connection.getPort());
}
That’s it !!! 🙂
To play with the code, have a look here: http://www.4shared.com/zip/eF4q7l0k/spring_integration_example.html.
Prerequisites:
- Java 1.6 or above;
- Maven 3 or above;
- Desire to learn something new & thrilling;
Pros:
- A lot of features
- Tested
- Good & friendly community
- If you have questions, people reply really quickly
- There are tons of examples
- API is easy & comprehensive
Cons:
- Takes time to learn & understand how to work with it.
- If you run into trouble, it is sometimes difficult to debug.
Peace be upon you.
Tika chm extractor – LGPL alternative
I’m pleased to announce that the LGPL-licensed Tika chm extractor was released yesterday. Honestly, it’s not pure LGPL: only the libraries it depends on are; the rest of the code is under the Apache License, version 2.0.
All relevant information can be found here.
To download the sources, go to GitHub.
Why should it live?
Well, the “original” Tika extraction algorithm works pretty well in most cases; however, it has “difficulties” in rare ones. For some unknown reason the inventors of the compressed HTML format never published its specification, so the algorithm the Tika chm parser uses for extracting content is not perfect, but quite good.
A possible solution that crosses everybody’s mind is to use native libraries. Fair enough. The only question is how to make it work on multiple platforms. Aha! Having checked the available options, I found a stable Java library called sevenzipjbind.
The extractor is designed as a stand-alone program, i.e. it is a server based on Jetty which listens for HTTP requests. Currently it has three options: i. Extract a single file including its metadata; ii. Extract content & metadata from all files in the provided directory; iii. Extract only the metadata from a single chm.
In addition, it saves the extracted content & its metadata in a special folder following the pattern: ../extracted_files/folder_name_as_file_name/extracted html files. Metadata goes under ../extracted_files/file_name.json
Examples of how to use it can also be found on GitHub.
Please don’t hesitate to ask, either by replying to this post, contacting me, or sending a tweet!
OCR using Tesseract and ImageMagick as pre-processing task
While many applications today use direct data entry via keyboard, more and more of these will return to automated data entry. The reasons for this include the increased incidence of operator wrist problems from constant keying and the potential hazards of video display terminal emissions. Therefore any application imaginable is a candidate for OCR.
What are its Applications?
- Automatic number plate recognition is used by various police forces, as a method of electronic toll collection on pay-per-use roads, in parking lots, car washing stations etc., and for cataloging the movements of traffic or individuals (quite popular in Central London).
- Book scanning – digital books can be easily distributed, reproduced and read on-screen. Projects like Project Gutenberg, Google Book Search scan books on a large scale.
- CAPTCHA – a type of challenge-response test used in computing as an attempt to ensure that the response is not generated by a computer. It stands for Completely Automated Public Turing test to tell Computers and Humans Apart.
- Computational linguistics – machine translation
- Digital pen as well as digital paper
- Digital mail room is an automation of incoming mail processes for classification and distribution of mail.
- Handwriting – a person’s particular and individual style of writing with pen or pencil. Every literate human has their own manner of writing. Graphology is the controversial study and analysis of handwriting, especially in relation to human psychology. Sometimes it’s part of the hiring process: the candidate is asked to write by hand about a familiar topic, and the text is then sent to specialists for a psychological analysis of the person.
- Music OCR – intended to interpret sheet music or printed scores into editable and playable form.
- Optical Mark Recognition – is a process of capturing human marked data from document forms such as surveys and tests.
- Kurzweil – text-to-speech converter software which enables a computer to read electronic and scanned text aloud to visually impaired people.
Principles of OCR Technology
Optical Character Recognition (OCR) systems may recognize machine print. Using pattern-matching technology, OCR translates the shapes and patterns of machine-made characters into corresponding computer codes. Though most advanced systems are able to recognize multiple fonts, they can process only standard fonts such as Times Roman and Arial. Once all characters in a given word are recognized, the word is compared against a vocabulary of potential answers for the final result.
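The last step, comparing a recognized word against a vocabulary, can be illustrated with a small sketch. It assumes that “closest” means the smallest Levenshtein edit distance, which is my own choice for illustration and not necessarily what any particular OCR engine does; the class VocabularyMatcher and the sample words are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class VocabularyMatcher {

    /** Classic dynamic-programming Levenshtein edit distance. */
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the vocabulary entry closest to the recognized word (first one wins on ties). */
    public static String closest(String recognized, List<String> vocabulary) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : vocabulary) {
            int distance = editDistance(recognized, candidate);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> vocabulary = Arrays.asList("describe", "prescribe", "desert");
        System.out.println(closest("descrbe", vocabulary));
    }
}
```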
Character recognition then segments lines of text or words into separate characters that are recognized by the makeup of their component shapes. Machine-printed letters are evenly spaced across, and up-and-down, a given page, allowing the OCR system to read the text one character at a time. Segmentation into single characters represents a critical recognition failure point for forms processing organizations, because OCR recognition technology requires high-quality images with excellent contrast, character and clarity. Any text that is less than perfect will cause even the most sophisticated OCR systems to return significant reductions in accuracy when processing degraded images.
How to choose an optimal product?
When discussing which OCR product to choose, a number of criteria should be considered. What price are you ready to pay? What’s the quality of the product? How is it supported? And so on. Fortunately for us, such a product exists. It’s open source, of very good quality, pretty well supported and still alive. It’s called tesseract-ocr. Why tesseract? Because it’s open source, licensed under the Apache License 2.0, one of the best, well supported via its mailing list, runs on multiple platforms, has a wide range of built-in languages, is stable and integrates easily with other systems.
This tutorial is divided into:
Introduction to tesseract-ocr
Installation of tesseract 3.0.1 for Windows.
Extracting the text
Writing a simple tesseract function using baseapi
Writing a Java function that extracts text from a given image using ProcessBuilder and tesseract.exe
Introduction to ImageMagick
Installation of ImageMagick 6.6.9-8 for Windows
Checking the installation
Brief description of what’s under the hood, useful command line utilities.
Java API to ImageMagick (http://im4java.sourceforge.net/)
Introduction to MSL
Writing a simple MSL script
Tips
Conclusion
Bibliography
Introduction to tesseract-ocr
As Wikipedia puts it, in geometry, the tesseract, also called an 8-cell or regular octachoron or cubic prism, is the four-dimensional analog of the cube. The tesseract is to the cube as the cube is to the square. Just as the surface of the cube consists of 6 square faces, the hypersurface of the tesseract consists of 8 cubical cells. The tesseract is one of the six convex regular 4-polytopes.
In our case, the Tesseract OCR engine was one of the top 3 engines in the 1995 UNLV Accuracy test. It was originally developed at Hewlett Packard Laboratories Bristol and at Hewlett Packard Co, Greeley Colorado. Between 1995 and 2006 it had little work done on it, but it is probably one of the most accurate open source OCR engines available. It reads a binary, grey or color image and outputs text. Now Google takes care of it.
Tesseract Installation
During this tutorial we will use a Windows box with Microsoft Visual Studio 2008 Express installed.
The installation is very simple and takes about 1 hour. You can use the Ant script provided to run particular tasks, or do it yourself.
Let’s meet the tutorial requirements.
- Install Microsoft Visual Studio 2008 Express (http://msdn.microsoft.com/en-us/express/future/bb421473)
- Add vcbuild.exe to the PATH
- Install Ant (http://ant.apache.org/bindownload.cgi)
- Install SVN client (http://subversion.apache.org/packages.html)
- Check the Java2SE 1.5/6 installation
Now we are ready to step into the world of image processing.
Step 1.
Download the tesseract source files and data. You have two options: 1. using svn, or 2. using the Ant script provided.
If you chose to use Ant, check the following properties first.
- tesseract.dir – the path the tesseract sources are downloaded to
- tesseract.dir.name – a folder name, i.e. ${tesseract.dir}/${tesseract.dir.name}
Just make sure it exists, or create it yourself with mkdir ….
Type:
ant svn
Ok, time to go drink a coffee or read the news.
Well, continuing with the Ant script, type:
ant build
If all went well, you will be notified that all 60 projects were built successfully.
Tesseract ships with the following list of trained languages:
- Arabic
- Bulgarian
- Catalan
- Czech
- Chinese simplified
- Chinese traditional
- Danish
- German
- Greek
- English
- Finnish
- French
- Hebrew
- Hindi
- Croatian
- Hungarian
- Indonesian
- Italian
- Japanese
- Korean
- Latvian
- Lithuanian
- Dutch
- Norwegian
- And more
Let’s see what we have inside.
- tesseract – extracts text or characters from the image.
Usage: tesseract imagename outputfile -l -psm configfile
-l, -psm and configfile are optional. -l means the language in the ISO 639-3 standard (eng, rus, ell etc). -psm means the page segmentation mode; the following modes are available:
psm mode | Description
0 | Orientation and script detection (OSD) only
1 | Automatic page segmentation with OSD
2 | Automatic page segmentation, but no OSD, or OCR
3 | Fully automatic page segmentation, but no OSD. (Default)
4 | Assume a single column of text of variable sizes
5 | Assume a single uniform block of vertically aligned text
6 | Assume a single uniform block of text
7 | Treat the image as a single text line
8 | Treat the image as a single word
9 | Treat the image as a single word in a circle
10 | Treat the image as a single character
- cntraining – generates a normproto and pffmtable. Reads in a text file consisting of feature samples from a training page in the following format: FontName CharName NumberOfFeatureTypes(N). It then appends these samples into a separate file for each character. The name of the file is: DirectoryName/FontName/CharName.FeatureTypeName. The DirectoryName can be specified via a command line argument. If not specified, it defaults to the current directory.
- combine_tessdata – creates a unified traineddata file from the different files produced by the training process.
Usage | Description
language_data_path_prefix (e.g. tessdata/eng.) | Combines all individual tessdata components (unicharset, DAWGs, classifier templates, ambiguities, language configs). The result will be a combined tessdata file lang_code.traineddata
-e | Extracts individual components from a combined trained data file. For instance: combine_tessdata -e tessdata/ell.traineddata
-o | Overwrites individual components of the given lang_code.traineddata file. For instance: combine_tessdata -o tessdata/ell.traineddata
-u | Unpacks all the components to the specified path. For instance: combine_tessdata -u tessdata/ell.traineddata /home/$USER/temp/ell
- mftraining – Separates training pages into files for each character. Strips from the files only the features and their parameters of the feature type mf. Reads in a text file consisting of feature samples from a training page in the following format: FontName CharName NumberOfFeatureTypes(N). The result is a binary file used by the OCR engine.
- unicharset_extractor – Extracts a character/ligature set. Given a list of box files on the command line, generates a file containing a unicharset, a list of all the characters. The file contains the size of the set on the first line, and then one unichar per line.
Usage: unicharset_extractor [-D DIRECTORY] FILE…
- wordlist2dawg – Generates a DAWG from a word list file. Given a file that contains a list of words (one word per line), generates the corresponding squished DAWG file.
Usage: wordlist2dawg [-t | -l min_len max_len] word_list_file dawg_file unicharset_file
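The tesseract usage line above (image, output file, optional -l and -psm) can be assembled programmatically before handing it to ProcessBuilder. The helper below is a hypothetical sketch (the class TesseractCommand and its parameters are my own names); it only builds the argument list and checks that the page segmentation mode is within the 0 to 10 range listed above.

```java
import java.util.ArrayList;
import java.util.List;

public class TesseractCommand {

    /** Builds: tesseract imagename outputfile [-l language] [-psm mode]. */
    public static List<String> build(String executable, String image, String outputBase,
                                     String language, Integer pageSegMode) {
        if (pageSegMode != null && (pageSegMode < 0 || pageSegMode > 10)) {
            throw new IllegalArgumentException("psm must be between 0 and 10: " + pageSegMode);
        }
        List<String> command = new ArrayList<String>();
        command.add(executable);
        command.add(image);
        command.add(outputBase);
        if (language != null) {
            command.add("-l");
            command.add(language);
        }
        if (pageSegMode != null) {
            command.add("-psm");
            command.add(String.valueOf(pageSegMode));
        }
        return command;
    }

    public static void main(String[] args) {
        // Treat the image as a single text line (psm 7), French language data.
        List<String> command = build("tesseract", "line.tif", "line_out", "fra", 7);
        System.out.println(command);
        // To actually run it: new ProcessBuilder(command).start();
    }
}
```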
Often, people think that with OCR they can “crack” CAPTCHAs (“gotchas”).
As an example, run the following:
tesseract.exe ..\kor_data\gotcha.tif gotchaOutput.txt -l eng
For a human being it’s easy to recognize what’s written (rondity describe.); however, look at the output:
rmdwdescrbe.
It could not recognize the first word or the white space; only the second word was recognized almost correctly. You can train your OCR to handle words like the first one, but whoever produces such CAPTCHAs will change their algorithm and you will fail again. In this case, don’t try harder.
Another example:
tesseract.exe ..\kor_data\fra.arial.g4.tif ..\kor_data\fra_output.txt -l fra
Looking at the output, you will probably find that the extracted text is quite good but not perfect: some characters are misrecognized. To fix that you need to “add” these characters to the traineddata. This process is well described in the tesseract-ocr wiki (http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3).
In addition to batch processing, tesseract-ocr makes it possible to integrate its capabilities into your program/product through a basic C++ API. It’s well documented and easy to use. All baseapi sources are located in the ../api folder.
Here is an example:
#include <cstddef>
#include "baseapi.h"

char* run_tesseract(const char* datapath, const char* language,
                    const unsigned char* imagedata,
                    int bytes_per_pixel, int bytes_per_line,
                    int left, int top, int width, int height) {
  tesseract::TessBaseAPI api;
  // Starts tesseract. Datapath must be the name of the parent dir and must end in '/'.
  // Init returns a non-zero value on failure.
  if (api.Init(datapath, language) != 0) {
    return NULL;
  }
  // Recognizes a rectangle from an image and returns the result as a string.
  // The caller owns the returned string and must delete[] it.
  char* text = api.TesseractRect(imagedata, bytes_per_pixel, bytes_per_line,
                                 left, top, width, height);
  // Closes down tesseract and frees up all memory.
  api.End();
  return text;
}
Java code using ProcessBuilder looks like:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

/**
 * Returns the text extracted from an image.
 *
 * @param image         the image file; might be tiff, png or jpeg
 * @param tesseractPath the directory where the tesseract executable is located
 * @param iso639_3Lang  three-letter language code, for instance, fra
 * @return the extracted text
 *
 * @throws IOException
 * @throws InterruptedException
 */
public static String getExtractedText(File image, String tesseractPath, String iso639_3Lang)
        throws IOException, InterruptedException {
    // tesseract appends ".txt" to the output base name
    File outputFile = new File(image.getParentFile(), "output");
    StringBuilder buffer = new StringBuilder();
    ProcessBuilder pb = new ProcessBuilder(tesseractPath + File.separator + "tesseract",
            image.getCanonicalPath(), outputFile.getAbsolutePath(), "-l", iso639_3Lang);
    pb.redirectErrorStream(true);
    Process process = pb.start();
    process.waitFor();
    File textFile = new File(outputFile.getAbsolutePath() + ".txt");
    BufferedReader in = new BufferedReader(new InputStreamReader(
            new FileInputStream(textFile), "UTF-8"));
    try {
        String str;
        while ((str = in.readLine()) != null) {
            buffer.append(str).append(System.getProperty("line.separator"));
        }
    } finally {
        in.close();
    }
    textFile.delete();
    return buffer.toString();
}
When working with OCR, you will often want to prepare your data (images) before handing it to the OCR engine: converting the image format, increasing/decreasing the image resolution, reducing image noise. There are a lot of options to achieve that; GIMP (http://www.gimp.org/) is free, mature and cute! If you are trying to automate the data preparation process, look at ImageMagick (http://www.imagemagick.technocozy.com/).
ImageMagick
It can detect edges, add noise, capture a screen and much more. I cannot cover everything here; I’m going to cover the part relevant to our OCR processing.
- Format conversion;
- Transformations;
- Composite – not sure …
- Image identification
- MSL – Magick Scripting Language – not sure
Installation of ImageMagick
IM supports a wide range of platforms, from *Nix to Windows. I suppose you have used Windows throughout this tutorial, so let it be so.
If you use the Ant script provided, run:
ant im.http
This command will download the Windows installer. The Ant properties are in the build.properties file; change them according to your set-up.
Moreover, the MAGICK_HOME environment variable should be set to the path where you previously extracted the ImageMagick files.
Verifying installation
convert logo: logo.miff
ImageMagick core utilities
Utility name | Usage
Display | Intended to view an image and manage it, including load, print, write to file, zoom, copy a region, paste a region, crop, show histogram and even more.
Convert | Converts image formats. Can be used for making thumbnails, charcoal drawing, oil painting, morphing.
Import | Captures the screen and writes it to a file. A single window, the entire screen, or any portion of the screen can be specified.
Animate | Shows animated formats or a sequence of images. Has a capability for color reduction to match the color resolution of the display.
Composite | Combines several separate images with the following schemes: Over, In, Out, Atop, Xor, Plus, Minus, Difference, Multiply and Bumpmap.
Montage | Arranges a group of images into a single image.
Mogrify | Applies transformations to images and, unlike the other utilities, overwrites the original image with the result.
Conjure | Interprets the Magick Scripting Language (MSL), an XML-based language, to perform any image processing activity without a Perl interpreter.
Identify | Reports detailed information about an image, such as format, file size, width, height, mapped color and number of colors, and can detect if an image is corrupted.
ImageMagick has an unbelievable number of interfaces; you can choose whichever you want. In this tutorial we will use the Java API im4java (http://im4java.sourceforge.net/).
Convert usage, options and image operators
Usage: convert.exe [options …] file [ [options …] file …] [options …] file
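For OCR pre-processing, a typical convert call follows exactly this usage pattern: settings, the input file, operators, the output file. The sketch below only builds such an argument list in Java; the class ConvertCommand, the file names and the chosen values (300 dpi, a 40% deskew threshold) are assumptions for illustration, not a recommended recipe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ConvertCommand {

    /** Builds: convert [settings] input [operators] output. */
    public static List<String> forOcr(String convertExe, String input, String output, int dpi) {
        List<String> command = new ArrayList<String>();
        command.add(convertExe);
        // Image setting: horizontal and vertical density of the image.
        command.addAll(Arrays.asList("-density", String.valueOf(dpi)));
        command.add(input);
        // Convert to grayscale, reduce speckles, straighten the page.
        command.addAll(Arrays.asList("-colorspace", "Gray"));
        command.add("-despeckle");
        command.addAll(Arrays.asList("-deskew", "40%"));
        command.add(output);
        return command;
    }

    public static void main(String[] args) {
        List<String> command = forOcr("convert", "scan.png", "scan_clean.tif", 300);
        System.out.println(command);
        // To actually run it: new ProcessBuilder(command).start();
    }
}
```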
Options – Image Settings:
-adjoin | joins images into a single multi-image file
-affine matrix | affine transform matrix
-alpha option | activates, deactivates, resets, or sets the alpha channel
-antialias | removes pixel-aliasing
-authenticate password | deciphers image with this password
-attenuate value | lessens (or intensifies) when adding noise to an image
-background color | background color
-bias value | adds bias when convolving an image
-black-point-compensation | uses black point compensation
-blue-primary point | chromaticity blue primary point
-bordercolor color | border color
-caption string | assigns a caption to an image
-channel type | applies option to select image channels
-colors value | preferred number of colors in the image
-colorspace type | alternate image colorspace
-comment string | annotates image with comment
-compose operator | sets image composite operator
-compress type | type of pixel compression when writing the image
-define format:option | defines one or more image format options
-delay value | displays the next image after pausing
-density geometry | horizontal and vertical density of the image
-depth value | image depth
-direction type | renders text right-to-left or left-to-right
-display server | gets image or font from this X server
-dispose method | layers disposal method
-dither method | applies error diffusion to image
-encoding type | text encoding type
-endian type | endianness (MSB or LSB) of the image
-family name | renders text with this font family
-fill color | color to use when filling a graphic primitive
-filter type | uses this filter when resizing an image
-font name | renders text with this font
-format "string" | output formatted image characteristics
-fuzz distance | colors within this distance are considered equal
-gravity type | horizontal and vertical text placement
-green-primary point | chromaticity green primary point
-intent type | type of rendering intent when managing the image color
-interlace type | type of image interlacing scheme
-interline-spacing value | sets the space between two text lines
-interpolate method | pixel color interpolation method
-interword-spacing value | sets the space between two words
-kerning value | sets the space between two letters
-label string | assigns a label to an image
-limit type value | pixel cache resource limit
-loop iterations | adds Netscape loop extension to your GIF animation
-mask filename | associates a mask with the image
-mattecolor color | frame color
-monitor | monitors progress
-orient type | image orientation
-page geometry | size and location of an image canvas (setting)
-ping | efficiently determines image attributes
-pointsize value | font point size
-precision value | maximum number of significant digits to print
-preview type | image preview type
-quality value | JPEG/MIFF/PNG compression level
-quiet | suppresses all warning messages
-red-primary point | chromaticity red primary point
-regard-warnings | pays attention to warning messages
-remap filename | transforms image colors to match this set of colors
-respect-parentheses | settings remain in effect until parenthesis boundary
-sampling-factor geometry | horizontal and vertical sampling factor
-scene value | image scene number
-seed value | seeds a new sequence of pseudo-random numbers
-size geometry | width and height of image
-stretch type | renders text with this font stretch
-stroke color | graphic primitive stroke color
-strokewidth value | graphic primitive stroke width
-style type | renders text with this font style
-synchronize | synchronizes image to storage device
-taint | declares the image as modified
-texture filename | name of texture to tile onto the image background
-tile-offset geometry | tile offset
-treedepth value | color tree depth
-transparent-color color | transparent color
-undercolor color | annotation bounding box color
-units type | the units of image resolution
-verbose | prints detailed information about the image
-view | FlashPix viewing transforms
-virtual-pixel method | virtual pixel access method
-weight type | renders text with this font weight
-white-point point | chromaticity white point
Image Operators:
-adaptive-blur geometry |
adaptively blur pixels; decrease effect near edges |
-adaptive-resize geometry |
adaptively resizes image using ‘mesh’ interpolation |
-alpha option |
on, activate, off, deactivate, set, opaque, copy |
-annotate geometry text |
annotate the image with text |
-auto-gamma |
automagically adjusts gamma level of image |
-auto-level |
automagically adjusts color levels of image |
-auto-orient |
automagically orients (rotates) image |
-bench iterations |
Measures performance |
-black-threshold value |
forces all pixels below the threshold into black |
-blue-shift factor |
Simulates a scene at nighttime in the moonlight |
-blur geometry |
Reduces image noise and reduce detail levels |
-border geometry |
Surrounds image with a border of color |
-border geometry |
Surrounds image with a border of color |
-bordercolor color |
border color |
-brightness-contrast geometry |
improves brightness / contrast of the image |
-cdl filename |
color correct with a color decision list |
-charcoal radius |
Simulates a charcoal drawing |
-chop geometry |
Removes pixels from the image interior |
-clamp |
Restricts pixel range from 0 to the quantum depth |
-clip |
Clips along the first path from the 8BIM profile |
-clip-mask filename |
Associates a clip mask with the image |
-clip-path id |
Clips along a named path from the 8BIM profile |
-colorize value |
Colorizes the image with the fill color |
-color-matrix matrix |
Applies color correction to the image |
-contrast |
Enhances or reduces the image contrast |
-contrast-stretch geometry |
Improves contrast by ‘stretching’ the intensity range |
-convolve coefficients |
Applies a convolution kernel to the image |
-cycle amount |
Cycles the image colormap |
-decipher filename |
converts cipher pixels to plain pixels |
-deskew threshold |
straightens an image |
-despeckle |
Reduces the speckles within an image |
-distort method args |
distorts images according to the given method and args |
-draw string |
Annotates the image with a graphic primitive |
-edge radius |
Applies a filter to detect edges in the image |
-encipher filename |
Converts plain pixels to cipher pixels |
-emboss radius |
Embosses an image |
-equalize |
Performs histogram equalization to an image |
-evaluate operator value |
evaluates an arithmetic, relational, or logical expression |
-extent geometry |
Sets the image size |
-extract geometry |
Extracts area from image |
-fft |
implements the discrete Fourier transform (DFT) |
-flip |
Flips image vertically |
-floodfill geometry color |
Floodfills the image with color |
-flop |
Flops image horizontally |
-frame geometry |
Surrounds image with an ornamental border |
-function name parameters |
Applies function over image values |
-gamma value |
level of gamma correction |
-gaussian-blur geometry |
Reduces image noise and reduces detail levels |
-geometry geometry |
preferred size or location of the image |
-identify |
Identifies the format and characteristics of the image |
-ift |
implements the inverse discrete Fourier transform (DFT) |
-implode amount |
Implodes image pixels about the center |
-lat geometry |
local adaptive thresholding |
-layers method |
optimizes, merges, or compares image layers |
-level value |
Adjusts the level of image contrast |
-level-colors color,color |
Levels image with the given colors |
-linear-stretch geometry |
Improves contrast by ‘stretching with saturation’ |
-liquid-rescale geometry |
Rescales image with seam-carving |
-median geometry |
Applies a median filter to the image |
-mode geometry |
Makes each pixel the ‘predominate color’ of the neighborhood |
-modulate value |
Varies the brightness, saturation, and hue |
-monochrome |
transforms image to black and white |
-morphology method kernel |
Applies a morphology method to the image |
-motion-blur geometry |
Simulates motion blur |
-negate |
Replaces every pixel with its complementary color |
-noise geometry |
adds or reduces noise in an image |
-normalize |
Transforms image to span the full range of colors |
-opaque color |
Changes this color to the fill color |
-ordered-dither NxN |
Adds a noise pattern to the image with specific amplitudes |
-paint radius |
Simulates an oil painting |
-polaroid angle |
Simulates a Polaroid picture |
-posterize levels |
Reduces the image to a limited number of color levels |
-profile filename |
adds, deletes, or applies an image profile |
-quantize colorspace |
Reduces colors in this colorspace |
-radial-blur angle |
radial blurs the image |
-raise value |
Lightens/darkens image edges to create a 3-D effect |
-random-threshold low,high |
random thresholds the image |
-region geometry |
Applies options to a portion of the image |
-render |
Renders vector graphics |
-repage geometry |
size and location of an image canvas |
-resample geometry |
Changes the resolution of an image |
-resize geometry |
Resizes the image |
-roll geometry |
Rolls an image vertically or horizontally |
-rotate degrees |
Applies Paeth rotation to the image |
-sample geometry |
Scales image with pixel sampling |
-scale geometry |
Scales the image |
-segment values |
Segments an image |
-selective-blur geometry |
selectively blurs pixels within a contrast threshold |
-sepia-tone threshold |
simulates a sepia-toned photo |
-set property value |
Sets an image property |
-shade degrees |
Shades the image using a distant light source |
-shadow geometry |
Simulates an image shadow |
-sharpen geometry |
Sharpens the image |
-shave geometry |
Shaves pixels from the image edges |
-shear geometry |
Slides one edge of the image along the X or Y axis |
-sigmoidal-contrast geometry |
Increases the contrast without saturating highlights or shadows |
-sketch geometry |
Simulates a pencil sketch |
-solarize threshold |
Negates all pixels above the threshold level |
-sparse-color method args |
fills in an image based on a few color points |
-statistic type geometry |
Replaces each pixel with corresponding statistic from the neighborhood |
-strip |
Strips image of all profiles and comments |
-swirl degrees |
Swirls image pixels about the center |
-threshold value |
Thresholds the image |
-thumbnail geometry |
Creates a thumbnail of the image |
-tile filename |
Tiles image when filling a graphic primitive |
-tint value |
Tints the image with the fill color |
-transform |
affine transforms image |
-transpose |
Flips image vertically and rotates 90 degrees |
-transverse |
Flops image horizontally and rotates 270 degrees |
-trim |
Trims image edges |
-type type |
image type |
-unique-colors |
Discards all but one of any pixel color |
-unsharp geometry |
Sharpens the image |
-vignette geometry |
Softens the edges of the image in vignette style |
-wave geometry |
Alters an image along a sine wave |
-white-threshold value |
force all pixels above the threshold into white |
Image Sequence Operators:
-append |
Appends an image sequence |
-clut |
Applies a color lookup table to the image |
-coalesce |
Merges a sequence of images |
-combine |
Combines a sequence of images |
-composite |
Composites image |
-crop geometry |
Cuts out a rectangular region of the image |
-deconstruct |
Breaks down an image sequence into constituent parts |
-evaluate-sequence operator |
Evaluates an arithmetic, relational, or logical expression |
-flatten |
Flattens a sequence of images |
-fx expression |
Applies mathematical expression to an image channel(s) |
-hald-clut |
Applies a Hald color lookup table to the image |
-morph value |
Morphs an image sequence |
-mosaic |
Creates a mosaic from an image sequence |
-print string |
Interprets string and prints to console |
-process arguments |
Processes the image with a custom image filter |
-separate |
Separates an image channel into a grayscale image |
-smush geometry |
Smashes an image sequence together |
-write filename |
Writes images to this file |
Image Stack Operators:
-clone indexes |
Clones an image |
-delete indexes |
Deletes the image from the image sequence |
-duplicate count,indexes |
Duplicates an image one or more times |
-insert index |
Inserts last image into the image sequence |
-reverse |
Reverses image sequence |
-swap indexes |
Swaps two images in the image sequence |
Miscellaneous Options:
-debug events |
Displays copious debugging information |
-help |
Prints program options |
-list type |
Prints a list of supported option arguments |
-log format |
Format of debugging information |
-version |
Prints version information |
Here is Java code that converts an image from JPEG to TIFF using im4java.
import java.io.IOException;

import org.im4java.core.ConvertCmd;
import org.im4java.core.IM4JavaException;
import org.im4java.core.IMOperation;
import org.im4java.core.IMOps;

public class IMConvertCmd {

    public static void main(String[] args)
            throws IOException, InterruptedException, IM4JavaException {
        String searchPath = "E:/image_magick";
        String sourceImage = "data/imade_art2.jpg";
        String destImage = "data/imade_art2.tiff";
        IMConvertCmd.tryExample(searchPath, sourceImage, destImage);
    }

    /**
     * Creates ConvertCmd, sets search path, sets command, runs convert command,
     * creates IMOperation, adds to it an image, runs identify and verbose commands
     *
     * @param searchPath  directory where the ImageMagick executables are located
     * @param sourceImage a source image
     * @param destImage   a destination image to be converted
     *
     * @throws IOException
     * @throws InterruptedException
     * @throws IM4JavaException
     */
    public static void tryExample(String searchPath, String sourceImage, String destImage)
            throws IOException, InterruptedException, IM4JavaException {
        ConvertCmd convertCmd = new ConvertCmd();
        convertCmd.setSearchPath(searchPath);
        // Source and destination are passed as part of the command itself
        convertCmd.setCommand(sourceImage, destImage);
        convertCmd.run(new IMOperation());

        // Print detailed information about the converted image
        IMOperation op = new IMOperation();
        op.addImage(destImage);
        IMOps ops = op.identify().verbose();
        convertCmd.run(ops);
    }
}
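The operators from the tables above can also be combined into a single preprocessing command without im4java. What follows is only a minimal sketch: `ConvertCommandBuilder` and `preprocessCommand` are hypothetical names invented for this illustration, and the 40% deskew threshold is an assumed starting value, not something tuned for these images. It builds the argument list for `convert` using `-deskew`, `-despeckle` and `-monochrome`, and leaves the actual process launch commented out because that needs ImageMagick on the PATH.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of im4java or ImageMagick): it only assembles
// the argument list for ImageMagick's `convert` command-line tool.
class ConvertCommandBuilder {

    /**
     * Builds: convert <source> -deskew <threshold> -despeckle -monochrome <dest>
     * The output format is chosen by ImageMagick from the destination extension.
     */
    static List<String> preprocessCommand(String source, String dest, String deskewThreshold) {
        List<String> cmd = new ArrayList<>();
        cmd.add("convert");
        cmd.add(source);
        cmd.add("-deskew");        // straightens the image
        cmd.add(deskewThreshold);
        cmd.add("-despeckle");     // reduces the speckles within the image
        cmd.add("-monochrome");    // transforms the image to black and white
        cmd.add(dest);
        return cmd;
    }

    public static void main(String[] args) {
        // "40%" is an assumed threshold used only for illustration.
        List<String> cmd = preprocessCommand("data/imade_art2.jpg", "data/imade_art2.tiff", "40%");
        System.out.println(String.join(" ", cmd));
        // To really run it (requires ImageMagick on the PATH):
        // int exit = new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Keeping the command as a plain list makes it easy to log or adjust before handing it to `ProcessBuilder`.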
There is another cool thing called MSL, which stands for Magick Scripting Language. It is an XML-based language intended for those who want to accomplish custom image processing tasks without programming. The interpreter is called conjure. A script looks like a typical XML file with specialized tags in it and has the file extension .msl.
An example of MSL:
<?xml version="1.0" encoding="UTF-8"?>
<image size="116x28">
  <read filename="imade_art2.jpg" />
  <get width="base-width" height="base-height" />
  <resize geometry="%[dimensions]" />
  <get width="width" height="height" />
  <print output="Image sized from %[base-width]x%[base-height] to %[width]x%[height].\n" />
  <write filename="imade_art2.png" />
</image>
To invoke this script:
conjure -dimensions 116x28 firstMSL.msl
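If you want to produce such scripts from Java, the script text and the conjure invocation can be generated. This is a rough sketch under stated assumptions: `MslScriptWriter`, `resizeScript` and `conjureCommand` are hypothetical names made up for this example, the file names are assumed to contain no characters that need XML escaping, and the script is a shortened variant of the one above. The `-dimensions` value reaches the script through `%[dimensions]`, exactly as in the invocation shown.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: generates a small MSL script and the matching
// conjure command line. Nothing here is part of ImageMagick itself.
class MslScriptWriter {

    /** Returns an MSL script that reads, resizes and writes one image. */
    static String resizeScript(String inputFile, String outputFile) {
        // Assumption: file names need no XML escaping.
        return String.join("\n",
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
            "<image>",
            "  <read filename=\"" + inputFile + "\" />",
            "  <resize geometry=\"%[dimensions]\" />",
            "  <write filename=\"" + outputFile + "\" />",
            "</image>",
            "");
    }

    /** Builds: conjure -dimensions <dimensions> <script> */
    static List<String> conjureCommand(String dimensions, Path script) {
        return Arrays.asList("conjure", "-dimensions", dimensions, script.toString());
    }

    public static void main(String[] args) throws IOException {
        Path script = Files.createTempFile("resize", ".msl");
        String text = resizeScript("imade_art2.jpg", "imade_art2.png");
        Files.write(script, text.getBytes(StandardCharsets.UTF_8));
        // Prints the command; run it with ProcessBuilder if conjure is installed.
        System.out.println(String.join(" ", conjureCommand("116x28", script)));
    }
}
```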
Magick Scripting Language (MSL) defines the following elements and their attributes:
tag/element |
Attribute description/option(s) |
<image> |
Defines a new image object; the closing </image> tag destroys it. |
<group> |
Defines a new group of image objects. By default, images are only valid for the life of their <image> element. However, in a group, all images in that group will stay around for the life of the group. |
<read> |
Reads a new image from the disk. |
<write> |
Writes the image(s) to disk, either as single or multiple ones if necessary. |
<get> |
Gets any recognized attribute and stores it as an image attribute for later use. Currently only width and height are supported. |
<set> |
Sets background, bordercolor, clip-mask, colorspace, density, magick, mattecolor and opacity. |
<border> |
Surrounds the image with a border color. Options: fill, geometry, height, width |
<blur> |
Reduces image noise and reduces detail levels. Options: radius, sigma |
<charcoal> |
Simulates a charcoal drawing. Options: radius, sigma |
<chop> |
Removes pixels from the interior of an image. Options: geometry, height, width, x, y |
<crop> |
Cuts out one or more rectangular regions of the image. Options: geometry, height, width, x, y |
<despeckle> |
Removes “pepper” from an image |
<emboss> |
Replaces each pixel of an image by a highlight or a shadow, depending on light/dark boundaries on the original image. |
<enhance> |
Removes blurring and noise, increases contrast and reveals details. |
<equalize> |
Applies a histogram equalization to the image |
<flip> |
Creates a mirror image, reflecting the scanlines in the vertical direction. |
<flop> |
Creates a mirror image, reflecting the scanlines in the horizontal direction. |
<frame> |
Surrounds the image with a border or beveled frame. Options: fill, geometry, height, width, x, y, inner, outer |
<get> |
Options: height, width |
<magnify> |
Scales the image to twice its size |
<minify> |
Scales the image to half its size |
<normalize> |
Enhances the contrast of a color image |
<read> |
Reads the input image |
<resize> |
Resizes an image. Options: blur, filter, geometry, height, width |
<roll> |
Rolls an image vertically or horizontally. Options: geometry, x, y |
<rotate> |
Applies Paeth image rotation. Options: degrees |
<sample> |
Changes the image size simply by directly sampling the pixels of original image. Options: geometry, height, width |
<scale> |
Changes the image size by replacing pixels by averaging pixels together when minifying or replacing pixels when magnifying. Options: geometry, height, width |
<sharpen> |
Uses a Gaussian operator of the given radius and standard deviation (sigma). Options: radius, sigma |
<shave> |
Removes pixels from the image edges. Options: geometry, height, width |
<solarize> |
Negates all pixels above the threshold level. Options: threshold |
<spread> |
Displaces image pixels by a random amount. Options: radius |
<stegano> |
Hides watermark within an image. Options: image |
<stereo> |
Generates stereogram of two images (one for each eye). Options: image |
<swirl> |
Swirls image pixels about the center. Options: degrees |
<texture> |
Tiles texture onto the image background. Options: image |
<threshold> |
Applies simultaneous black/white threshold to the image. Options: threshold |
<transparent> |
Makes [this] color transparent within the image. Options: color |
<trim> |
Removes any edges that are exactly the same color as the corner pixels. |
In this short tutorial I could not cover all the ImageMagick utilities, so you are welcome to check them out yourself.
Tips
- While working with tesseract I made some wrong decisions. One of them was using Cygwin. I wasted about two days trying to compile the sources, adding more and more missing libraries and recompiling again. Finally I got my executables, but some of the features still did not work. Removing the whole Cygwin “mess” and installing Microsoft Visual Studio 2008 Express solved my troubles. It took only about 2 hours, including installing Visual Studio and compiling the entire solution. It worked like a charm!
- The next challenge was installing ImageMagick. I thought that with Visual Studio 2008 installed I wouldn’t have problems compiling the sources. I was wrong again. ImageMagick depends on the MFC library, which is not included in Visual Studio 2008 Express, so it cannot be compiled with that edition. The other option was to install the ImageMagick binary distribution, and that worked. Alternatively, you can install Visual Studio 6. It’s up to you.
- Before using tesseract I encourage you to read its FAQ and Wiki. If you have questions, subscribe to the tesseract mailing list. There are excellent people there who can help you. Unlike opening support tickets and waiting days or even months, help here comes very quickly.
Conclusion
OCR usually consists of two stages. In the first stage we prepare the data to be processed: some images are noisy, others are poorly scanned, or their format does not fit our purposes. ImageMagick helps with this kind of preparation and lets us create scripts that automate the process. In the second stage we actually perform the OCR. Tesseract has a baseapi that makes it easier to integrate its capabilities into an environment.
When building your system, keep OCR’s limitations in mind.
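The two stages can be chained from Java by running the command-line tools one after the other. The sketch below is only an illustration under assumptions: `OcrPipeline` and its methods are hypothetical names, the chosen `convert` options are just one possible cleanup, and it uses the tesseract command line (`tesseract imagename outputbase`) rather than the baseapi. The runner is passed in as a function so the pipeline can be dry-run without either tool installed.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical sketch of the two-stage flow: prepare the image, then OCR it.
class OcrPipeline {

    /** Stage 1: ImageMagick cleanup. Stage 2: tesseract on the cleaned image. */
    static List<List<String>> stages(String scan, String cleaned, String outputBase) {
        List<List<String>> stages = new ArrayList<>();
        stages.add(Arrays.asList("convert", scan, "-despeckle", "-monochrome", cleaned));
        stages.add(Arrays.asList("tesseract", cleaned, outputBase));
        return stages;
    }

    /** Runs the stages in order, stops at the first non-zero exit code,
     *  and returns how many stages completed successfully. */
    static int run(List<List<String>> stages, ToIntFunction<List<String>> runner) {
        int completed = 0;
        for (List<String> stage : stages) {
            if (runner.applyAsInt(stage) != 0) {
                break;
            }
            completed++;
        }
        return completed;
    }

    /** Runner that starts the external process and returns its exit code. */
    static int exec(List<String> command) {
        try {
            return new ProcessBuilder(command).inheritIO().start().waitFor();
        } catch (IOException e) {
            return -1; // tool not found or could not be started
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }

    public static void main(String[] args) {
        List<List<String>> stages = stages("scan.jpg", "scan.tiff", "scan");
        // Dry run: print each command instead of executing it.
        run(stages, cmd -> { System.out.println(String.join(" ", cmd)); return 0; });
        // Real run (needs both tools installed): run(stages, OcrPipeline::exec);
    }
}
```

Stopping at the first failure matters here: there is no point in running OCR on an image that the preparation stage failed to produce.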
If you have comments or suggestions, please share them with me and other people.
Have fun!
previous postings archive
Here is some stuff a little bit outdated. Just for the record.