Batch preparation
Before scanning, an operator manually prepares a batch of paper
documents, adds document separator sheets if required,
counts the number of pages in the batch, and loads the
batch into the scanner input tray.
Batch creation
The operator at the scan workstation creates the batch by
entering the appropriate information into the system,
including the batch name, document class, and various page
counts used for error checking. The page counts are used
to detect when an incorrect number of pages is scanned
and, if document separators are not used, to define the
beginning and end of each document in the batch.
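As a rough illustration of how page counts can serve both purposes, the sketch below compares the scanned page total with the expected count and, assuming every document in the batch has the same fixed number of pages, splits the batch into documents. The function names and the fixed-length rule are assumptions made for this example and do not describe the capture software's actual logic.

```python
def check_page_count(expected_pages, scanned_pages):
    """Return None if the counts match, otherwise a short error message."""
    if scanned_pages == expected_pages:
        return None
    return f"expected {expected_pages} pages, scanned {scanned_pages}"


def split_by_page_count(pages, pages_per_document):
    """Group a flat list of pages into documents of a fixed length."""
    if pages_per_document <= 0:
        raise ValueError("pages_per_document must be positive")
    if len(pages) % pages_per_document != 0:
        raise ValueError("page total is not a multiple of the document length")
    return [
        pages[i:i + pages_per_document]
        for i in range(0, len(pages), pages_per_document)
    ]
```

For example, a batch entered as six pages with two pages per document would be accepted only if exactly six pages were scanned, and would then be split into three documents.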
Scan
To start the scan, the operator clicks the Start button in
the capture software. To start an import operation, the
operator selects the image filename(s) from a list box and
then clicks the Start button. Both scanned and imported
images can be contained in the same document or batch. The
Scan module displays the scanned and/or imported images in
the view window, maintains and displays the document and
page counts, and stores the images in a temporary working
directory. The Scan module also processes any bar codes
and patch codes on the scanned images. Bar codes are used
to automatically fill in index fields associated with the
document. Patch codes are used on document separator
sheets to indicate the beginning of a document.
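The role of separator sheets can be sketched as follows. This is a minimal illustration, assuming each scanned page has already been flagged as a separator sheet or not (for instance because a patch code was detected on it); the tuple layout and function name are hypothetical.

```python
def split_on_separators(pages):
    """Split pages into documents; each separator sheet starts a new one.

    `pages` is a list of (name, is_separator) tuples. Separator sheets
    themselves are dropped from the output.
    """
    documents = []
    current = None
    for name, is_separator in pages:
        if is_separator:
            # A separator sheet marks the beginning of a new document.
            current = []
            documents.append(current)
        else:
            if current is None:
                # Pages scanned before the first separator form a document.
                current = []
                documents.append(current)
            current.append(name)
    # Ignore empty documents, e.g. two separator sheets in a row.
    return [doc for doc in documents if doc]
```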
Batch close
After the entire batch is successfully scanned or imported, the
scan operator closes the batch. This automatically sends
the batch to the next queue specified in the setup.
Image Cleanup and Optical Character Recognition (OCR)
If a document has well-defined fields (for example, a form),
it is possible to speed indexing by using OCR to read
zones on the document and automatically convert them into
indexes. If OCR indexing is specified for a document type,
capture software allows you to specify zones in the
document and associate each zone with an index field.
After the zones have been recognized, the documents can be
sent to an indexing station for verification or can be
sent straight to the next stage of processing.
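One way to picture the zone-to-field association is a small table of rectangles, each tied to an index field. The sketch below is illustrative only: the `Zone` class, its coordinate fields, and the `recognize` callback (a stand-in for whatever OCR engine is used) are assumptions, not part of any particular capture product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    index_field: str   # index field this zone feeds
    left: int
    top: int
    width: int
    height: int


def index_from_zones(page_image, zones, recognize):
    """Call recognize(page_image, zone) for each zone and collect the
    recognized text under the zone's index field name."""
    return {
        zone.index_field: recognize(page_image, zone).strip()
        for zone in zones
    }
```

The resulting dictionary of field values is what would then be sent to an indexing station for verification or passed directly to the next stage.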
The scanning process also supports full-text indexing. This
process performs OCR on the entire document and produces
an ASCII file of the output. The output can also be stored
in a variety of word processing formats, including
Microsoft Word and WordPerfect.
Image Cleanup
As a rule of thumb, OCR is useful only on clean, sharp images
where the OCR accuracy is 95% or higher. If the OCR
accuracy is less than 95%, the cost of checking and
correcting errors is frequently higher than the cost of
manually keying the index data.
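The 95% rule of thumb can be checked on a sample of documents by comparing OCR output with manually keyed text. The sketch below is one possible approximation, assuming character-level accuracy is measured as the share of the keyed text's characters that the standard-library `difflib.SequenceMatcher` can match in the OCR output; the function names and this way of measuring accuracy are assumptions for illustration.

```python
from difflib import SequenceMatcher

OCR_ACCURACY_THRESHOLD = 0.95  # rule-of-thumb cutoff from the text


def char_accuracy(keyed_text, ocr_text):
    """Fraction of characters in keyed_text matched, in order, in ocr_text."""
    if not keyed_text:
        raise ValueError("keyed_text must not be empty")
    matcher = SequenceMatcher(None, keyed_text, ocr_text, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(keyed_text)


def ocr_is_worthwhile(keyed_text, ocr_text):
    """Apply the rule of thumb: use OCR only at or above the threshold."""
    return char_accuracy(keyed_text, ocr_text) >= OCR_ACCURACY_THRESHOLD
```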
There are several techniques that can make images more readable
and increase OCR accuracy. The most effective ones
include:
Deskewing
This technique straightens pages that have been scanned
slightly crooked due to mechanical tolerances in the
scanner’s document feeder. Deskewing can increase the
accuracy of OCR by 5-10% or more, which can make the
difference between using expensive manual indexing and
automated OCR indexing.
Deshading
OCR engines are unable to read words against the gray shaded
backgrounds that are common on forms. Removing shading
allows you to OCR zones that are otherwise unreadable.
Despeckling and Streak Removal
These techniques remove small speckles and streaks caused by
dirt in the scanner feeder or scanner noise.
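A simple form of despeckling can be sketched as removing very small connected blobs of ink from a black-and-white image. The example below is a minimal pure-Python illustration, assuming the image is a list of rows of 0 (background) and 1 (ink) and that any blob smaller than `min_pixels` is noise; production cleanup filters are likely more sophisticated, and the threshold here is an arbitrary assumption.

```python
from collections import deque


def despeckle(image, min_pixels=3):
    """Return a copy of a binary image with small ink blobs erased.

    Any 8-connected group of ink pixels with fewer than `min_pixels`
    pixels is treated as a speckle and removed.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Flood fill to collect one connected blob.
            blob = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(blob) < min_pixels:
                for by, bx in blob:
                    cleaned[by][bx] = 0
    return cleaned
```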
Line removal
On typewritten forms, words are frequently typed so that they
cross over the lines on the form, which makes them
unreadable to OCR. Line removal erases the lines on the
image and then reconstructs the characters so they can be
recognized.
Edge enhancement
The edge enhancement function includes a set of multiple
filters that sharpen the edges of characters. The results
are usually invisible to the eye, but they can increase
the accuracy of OCR by as much as 5-10%.
Index/Index Verify
The Index module is used to enter index data and associate it
with an imaged document. The index data is then stored in
the RiskPro database and can be used at a later
time to retrieve the document image from its permanent
storage location. Indexing is the most critical and
labor-intensive step in the document capture process, with
a typical capture operation sometimes requiring as many as
four index stations for each scanner. The index data is
the key to retrieving the document. Noted below are
several methods used by Blackburn Group, Inc. to reduce
operator errors and speed the indexing process:
OCR can be used to fill index
fields. This allows the index operator to simply check the
accuracy of the OCR field rather than manually typing the
required data on the indexing form.
Bar code recognition can be used
for indexing documents. Bar codes are processed by the
Scan module, and the data is used to fill user-specified
fields on the indexing form. Document capture software
supports most popular bar code types, including Code 39
and Interleaved 2 of 5.
Custom validation scripts can be
configured to fill fields on the indexing form with
default values.
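The way bar code data and default values can pre-fill an indexing form might look like the sketch below. It is a simplified illustration, assuming the form is a plain dictionary, that bar code data takes priority over defaults, and that any remaining fields are left blank for the index operator; the function and field names are hypothetical.

```python
def fill_index_form(fields, barcode_values, defaults):
    """Build an index form: bar code data first, then defaults, else blank."""
    form = {}
    for field in fields:
        if barcode_values.get(field):
            form[field] = barcode_values[field]
        elif field in defaults:
            form[field] = defaults[field]
        else:
            form[field] = ""   # left for the index operator to key
    return form
```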
QA/Rescan
No scanner is perfect, and rescanning is an integral part of
the process. Index operators can easily tag documents or
individual pages for rescan, attaching electronic notes
that tell the scanner operator exactly what the problem
is. The batch is then queued to a rescan workstation where
the operator is prompted for the specific pages or
documents to be rescanned. Document capture software
automatically inserts rescanned pages in the appropriate
position within the batch.
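Placing rescanned pages back into a batch can be sketched as follows. This is an illustrative example only, assuming each rescan is recorded against a 0-based position in the original batch and is either a replacement for a poorly scanned page or an insertion of a missing page; the data layout and function name are assumptions.

```python
def apply_rescans(batch_pages, rescans):
    """Return a new page list with rescanned pages put in position.

    `rescans` maps a 0-based position in the original batch to
    ("replace", page) or ("insert", page). An inserted page is placed
    before the page currently at that position.
    """
    pages = list(batch_pages)
    # Work from the end so earlier positions are not shifted by inserts.
    for position in sorted(rescans, reverse=True):
        action, page = rescans[position]
        if action == "replace":
            pages[position] = page
        elif action == "insert":
            pages.insert(position, page)
        else:
            raise ValueError(f"unknown rescan action: {action}")
    return pages
```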
The following are typical reasons
why documents are rejected and the batch is sent to the
scan queue for rescanning:
Poorly scanned page (too light, too
dark).
Missing page.
Missing document.
Skewed image.
Illegible bar code or patch code.
The operator who detected the problem at the Index or Index
Verify queue may attach electronic notes explaining the
problem to the rejected document. Before rescanning, the
rescan operator can open the Note Viewer via the View menu
and read the attached notes.
Image Management
The final stage in the capture process is to transfer each
document in the batch either to long-term storage or to a
workflow system. In the transfer process, the image files
are written to permanent storage and the indexes are
written to RiskPro or a document manager.
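The transfer step can be pictured with the sketch below, which copies image files from a working directory to a permanent directory and records the index values in a database. It is an assumption-laden illustration: SQLite stands in for the real index database, and the `doc_index` table layout, function name, and directory arguments are all hypothetical.

```python
import shutil
import sqlite3
from pathlib import Path


def transfer_document(image_paths, index_data, storage_dir, db):
    """Copy image files to permanent storage and record their indexes.

    `db` is an open sqlite3 connection used here as a stand-in for the
    real index database.
    """
    storage_dir = Path(storage_dir)
    storage_dir.mkdir(parents=True, exist_ok=True)
    db.execute(
        "CREATE TABLE IF NOT EXISTS doc_index "
        "(image_path TEXT, field TEXT, value TEXT)"
    )
    stored = []
    for src in image_paths:
        dest = storage_dir / Path(src).name
        shutil.copy2(src, dest)          # write image to permanent storage
        stored.append(dest)
        for field, value in index_data.items():
            db.execute(
                "INSERT INTO doc_index VALUES (?, ?, ?)",
                (str(dest), field, value),
            )
    db.commit()
    return stored
```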
Please call us for a free consultation from a distributor in your area
regarding a specific risk management solution for your business.