FAQ: OCR and Litigation Coding

OCR (Optical Character Recognition) is a tool used to help code large volumes of legal documents. When a document is scanned, it is normally saved as an image file, so its text cannot be edited. With OCR, a printed document can be scanned and its text transferred directly into word-processing software such as MS Word, where it can then be edited. OCR software reads the black and white pixels of an image and recognizes the corresponding letters and digits, which makes the technique very useful when coding legal documents.
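To make the idea of "reading pixels" concrete, the following is a minimal sketch of template matching, one of the simplest character-recognition techniques. It assumes the page has already been binarized and split into individual character images. The tiny 3x5 glyph templates are hypothetical and exist only for illustration; modern OCR engines use far more sophisticated, typically machine-learned, recognizers.

```python
# Minimal sketch of template-matching character recognition on a binarized image.
# Assumption: characters are already segmented into fixed-size 3x5 pixel grids.
# The glyph templates below are hypothetical, for illustration only.
from typing import Dict, List

Glyph = List[str]  # each string is one pixel row; '#' = black, '.' = white

TEMPLATES: Dict[str, Glyph] = {
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "7": ["###", "..#", ".#.", ".#.", ".#."],
    "L": ["#..", "#..", "#..", "#..", "###"],
}


def pixel_agreement(a: Glyph, b: Glyph) -> int:
    """Count pixel positions where two same-sized glyphs have the same value."""
    return sum(pa == pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def recognize(glyph: Glyph) -> str:
    """Return the template character whose pixels agree most with the input glyph."""
    return max(TEMPLATES, key=lambda ch: pixel_agreement(glyph, TEMPLATES[ch]))


# A slightly noisy "7": the bottom-left pixel has been flipped to black.
noisy_seven = ["###", "..#", ".#.", ".#.", "##."]
print(recognize(noisy_seven))  # prints: 7
```

Counting agreeing pixels lets a character with a little scanning noise still match its closest template, which is why even this crude approach tolerates small defects in the image.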

OCR is thus the mechanical or electronic translation of images of printed text (usually captured by a scanner) into machine-editable text, and it relies on digital image processing. ICR (Intelligent Character Recognition) is a more recent development that converts handwritten text and numbers into machine-readable strings, so that even hand-filled application forms can now be recognized by machines.

Legal coding is the process of cataloging or indexing documents so that they can be easily retrieved, sorted, and reviewed. With an expertly coded litigation database, a specific fact can be located within seconds during legal research, even across huge volumes of documents. This is where OCR software helps, because it can quickly produce the searchable text on which such a database is built. Early OCR technology could not recognize all fonts, but modern "intelligent" systems read most fonts with a high degree of accuracy. Some OCR systems can even reproduce formatted output that closely matches the original scanned page, including images and other non-textual components.
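One common way OCR text makes fast fact-finding possible is an inverted index, which maps each word to the documents containing it, so a query only has to look up a few words instead of scanning every page. The sketch below is an illustrative assumption, not a description of any particular litigation-support product; the document IDs and texts are hypothetical.

```python
# Minimal sketch of an inverted index over OCR'd text (illustrative only).
# Document IDs and contents are hypothetical examples.
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every word to the set of document IDs in which it appears."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return the sorted IDs of documents containing every word of the query."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)


ocr_text = {
    "DOC-001": "Letter from the supplier confirming delivery on 12 March.",
    "DOC-002": "Internal memo about the delayed delivery and penalty clause.",
    "DOC-003": "Invoice for consulting services.",
}
index = build_index(ocr_text)
print(search(index, "delivery"))          # prints: ['DOC-001', 'DOC-002']
print(search(index, "delayed delivery"))  # prints: ['DOC-002']
```

Because lookups touch only the index entries for the query words, search time depends mainly on how many documents match rather than on the total volume of pages.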

Besides data capture using OCR, other methods of legal coding include bibliographic coding, in-text coding, and objective and subjective coding. Objective coding creates an index of objective summary data from a document, such as its title, dates, author, and recipient. Subjective coding is the more reliable way to determine the importance of legal documents, because the documents are indexed around subjective data by someone who is familiar with the subject matter.
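Objective coding can be pictured as filling in one structured record per document and then filtering on those fields. The following is a minimal sketch assuming a hypothetical record layout built from the fields named above (title, date, author, recipient); real coding manuals define their own fields. Subjective judgments such as importance are deliberately left out, since they require a knowledgeable reviewer rather than a fixed rule.

```python
# Minimal sketch of objective coding records and a field-based search.
# The record layout and sample data are hypothetical, for illustration only.
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ObjectiveCoding:
    doc_id: str
    title: str
    doc_date: date
    author: str
    recipient: str


def find(records: Iterable[ObjectiveCoding],
         author: Optional[str] = None,
         start: Optional[date] = None,
         end: Optional[date] = None) -> List[str]:
    """Return IDs of records matching the author and falling within [start, end]."""
    hits = []
    for r in records:
        if author is not None and r.author != author:
            continue
        if start is not None and r.doc_date < start:
            continue
        if end is not None and r.doc_date > end:
            continue
        hits.append(r.doc_id)
    return hits


records = [
    ObjectiveCoding("DOC-001", "Delivery confirmation", date(2020, 3, 12), "A. Smith", "B. Jones"),
    ObjectiveCoding("DOC-002", "Delay memo", date(2020, 4, 2), "C. Lee", "A. Smith"),
    ObjectiveCoding("DOC-003", "Follow-up letter", date(2020, 5, 20), "A. Smith", "C. Lee"),
]
print(find(records, author="A. Smith"))                        # prints: ['DOC-001', 'DOC-003']
print(find(records, start=date(2020, 4, 1), end=date(2020, 4, 30)))  # prints: ['DOC-002']
```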

