FAQ5

Digitization (General info)

There are two main methods for converting existing print resources into encoded digital text files.

The first is the brute-force approach of simply re-keying the data. In some cases this is actually the most efficient method. For small projects, single keying followed by careful proofreading is often sufficient. For medium-sized projects, double keying (two operators type the same text independently), used in conjunction with a file comparison program, will often yield excellent results. Re-keying the data can also eliminate the need to scan the originals into page image files.
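As a rough illustration of the file comparison step in double keying, the sketch below uses Python's standard difflib module to list the places where two independently keyed versions disagree. The function name compare_keyings and the sample text are hypothetical, chosen only for this example. Any file comparison program would serve the same purpose.

```python
import difflib


def compare_keyings(first, second):
    """Return a list of (line_number, first_lines, second_lines) tuples,
    one for each stretch of lines where the two keyings differ.

    line_number is the 1-based position in the first keying.
    """
    a = first.splitlines()
    b = second.splitlines()
    # autojunk=False turns off difflib's heuristic that treats very
    # frequent lines as junk in long inputs.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    discrepancies = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            discrepancies.append((i1 + 1, a[i1:i2], b[j1:j2]))
    return discrepancies


if __name__ == "__main__":
    # Hypothetical sample: the second typist keyed one letter wrongly.
    keying_a = "В начале было слово\nи слово было у Бога\n"
    keying_b = "В начале было слово\nи слово было у Вога\n"
    for line_no, ours, theirs in compare_keyings(keying_a, keying_b):
        print(f"line {line_no}: {ours} <> {theirs}")
```

Each reported discrepancy still has to be checked against the print original, since the comparison only shows that the two typists disagree, not which of them is right. An error that both typists happen to make in the same place will not be detected.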

The second method is to run digital page images through optical character recognition (OCR) software. OCR can be the best option, particularly when a large amount of text is involved. Several commercial OCR programs can recognize Slavic languages, both those written in modified Latin alphabets and those written in Cyrillic.