In OCR libraries, the following algorithms are used.
New algorithms introduced from 2009 to 2016

  1. Discriminant analysis
    Used to binarize grayscale images. It selects the threshold that maximizes the inter-class variance between the black and white areas, which is equivalent to minimizing the intra-class variance.
  2. Fast labeling algorithm
    Extracts and labels shapes in monochrome raster images. Clustering the labeled shapes by parameters such as distance makes it possible to extract text at the paragraph, line, and character levels.
  3. KL transform
    Compresses feature vectors with hundreds of dimensions into a smaller number of principal components to improve recognition rate and speed and to reduce resource requirements (such as memory and dictionary size).
  4. LBG clustering
    Statistically determines the pitch of half-width and full-width characters, using a clustering algorithm to find the best-fitting solution. Also used to determine line widths within paragraphs.
  5. Nonlinear normalization
    Normalizes character images nonlinearly so that strokes are evenly distributed. This absorbs differences in stroke position within a character to a certain extent.
  6. Thinning
    Calculates features on thinned character images. This absorbs differences in stroke width to a certain extent.
  7. Viterbi algorithm
    When recognizing variable-pitch text lines, the Viterbi algorithm is used as a combinatorial optimization algorithm to segment characters correctly where half-width and full-width characters are mixed and where a character is divided into left and right parts (e.g., "い" and "小"). In vertical writing, it is also used for characters divided into upper and lower parts, such as those with crown and foot components, "う", and "三".
  8. n-gram co-occurrence frequency calculation
    Collects co-occurrence frequency statistics for character strings (the frequencies of 1-character, 2-character, ..., n-character sequences) from large plain text files and uses them to correct recognition results.
  9. Chart parser
    When recognizing addresses, email addresses, URLs, phone numbers (including fax numbers), etc., a chart parser is used for syntax analysis.
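
As an illustration of item 1 (discriminant analysis), the following is a minimal sketch of threshold selection by maximizing inter-class variance. It is not the library's code: the function name `otsu_threshold`, the flat list of 8-bit pixel values, and the `levels` parameter are all assumptions made for the example.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes inter-class variance.

    Hypothetical helper: pixels is a flat list of integers in
    range(levels). Values <= t form one class, values > t the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the lower class
    sum0 = 0    # sum of pixel values in the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break               # upper class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Inter-class variance, up to a constant factor of 1 / total**2.
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Toy input: a dark cluster and a bright cluster.
pixels = [10] * 50 + [200] * 50
t = otsu_threshold(pixels)
binary = [1 if p > t else 0 for p in pixels]
```

Because the total variance of the image is fixed, the threshold found this way also minimizes the intra-class variance, so only one of the two quantities has to be computed.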
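
For item 2 (labeling), the sketch below shows what labeling a monochrome image produces. It uses a plain breadth-first flood fill with 4-connectivity, which is a simpler technique swapped in for illustration and not the fast labeling algorithm the library uses. The function name `label_components` and the list-of-rows image format are assumptions.

```python
from collections import deque


def label_components(image):
    """Label 4-connected foreground regions of a 0/1 image.

    Hypothetical helper: image is a list of equal-length rows.
    Returns (labels, count), where labels has the same shape as
    image, 0 marks background, and regions are numbered from 1.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if not image[y][x] or labels[y][x]:
                continue        # background, or already labeled
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


image = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
labels, count = label_components(image)
```

The bounding boxes of the labeled regions would then be the input to the distance-based clustering described in item 2.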
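
For item 8, the sketch below counts character n-grams in a text and uses the 2-character counts to choose between recognition candidates. It is only an assumed, simplified correction step: the function names `ngram_counts` and `pick_candidate` are hypothetical, and a real system would combine these statistics with the recognizer's own scores.

```python
from collections import Counter


def ngram_counts(text, max_n=3):
    """Count every character sequence of length 1..max_n in text."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for n in range(1, max_n + 1):
        for i in range(len(text) - n + 1):
            counts[n][text[i:i + n]] += 1
    return counts


def pick_candidate(prev_char, candidates, counts):
    """Pick the candidate that most often follows prev_char.

    Ties are resolved in favor of the earlier candidate.
    """
    return max(candidates, key=lambda c: counts[2][prev_char + c])


corpus = "the cat sat on the mat"
counts = ngram_counts(corpus)

# The recognizer is unsure whether the character after "t" is "h" or "b".
choice = pick_candidate("t", ["h", "b"], counts)
```

Here "th" occurs in the corpus and "tb" does not, so the statistics favor "h".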