OCR for TV subtitles

An introduction to our commissioned development projects from fiscal year 2016.
We achieved practical-level caption recognition with an idea that exploits a blind spot in conventional approaches.

Improving the accuracy of caption recognition is very difficult, and it is harder for variety shows than for news programs.
Many researchers have studied and developed algorithms that extract only the character regions of captions, but practical application is still a long way off.

Art Logic combines 14 different binarization (2-value) algorithms for color images containing captions. Each algorithm can recognize only part of the caption text, so the system relies on the "even a poor marksman will hit the target with enough shots" approach.

Parallel processing, scaled to the number of CPU cores, of 12 different algorithms that extract characters from color screens.
We chose the 12 algorithms to be as independent of one another as possible.
Inverted characters and italic fonts are handled automatically by every algorithm.
Each of the 12 algorithms handles inverted characters and italic fonts itself. Processing them in separate threads would produce 36 combinations, which is too many.
Two more algorithms for outlined characters were added and run in parallel with the rest.
Simple binarization (2-value) algorithms for color images cannot handle outlined characters, so we developed and added two dedicated algorithms for them.
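
As a rough illustration of handling inverted and italic text inside a single algorithm rather than in extra threads, the sketch below tries three variants of one binarized image and keeps the most confident result. This is not Art Logic's actual implementation: the names invert, unslant, and best_of_variants, the 0.25 shear value, and the recognize callback are all hypothetical.

```python
from typing import Callable, List, Tuple

Binary = List[List[int]]  # rows of 0/1 pixels, 1 = candidate character pixel


def invert(img: Binary) -> Binary:
    """Swap foreground and background, for light-on-dark (inverted) text."""
    return [[1 - p for p in row] for row in img]


def unslant(img: Binary, shear: float = 0.25) -> Binary:
    """Undo a right-leaning italic slant by shifting upper rows to the left.

    Row y (0 = top) is shifted by round(shear * (height - 1 - y)) pixels;
    pixels pushed past the left edge are dropped. The shear value is an
    assumed constant, not a figure from the source.
    """
    height = len(img)
    width = len(img[0]) if img else 0
    out = []
    for y, row in enumerate(img):
        shift = round(shear * (height - 1 - y))
        out.append(row[shift:] + [0] * min(shift, width))
    return out


def best_of_variants(
    img: Binary, recognize: Callable[[Binary], Tuple[str, float]]
) -> Tuple[str, float]:
    """Recognize the normal, inverted, and unslanted image in one worker.

    `recognize` is a hypothetical OCR callback returning (text, confidence).
    """
    candidates = [img, invert(img), unslant(img)]
    return max((recognize(c) for c in candidates), key=lambda r: r[1])
```

Keeping the three variants inside one worker means the number of parallel tasks stays at one per algorithm.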

We run these 14 character recognition processes in parallel. Each individual process is much less accurate than conventional caption recognition, but integrating all of the results using confidence scores and language information yields more accurate output than conventional caption recognition.
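
The integration step can be pictured with a minimal sketch like the one below. It assumes each recognizer returns a (text, confidence) pair, sums confidence per candidate text, and adds a fixed bonus when a candidate appears in a word list standing in for "language information". The function names, the bonus value, and the lexicon are hypothetical, and the actual integration method is not described in this article.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

Result = Tuple[str, float]  # (recognized text, confidence in [0, 1])


def run_all(
    image: object, recognizers: List[Callable[[object], Result]], workers: int
) -> List[Result]:
    """Run every recognizer on the same image, in input order.

    Threads keep the sketch simple; CPU-bound recognizers written in pure
    Python would likely need separate processes instead.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda r: r(image), recognizers))


def integrate(
    results: Iterable[Result], lexicon: Set[str], bonus: float = 0.5
) -> Optional[str]:
    """Pick the candidate with the highest summed confidence.

    Candidates found in `lexicon` get an assumed fixed bonus.
    """
    scores: Dict[str, float] = defaultdict(float)
    for text, conf in results:
        if text:  # ignore recognizers that found nothing
            scores[text] += conf
    for text in scores:
        if text in lexicon:
            scores[text] += bonus
    return max(scores, key=scores.get) if scores else None
```

With this scheme, several weak recognizers that agree can outvote one confident but wrong result, and the lexicon breaks ties in favor of plausible words.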

The disadvantage is speed: on a PC with roughly 2 cores and 4 threads with Hyper-Threading (HT), processing one image takes about 4 seconds. Reaching 1 frame per second requires a CPU with 8 cores and 16 threads. A 64-bit OS is recommended.
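
These figures are consistent with a simple back-of-the-envelope model, sketched below. It assumes that each of the 14 recognition passes takes about 1 second and that hardware threads scale ideally; neither assumption is stated in this article, and real Hyper-Threading gains are smaller.

```python
import math


def seconds_per_frame(tasks: int, hw_threads: int, seconds_per_task: float) -> float:
    """Estimate frame time when tasks run in waves of `hw_threads` at a time."""
    waves = math.ceil(tasks / hw_threads)
    return waves * seconds_per_task


# 14 passes on 4 hardware threads need 4 waves; on 16 threads, a single wave.
slow = seconds_per_frame(14, 4, 1.0)   # 4.0 seconds per frame
fast = seconds_per_frame(14, 16, 1.0)  # 1.0 second per frame
```

Under the same assumptions, 8 hardware threads would still need 2 waves, which is why 16 are needed to fit all 14 passes into one.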

As the figure below shows, high accuracy is achieved by running 14 low-accuracy recognition processes in parallel and integrating their results. For copyright reasons, only a corner of each screen is shown. Thank you for your understanding.
Binarization with strong emphasis on low luminance ignoring color components	Binarization with strong emphasis on high luminance ignoring color components

Binarization with yellow and orange as the character colors	Binarization targeting outlined characters (the outline is low luminance)

Binarization using saturation information	General binarization for color documents (not dedicated to captions)
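
To give a feel for how differently these binarizations treat the same frame, here is a minimal pure-Python sketch of four of the variants named above. The thresholds, hue range, and function names are illustrative assumptions, not Art Logic's actual parameters, and the outlined-character and general document binarizations are not covered.

```python
import colorsys
from typing import Callable, List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0..255
Image = List[List[Pixel]]


def luminance(p: Pixel) -> float:
    """Approximate perceived brightness (ITU-R BT.601 weights)."""
    r, g, b = p
    return 0.299 * r + 0.587 * g + 0.114 * b


def dark_text(p: Pixel, threshold: float = 64) -> bool:
    """Low-luminance emphasis: dark pixels are text, color is ignored."""
    return luminance(p) < threshold


def bright_text(p: Pixel, threshold: float = 192) -> bool:
    """High-luminance emphasis: bright pixels are text, color is ignored."""
    return luminance(p) > threshold


def yellow_orange_text(p: Pixel) -> bool:
    """Treat vivid yellow and orange hues (about 20 to 65 degrees) as text."""
    h, s, v = colorsys.rgb_to_hsv(p[0] / 255, p[1] / 255, p[2] / 255)
    return 20 / 360 <= h <= 65 / 360 and s > 0.5 and v > 0.5


def saturated_text(p: Pixel, threshold: float = 0.6) -> bool:
    """Use saturation only: strongly colored pixels are text."""
    _, s, _ = colorsys.rgb_to_hsv(p[0] / 255, p[1] / 255, p[2] / 255)
    return s > threshold


def binarize(img: Image, is_text: Callable[[Pixel], bool]) -> List[List[int]]:
    """Return a 0/1 image where 1 marks pixels the rule classifies as text."""
    return [[1 if is_text(p) else 0 for p in row] for row in img]
```

Each rule picks out a different subset of pixels from the same image, which is what lets the later integration step benefit from their disagreement.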