Strokes are among the most important features in the processing and recognition of oriental languages. The structure of a character in these languages can be characterized by the orientation and order of its strokes. Extracting strokes from input images is therefore one of the first and most important steps in any structural character recognition system. Several studies have addressed stroke extraction, but all of them provide only a single output. That output may contain incorrect strokes or miss some strokes when the input images contain noise or blurred areas.
In this work, a two-phase stroke extraction method is proposed. In the first phase, the method uses information about the boundary and the direction of each boundary point to extract clear strokes. This phase is designed to be general enough to apply to any oriental language. In the second phase, we propose a mechanism that adds knowledge of the structure of the character set to double-check the strokes extracted in the first phase and to extract strokes that were missed because of blurring. These noisy or missing strokes are quite ambiguous, however, and it is nearly impossible to eliminate every noisy stroke and recover every missing stroke in all cases. With the help of this top-down knowledge, we therefore produce several alternatives that are most likely to be the correct result. Moreover, we define a way to assign a score to each alternative. This is useful because a language model can later be used to select the most likely correct result.
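The idea of keeping several scored alternatives, rather than a single output, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Alternative` type, the per-stroke confidence values, the use of their mean as the score, and the `top_alternatives` helper are hypothetical and are not the scoring scheme defined in this work.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Alternative:
    """One candidate stroke set for a character (hypothetical structure)."""
    strokes: Tuple[str, ...]          # illustrative stroke labels
    confidences: Tuple[float, ...]    # assumed per-stroke confidence in [0, 1]

    def score(self) -> float:
        # Assumption: score an alternative by the mean stroke confidence.
        if not self.confidences:
            return 0.0
        return sum(self.confidences) / len(self.confidences)


def top_alternatives(alternatives: List[Alternative], k: int = 3) -> List[Alternative]:
    """Return the k highest-scoring alternatives, best first."""
    return sorted(alternatives, key=lambda a: a.score(), reverse=True)[:k]


# Three candidate interpretations of the same blurred character region.
candidates = [
    Alternative(("h", "v"), (0.9, 0.8)),
    Alternative(("h", "v", "d"), (0.9, 0.8, 0.3)),
    Alternative(("h",), (0.9,)),
]
best = top_alternatives(candidates, k=2)
```

In a setup like this, the retained alternatives and their scores would be handed to a later stage, such as a language model, which makes the final choice.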
Experiments are performed on Hangul video text images. Because these images are captured from video, they are low-resolution and contain considerable noise as well as blurred areas. The experimental results show the advantage of the proposed two-phase extraction method in producing several possible results for low-quality images.