Handwritten Chinese character recognition is difficult for three reasons. First, there are many writing variations. Second, the character set is huge and therefore contains many similar characters. Third, many Chinese characters have very complex shapes. Degradation of the character image makes the problem worse. To overcome these difficulties, this dissertation proposes a handwritten Chinese character recognition system that differs from conventional systems in character structure modeling, preprocessing, and the discrimination of similar characters.
The proposed character model represents the character structure within a statistical framework. Each stroke is modeled by a distribution of pixels, and the relations between strokes are reflected by the statistical dependencies among the strokes. Based on this representation, the model automatically selects the important relations from among all possible stroke relations. It not only tolerates writing variations effectively but also has a concrete formulation. In particular, it is well suited to extracting and representing various kinds of neighbor relations between strokes, such as crossings, T-junctions, and parallelism.
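The idea of selecting stroke relations by statistical dependency can be illustrated with a minimal sketch. It is not the dissertation's formulation: it assumes, purely for illustration, that each stroke is summarized by one scalar feature per training sample and that dependency is measured by the mutual information of a bivariate Gaussian, I = -0.5 ln(1 - rho^2). The names `pearson`, `gaussian_mutual_information`, and `select_relations` are hypothetical.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Sample correlation coefficient; returns 0.0 if either feature is constant."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def gaussian_mutual_information(xs, ys):
    """Mutual information of two scalar features under a bivariate Gaussian assumption."""
    rho2 = min(pearson(xs, ys) ** 2, 1.0 - 1e-12)  # clamp to keep the log finite
    return -0.5 * math.log(1.0 - rho2)


def select_relations(stroke_features, top_k):
    """Return the top_k stroke pairs with the strongest estimated dependency.

    stroke_features maps a stroke name to its scalar feature values,
    one value per training sample (an illustrative simplification).
    """
    scored = [
        (gaussian_mutual_information(stroke_features[a], stroke_features[b]), a, b)
        for a, b in combinations(sorted(stroke_features), 2)
    ]
    scored.sort(reverse=True)
    return [(a, b) for _, a, b in scored[:top_k]]


# Made-up data: s1 and s2 move together across samples, s3 varies independently.
features = {
    "s1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "s2": [2.1, 3.9, 6.2, 8.0, 9.9],
    "s3": [3.0, 1.0, 4.0, 1.0, 5.0],
}
selected = select_relations(features, top_k=1)
```

In this toy data the pair (s1, s2) is retained because its features are almost perfectly correlated, while the weaker pairs involving s3 are dropped. A model over pixel distributions would use multivariate features, but the selection principle is the same.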
The preprocessor of the proposed system detects degraded regions and processes them differently from clean regions in order to cope with touching strokes and image blurring. It extracts pseudo strokes from degraded regions and normal strokes from clean regions. A pseudo stroke is a line segment that is not certain, but is likely, to be a stroke. The matching algorithm tries to match as many normal strokes as possible. In contrast, a pseudo stroke participates in the matching only if doing so benefits the matching.
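The asymmetric treatment of normal and pseudo strokes can be sketched as follows. This is an illustration under stated assumptions, not the matching algorithm of the dissertation: it assumes a precomputed gain for each (model stroke, extracted stroke) pair, a fixed penalty `miss_penalty` for every normal stroke left unmatched, and a simple greedy assignment that is not guaranteed to be optimal. The function name `match_strokes` and all parameters are hypothetical.

```python
def match_strokes(scores, normal, pseudo, miss_penalty=1.0):
    """Greedily assign extracted strokes to model strokes.

    scores: dict mapping (model_stroke, extracted_stroke) to a gain (higher is better).
    normal: set of extracted strokes taken from clean regions.
    pseudo: set of extracted strokes taken from degraded regions.
    """
    assignment = {}
    used = set()
    # Visit candidate pairs from best to worst gain.
    for (model, cand), gain in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if model in assignment or cand in used:
            continue
        # A normal stroke is matched whenever that beats paying the miss penalty;
        # a pseudo stroke is matched only if it strictly improves the total.
        threshold = -miss_penalty if cand in normal else 0.0
        if gain > threshold:
            assignment[model] = cand
            used.add(cand)
    unmatched_normal = [c for c in normal if c not in used]
    total = sum(scores[(m, c)] for m, c in assignment.items())
    total -= miss_penalty * len(unmatched_normal)
    return assignment, total


# Made-up gains: n1, n2 are normal strokes; p1, p2 are pseudo strokes.
scores = {
    ("m1", "n1"): 2.0,
    ("m2", "n2"): -0.5,
    ("m3", "p1"): 0.8,
    ("m4", "p2"): -0.3,
}
assignment, total = match_strokes(scores, normal={"n1", "n2"}, pseudo={"p1", "p2"})
```

Here the normal stroke n2 is still matched despite its slightly negative gain, because leaving it unmatched would cost more, whereas the pseudo stroke p2 is simply ignored because it does not improve the match.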
To discriminate similar characters, the proposed system performs pair-wise discrimination in postprocessing. If the recognition result belongs to one of the predefined confusion pairs, the pair-wise discriminator verifies it against its competitor. The...