We present a recognition-based digitization method for building a
digital library from a large collection of historical archives. Because most
of the archives are manually transcribed in ancient Chinese characters, their
digitization presents unique academic and pragmatic challenges. By integrating
layout analysis and character recognition into a single probabilistic
framework, our system achieved a 95.1% character recognition rate on
the test data set, despite the obsolete characters and unique variants used
in the archives. Combined with an intuitive verification and correction interface,
the system freed the operators from repetitive typing tasks and
improved the overall throughput significantly.