The state-of-the-art video coding standard H.264/AVC uses transform coding to compress video data in the spatial domain. Although the integer-based arithmetic keeps the computational complexity of the transform modest, the throughput requirement rises because the H.264/AVC encoder uses the Adaptive Block-size Transform (ABT) to improve coding performance. To support real-time processing of such transform coding, this thesis proposes a high-throughput, cost-effective implementation of the six integer transforms used in H.264/AVC High Profile coders, namely the $4\times4$ forward, $4\times4$ inverse, forward Hadamard, inverse Hadamard, $8\times8$ forward, and $8\times8$ inverse transforms, all integrated into shared hardware.
First, a $4\times4$ multi-transform architecture is proposed that can process any one of the four $4\times4$ transform types within two clock cycles. The $4\times4$ transform matrices are regularized by permutation, partitioned into $2\times2$ blocks, and factored to maximize hardware sharing both between the two phases of each transform and among the four $4\times4$ transforms.
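As a rough illustration of why the $4\times4$ transforms lend themselves to shared adder hardware, the Python sketch below computes the standard H.264/AVC $4\times4$ forward core transform with a two-stage butterfly (additions and shifts only), applied as two 1-D passes, and compares it with the direct matrix product. The function names are hypothetical, treating the two phases as row and column passes is an assumption, and the butterfly is the widely used fast form rather than the permutation and $2\times2$ partitioning developed in this thesis.

```python
# H.264/AVC 4x4 forward core transform matrix (scaling is left to quantization).
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

def butterfly_1d(x):
    """1-D forward core transform of four samples using adds and shifts."""
    s0, s1 = x[0] + x[3], x[1] + x[2]   # stage 1: sums
    d0, d1 = x[0] - x[3], x[1] - x[2]   # stage 1: differences
    return [s0 + s1, (d0 << 1) + d1, s0 - s1, d0 - (d1 << 1)]  # stage 2

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Plain integer matrix product, used only as a reference."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def forward_4x4(block):
    """2-D transform Y = CF * X * CF^T computed as two 1-D passes."""
    pass1 = [butterfly_1d(row) for row in block]             # X * CF^T
    pass2 = [butterfly_1d(col) for col in transpose(pass1)]  # holds Y^T
    return transpose(pass2)

# Illustrative residual block (made-up values).
X = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
assert forward_4x4(X) == matmul(matmul(CF, X), transpose(CF))
```

Both passes reuse the same `butterfly_1d` routine, which loosely mirrors the idea of sharing one datapath between the two phases of a transform.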
Second, a multi-transform architecture is proposed that can process any of the six transform types. Using the two types of $4\times4$ transform matrices embedded in an $8\times8$ transform matrix, the two $8\times8$ transforms are each expressed in three steps and unified with minor modifications. To improve throughput, the two independent $4\times4$ transform blocks within the $8\times8$ transform block operate in parallel in $4\times4$ transform mode, while a two-stage pipelined architecture is used in $8\times8$ transform mode.
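One way to see how an $8\times8$ transform can contain two $4\times4$ matrices is the even-odd decomposition of the H.264/AVC $8\times8$ forward integer transform, sketched below in Python for the 1-D case. This is an assumption about what a three-step description might look like (butterfly, two independent $4\times4$ products, output reordering). It is not taken from the thesis, and all names are hypothetical.

```python
# H.264/AVC High Profile 8x8 forward integer transform matrix (unscaled).
T8 = [
    [ 8,   8,   8,   8,   8,   8,   8,   8],
    [12,  10,   6,   3,  -3,  -6, -10, -12],
    [ 8,   4,  -4,  -8,  -8,  -4,   4,   8],
    [10,  -3, -12,  -6,   6,  12,   3, -10],
    [ 8,  -8,  -8,   8,   8,  -8,  -8,   8],
    [ 6, -12,   3,  10, -10,  -3,  12,  -6],
    [ 4,  -8,   8,  -4,  -4,   8,  -8,   4],
    [ 3,  -6,  10, -12,  12, -10,   6,  -3],
]

# Even rows are mirror-symmetric and odd rows are anti-symmetric, so the
# left half of each row fully describes it.
EVEN4 = [T8[r][:4] for r in (0, 2, 4, 6)]
ODD4 = [T8[r][:4] for r in (1, 3, 5, 7)]

def matvec(m, v):
    """Integer matrix-vector product."""
    return [sum(c * x for c, x in zip(row, v)) for row in m]

def transform8_1d(x):
    """1-D 8-point transform as three steps (illustrative only)."""
    # Step 1: butterfly on mirrored input pairs.
    sums = [x[i] + x[7 - i] for i in range(4)]
    diffs = [x[i] - x[7 - i] for i in range(4)]
    # Step 2: two independent 4x4 products, which could run in parallel.
    even = matvec(EVEN4, sums)
    odd = matvec(ODD4, diffs)
    # Step 3: interleave even- and odd-indexed outputs.
    y = [0] * 8
    y[0::2] = even
    y[1::2] = odd
    return y

# Illustrative input samples (made-up values).
x = [3, -1, 4, 1, -5, 9, 2, -6]
assert transform8_1d(x) == matvec(T8, x)
```

Because the two $4\times4$ products in step 2 do not depend on each other, the same pair of units could in principle serve two separate $4\times4$ transforms when the $8\times8$ mode is not in use.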
Experimental results show that the proposed transform achieves the same coding performance, in terms of bitrate and PSNR, as the transform in the H.264/AVC reference software. Hardware implementation results show that the maximum operating frequency of the proposed multi-transform architecture is 200 MH...