The Discrete Cosine Transform (DCT) is considered to be the most effective transform coding technique for image and video compression. Blocks of image data are converted into the transform domain for more effective coding, and an Inverse Discrete Cosine Transform (IDCT) is used to convert the transform-domain data back to the spatial domain. A commonly used block size is $8\times8$ pixels, since it represents a good compromise between coding efficiency and hardware complexity. Because of its effectiveness, many proposed standards, such as the CCITT H.261 recommended standard for $p\times64$ kb/s ($p=1,2,\ldots,30$) visual telephony and the still-image compression standard developed by ISO JPEG, include the $8\times8$ DCT in their algorithms. In this paper, we present the architecture and implementation of a flexible $8\times8$ DCT/IDCT core processor that uses a fast DCT algorithm and a multiplier-accumulator based architecture rather than distributed arithmetic. The chip is an experimental prototype and is implemented using standard cells. The new fast DCT and IDCT algorithms are implemented on the same chip. The internal clock frequency is half of the pixel rate. The chip achieves better accuracy than the CCITT IDCT specification requires.
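For reference, the $8\times8$ transform pair discussed above can be sketched in a few lines of Python. This is only an illustrative floating-point model of the standard separable, orthonormal DCT-II written as explicit multiply-accumulate loops; it is not the fast algorithm or the fixed-point datapath of the chip, and all names (`dct_matrix`, `matmul`, `dct2`, `idct2`) are hypothetical.

```python
import math

N = 8  # block size: 8x8 pixels


def dct_matrix(n=N):
    """Orthonormal DCT-II basis: C[u][x] = a(u) * cos((2x + 1) * u * pi / (2n))."""
    rows = []
    for u in range(n):
        scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                     for x in range(n)])
    return rows


def matmul(a, b):
    """Square matrix product; each output element is one multiply-accumulate chain."""
    n = len(a)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                acc += a[i][k] * b[k][j]  # one multiply-accumulate step
            out[i][j] = acc
    return out


def transpose(m):
    return [list(row) for row in zip(*m)]


C = dct_matrix()
CT = transpose(C)


def dct2(block):
    """Forward 2-D DCT of an 8x8 block: Y = C * X * C^T."""
    return matmul(matmul(C, block), CT)


def idct2(coeffs):
    """Inverse 2-D DCT of an 8x8 coefficient block: X = C^T * Y * C."""
    return matmul(matmul(CT, coeffs), C)
```

Because the basis matrix is orthonormal, `idct2(dct2(block))` reproduces `block` up to floating-point rounding. A hardware implementation works with finite word lengths, which is why IDCT accuracy has to be checked against a specification.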