This paper proposes a multiplier architecture suited to the Montgomery modular multiplication algorithm and implements a 1024-bit modular multiplier chip at the register-transfer level in Verilog HDL. First, methods for performing modular multiplication with the Montgomery algorithm are surveyed, and a variant suitable for pipelined multiplication is adopted. To achieve high performance, distributed arithmetic is used to reduce the number of summations, exploiting the fact that one operand remains fixed over several machine cycles. A tree structure built from 4-2 compressors is used for fast computation, and the datapath is deeply pipelined for high throughput.
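
For orientation, the arithmetic that such a multiplier computes can be sketched in software. The following is a minimal bit-serial (radix-2) model of Montgomery multiplication, written in Python for readability; it is not the pipelined, distributed-arithmetic datapath described above, and the function names `montgomery_multiply` and `modmul` are hypothetical, chosen only for this illustration. It assumes an odd modulus n with n < 2^k and operands in the range [0, n).

```python
def montgomery_multiply(a, b, n, k):
    """Return a * b * 2^(-k) mod n for odd n < 2^k and 0 <= a, b < n.

    Illustrative radix-2 model only: one bit of `a` is consumed per
    iteration, mirroring a bit-serial hardware loop.
    """
    s = 0
    for i in range(k):
        a_i = (a >> i) & 1      # current bit of operand a
        s += a_i * b            # conditionally add the fixed operand b
        if s & 1:               # make s even by adding the odd modulus
            s += n
        s >>= 1                 # exact division by 2
    if s >= n:                  # s < 2n holds here, so one subtraction suffices
        s -= n
    return s


def modmul(a, b, n, k):
    """Return a * b mod n using Montgomery multiplication (assumed helper)."""
    r2 = pow(2, 2 * k, n)                   # R^2 mod n, with R = 2^k
    a_m = montgomery_multiply(a, r2, n, k)  # a * R mod n (Montgomery form)
    b_m = montgomery_multiply(b, r2, n, k)  # b * R mod n
    p_m = montgomery_multiply(a_m, b_m, n, k)
    return montgomery_multiply(p_m, 1, n, k)  # convert back from Montgomery form
```

In this model the operand `b` stays fixed across all k iterations, which is the property the abstract refers to when it mentions one operand being fixed for several machine cycles. A hardware design would typically keep the running sum in a redundant (carry-save) form rather than performing full-width additions as done here.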