Instruction pipelining is essential to the performance of modern RISC microprocessors, including superscalar processors. A deeper pipeline generally yields finer clock steps and a shorter CPU cycle time and therefore enhances CPU performance significantly, as long as the normal pipeline flow is maintained. However, as the pipeline depth is increased for higher throughput, the branch penalty also increases, which can severely reduce the performance advantage of the deeper pipeline. Several static and dynamic approaches have been proposed to solve the long branch penalty problem. When only static approaches such as delayed branch and squashed branch are used, the hardware cost can be reduced, but only at the cost of long branch execution cycles. Dynamic approaches, on the other hand, reduce the branch execution cycles but require significant hardware overhead, as exemplified by hardware schemes such as the branch target buffer (BTB) and branch folding. In this thesis, a new hardware scheme called {\it dynamic rescheduling squashed branch (DRSB)} is proposed to reduce the branch execution cycles. A conventional BTB scheme enables the processor to fetch newly predicted branch target instructions during the branch delay cycles by employing a separate branch cache that stores the PCs of the executed branch instructions. The proposed DRSB scheme instead employs a {\it rescheduling buffer} in which newly predicted target instructions are dynamically rescheduled, which enables the processor to fetch the target instructions sequentially during the branch delay cycles. The performance of the DRSB scheme was evaluated using a trace-driven branch simulator, and its silicon area was estimated using a 0.8 $\mu$m CMOS standard cell library. Compared with a conventional BTB scheme whose branch cache size is 1 k-byte, the DRSB scheme reduces the silicon area of the branch unit to $\frac{1}{2.911}$ of that of the BTB scheme while reducing branch execution cycles by 0.4%. The proposed DRSB scheme ...