Volume rendering is a key technique for visualizing 3D data. However, real-time volume rendering demands substantial computational power. Here, volume rendering is implemented on the Texas Instruments TMS320C6201, a DSP with a superscalar, Very Long Instruction Word (VLIW) architecture capable of issuing eight operations in parallel. This thesis analyzes the architecture of the TMS320C6201 and then describes the optimization process and the resulting performance on this DSP.