The carry-save adder (CSA) is one of the most widely used building blocks for implementing fast arithmetic computation. An inherent limitation of conventional CSA applications is that they are confined to the sections of an arithmetic circuit that can be directly translated into addition expressions. To overcome this limitation, we analyze the structures of arithmetic circuits found in industry and derive a set of simple but effective CSA transformation techniques that go beyond the existing ones: 1) optimization across multiplexors, 2) optimization across design boundaries, and 3) optimization across multiplications. Based on these techniques, we develop a new timing-driven CSA transformation algorithm that is able to apply CSAs extensively throughout the whole circuit. Experimental data for practical test cases are provided to show the effectiveness of our algorithm.
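For readers unfamiliar with the underlying primitive, the following is a minimal bit-level sketch of carry-save addition in general, not of the transformation algorithm proposed here. The function names `csa` and `csa_sum` are hypothetical and chosen for illustration, and operands are assumed to be non-negative integers. Three operands are compressed into a sum word and a carry word without propagating carries, so only the final two-operand addition needs a carry-propagate adder.

```python
from typing import List, Tuple


def csa(a: int, b: int, c: int) -> Tuple[int, int]:
    """3:2 compressor: reduce three operands to (sum word, carry word).

    Each bit position is handled independently, so no carry ripples
    across the word. The invariant is a + b + c == sum + carry.
    """
    sum_word = a ^ b ^ c                             # per-bit sum without carry
    carry_word = ((a & b) | (a & c) | (b & c)) << 1  # per-bit majority, shifted left
    return sum_word, carry_word


def csa_sum(operands: List[int]) -> int:
    """Add many operands using CSA steps plus one final ordinary addition."""
    ops = list(operands)
    while len(ops) > 2:
        s, c = csa(ops[0], ops[1], ops[2])
        ops = ops[3:] + [s, c]  # three operands replaced by two
    return sum(ops)             # the single carry-propagate addition
```

In hardware terms, each call to `csa` corresponds to one row of full adders with constant delay regardless of word width, which is why replacing chains of ordinary adders with CSAs tends to shorten the critical path.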