Timing and area are two of the most important design criteria to be optimized in data path synthesis. In addition, the carry-save adder (CSA) has been shown to be one of the most efficient implementation units for optimizing the timing and/or area of arithmetic circuits. However, existing approaches are restricted in how they use CSAs: they optimize each operation tree separately, without any interaction between trees, which yields only locally optimized CSA circuits. To overcome this limitation, we propose a practically efficient solution to the problem of accurately exploring timing and area trade-offs when optimizing arithmetic circuits with multiple operation trees using CSAs. Applying our approach finds a best CSA implementation of the circuit in terms of both timing and area. Experimental results on a number of digital filter designs show that our algorithm achieves 48% to 84% area savings under a timing constraint and 4% to 39% timing reduction under an area constraint, compared with conventional carry-save adder implementations.
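To make the CSA building block concrete, here is a minimal sketch, assuming word-level CSAs that each compress three operands into a sum word and a carry word, arranged greedily in levels (Wallace-style), followed by one final carry-propagate addition. It shows a single operation tree only; it is not the multi-tree optimization algorithm proposed above. The function names `csa` and `csa_tree_sum` are hypothetical, the CSA count serves as a rough area proxy, and the number of CSA levels serves as a rough timing proxy.

```python
def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """One word-level carry-save adder (3:2 compressor) on non-negative integers.

    Each bit position produces a sum bit (XOR of the three inputs) and a carry
    bit (majority of the three inputs) shifted one position left, so that
    a + b + c == sum + carry with no carry propagation inside the CSA.
    """
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def csa_tree_sum(operands: list[int]) -> tuple[int, int, int]:
    """Reduce the operands with levels of CSAs, then add the final two values.

    Returns (total, csa_count, csa_levels). csa_count is a rough area proxy
    and csa_levels a rough delay proxy (hypothetical cost model).
    """
    values = list(operands)
    csa_count = 0
    levels = 0
    while len(values) > 2:
        next_values = []
        full = len(values) // 3 * 3
        # Compress each complete group of three operands into two.
        for i in range(0, full, 3):
            s, c = csa(values[i], values[i + 1], values[i + 2])
            next_values.extend([s, c])
            csa_count += 1
        # Leftover one or two operands pass through to the next level.
        next_values.extend(values[full:])
        values = next_values
        levels += 1
    # Final carry-propagate addition; Python's + stands in for a CPA here.
    total = sum(values)
    return total, csa_count, levels


if __name__ == "__main__":
    print(csa_tree_sum([3, 5, 7, 9, 11, 13]))
```

Because every word-level CSA turns three operands into two, an n-operand tree always needs n − 2 CSAs, while its depth depends on how the operands are grouped; the six-operand example above reduces 6 → 4 → 3 → 2 in three levels using four CSAs.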