At high-operating temperature, chip cooling is crucial due to the exponential temperature dependence of leakage current. However, traditional cooling methods, e. g., power/clock gating applied when a temperature threshold is reached, often cause excessive performance degradation. In this paper, we propose a method for delivering lower energy consumption by integrating the cooling and running in a temperature-aware manner without incurring performance penalty. In order to further reduce the energy consumption, we exploited the runtime distribution of each sub-segment of a task called "bin" in an analytical manner such that time budget for cooling in each bin is allocated in proportion to the probability of the occurrence of the bin. We apply the proposed method to two realistic software programs, H. 264 decoder and ray tracing and a benchmark program, equake. The experimental results show that the proposed method yields additional 19.4%-27.2% reduction in energy consumption compared with existing methods.