Clock gating has become a standard design practice, and it is generally applied at the RTL design stage. RTL clock gating has two significant limitations: the designer has to provide a gating function, and registers whose gating functions are not specified are left ungated. Gate-level clock gating, which has been proposed to resolve these problems, automatically inserts clock gating structures into a given netlist. It consists of three steps: extracting a gating condition for each flip-flop; register grouping, which classifies flip-flops into multiple groups so that the flip-flops in the same group are gated together; and adding integrated clock gating (ICG) cells and the gates required to implement the gating condition of each flip-flop group. We propose a method of extracting gating conditions through the detection of cyclic paths, which increases the number of gated flip-flops by reducing the overhead of gating logic. We also suggest balanced register grouping, which reduces the number of ICG cells, and a fast method for estimating gating logic power. Implementation of gating conditions with the fewest additional gates is also discussed.
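
As a rough illustration of the first two steps, the sketch below looks for the simplest kind of cyclic path, a flip-flop whose output Q returns to its own D input through a 2:1 multiplexer, and treats the multiplexer select as the gating condition. It then groups flip-flops that share an identical condition. This is a minimal sketch under stated assumptions, not the method proposed here: the `Mux` class, the dictionary netlist format, and all signal names are hypothetical, and the grouping is plain equality-based grouping rather than balanced register grouping.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mux:
    """Hypothetical 2:1 mux model: output = in1 if sel else in0."""
    sel: str
    in0: str
    in1: str


def extract_gating_condition(ff_q, d_driver):
    """Return (enable_signal, active_high) if D is a mux that feeds Q back.

    A feedback path Q -> mux -> D means the flip-flop holds its value
    whenever the mux selects Q, so its clock can be gated in that case.
    Returns None when no such single-mux cycle exists.
    """
    if not isinstance(d_driver, Mux):
        return None
    if d_driver.in0 == ff_q and d_driver.in1 != ff_q:
        return (d_driver.sel, True)   # loads new data when sel = 1
    if d_driver.in1 == ff_q and d_driver.in0 != ff_q:
        return (d_driver.sel, False)  # loads new data when sel = 0
    return None


def group_by_condition(netlist):
    """Group flip-flops by identical gating condition.

    `netlist` maps each flip-flop output name to the driver of its D input
    (a Mux, or a plain signal name when there is no mux).
    """
    groups = defaultdict(list)
    ungated = []
    for ff_q, d_driver in netlist.items():
        condition = extract_gating_condition(ff_q, d_driver)
        if condition is None:
            ungated.append(ff_q)
        else:
            groups[condition].append(ff_q)
    return dict(groups), ungated


# Toy netlist with assumed signal names.
netlist = {
    "q0": Mux(sel="en_a", in0="q0", in1="d0"),
    "q1": Mux(sel="en_a", in0="q1", in1="d1"),
    "q2": Mux(sel="hold_b", in0="d2", in1="q2"),
    "q3": "d3",  # no feedback path, so no gating condition is found
}
groups, ungated = group_by_condition(netlist)
```

In this toy case, `q0` and `q1` share the condition `en_a` and could be driven by one ICG cell, `q2` is gated by the inverse of `hold_b`, and `q3` is left ungated. A real netlist would require tracing feedback paths through arbitrary logic rather than a single multiplexer.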