# A 250MHz – 2GHz Wide Range Delay-Locked Loop

Byung-Guk Kim and Lee-Sup Kim, Department of EECS, KAIST, Daejeon, Korea

## Abstract

This paper describes a wide-range delay-locked loop (DLL) for synchronous clocking that supports dynamic frequency and voltage scaling. The DLL achieves its wide range by using multiple phases from its variable delay line. A phase detector is proposed to increase the locking speed and to alleviate the phase offset caused by the inherent mismatch of the charge pump. The DLL achieves a static phase error under 10ps. At 1GHz, its RMS and peak-to-peak jitter are 1.57ps and 10.7ps, respectively.

## Introduction

As chip size and clock frequency grow, a well-designed clock distribution network with active de-skew is required to minimize performance degradation. A DLL can be used to reduce the clock skew across local clock domains [1]. This paper employs an analog DLL to reduce the clock skew of local clock networks over a large chip area, because a digitally controlled DLL suffers from unavoidable static phase error and dithering.

A DLL with a wide operating range is especially needed in mobile applications that support several operating frequencies and a variable supply voltage. As the global clock frequency and supply voltage change with the operating mode of the system, local clocks across the chip must be resynchronized with the global clock. Conventional wide-range DLLs are vulnerable to noise in the voltage-controlled delay line (VCDL) or on the control voltage (Vctrl), because their wide range comes from enlarging the delay range of the VCDL over a limited range of Vctrl. This leads to large jitter. To widen the operating range, a phase inversion scheme for the output clock was reported [2]. It extends the operating range of the DLL to lower frequencies without increasing the delay range of the VCDL.
We present a wide-range DLL that extends its operating range toward both higher and lower frequencies. This paper first illustrates the use of the DLL in synchronous clocking and analyzes the conditions for a wide operating range, then describes the structure of the proposed DLL and its phase detector (PD). Finally, simulation and experimental results are discussed.

## Architecture

### A. Synchronous Clocking

Fig. 1 shows clock distribution methods for a synchronous system with local clock domains. A global clock is generated by a PLL for frequency scaling. Local clocks can be synchronized with the global clock, as shown in Fig. 1(a). Alternatively, as in Fig. 1(b), the clock skew can be reduced by comparing one local clock with another [3].

Fig. 1. Synchronous clocking schemes.

Fig. 2. DLL interface.

When the processor changes its frequency and supply voltage according to the operating mode, it waits long enough for the supply voltage to stabilize and for the clock signals across the chip to be synchronized. While the core logic has a scalable supply voltage, the PLL and the DLL use a fixed supply level. Because the PLL and the DLL consist of analog components that are sensitive to supply noise, bias level, and threshold voltage (Vth) variation, a fixed supply maintains their loop bandwidth and simplifies their design and analysis. Level shifters interface this fixed supply with the scalable supply of the core logic [4]. Fig. 2 shows the DLL interface with level shifters. The input clocks of the DLL are shifted to the fixed level by a level shifter, and the level shifter after the DLL generates an output clock at the scalable voltage used by the core logic.
For accurate synchronization between input clocks, adding devices to the path toward the PD is not recommended, because mismatch in those devices introduces a phase offset. Therefore, input clocks that do not pass through the level shifter are applied to the PD so that it receives accurate information about their phase difference. In an environment with dynamic frequency scaling, the DLL must have a range wide enough to cover the variable frequency of the processor.

### B. Wide Operating Range

Fig. 3 shows an approach to wide-range operation. The MUX selects the output clock from four phases (Q, I, /Q, and /I). *CLKin* (the input of the VCDL) and *CLKref* (the input of the PD) can be the same clock in Fig. 1(a); in Fig. 1(b), they have the same frequency but not necessarily the same phase.

Fig. 4 illustrates the operating range of the DLL. While the I and /I phases span the full delay range of the VCDL, the delay ranges of the Q and /Q phases are 3/4 of that of the VCDL. When the output clock is the Q or I phase, the delay from *CLKref* to *CLKout* in lock is $n \times T_{\mathrm{ref}}$, where $n$ is a positive integer and $T_{\mathrm{ref}}$ is the period of the reference clock. Fig. 4(a) illustrates the operating range of the Q and I phases defined by the line $y = T_{\mathrm{ref}}$. To ensure continuity between the Q and I phases, the maximum delay of the Q phase must exceed the minimum delay of the I phase. When the /Q or /I phase is selected for the output clock, the DLL locks to the state in which the delay from *CLKref* to *CLKout* is $(n-1)\times T_{\mathrm{ref}} + T_{\mathrm{ref}}/2$. In Fig. 4(b), the DLL therefore operates in a lower frequency range, because the lock range of the /Q and /I phases is set by the line $y = T_{\mathrm{ref}}/2$. Fig. 4(c) is a conceptual graph showing the continuity of the delay ranges of the Q, I, /Q, and /I phases: when the line $y = T_{\mathrm{ref}}/2$ in Fig. 4(b) is mapped onto the line $y = T_{\mathrm{ref}}$ in Fig. 4(c), the delay ranges of the /Q and /I phases shift by $T_{\mathrm{ref}}/2$.
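The lock rules above can be turned into per-phase lock ranges. The Python sketch below is illustrative and not taken from the paper: it assumes the Q and /Q taps sit at 3/4 of the VCDL delay, that non-inverted phases lock when their delay equals $T_{\mathrm{ref}}$ and inverted phases when it equals $T_{\mathrm{ref}}/2$ (the $n = 1$ case), and it adds an optional distribution delay `d_dist` (the $D_{\mathrm{dist}}$ term used with Fig. 4). The function name `lock_range` and the example delays are hypothetical.

```python
# Illustrative model of the four MUX phases (not the authors' code).
# ASSUMPTION: Q and /Q are tapped at 3/4 of the VCDL delay; I and /I at the full delay.
TAP_FRACTION = {"Q": 0.75, "I": 1.0, "/Q": 0.75, "/I": 1.0}
INVERTED = {"Q": False, "I": False, "/Q": True, "/I": True}


def lock_range(phase, d_min, d_max, d_dist=0.0):
    """Return (lowest, highest) reference period at which `phase` can lock for n = 1.

    Non-inverted phases lock when tap * D + d_dist = Tref.
    Inverted phases lock when tap * D + d_dist = Tref / 2, i.e. Tref = 2 * (tap * D + d_dist).
    D is the VCDL delay, limited to [d_min, d_max].
    """
    tap = TAP_FRACTION[phase]
    scale = 2.0 if INVERTED[phase] else 1.0
    return scale * (tap * d_min + d_dist), scale * (tap * d_max + d_dist)


if __name__ == "__main__":
    # Hypothetical VCDL delay range in ps.
    for p in ("Q", "I", "/Q", "/I"):
        print(p, lock_range(p, 100.0, 200.0))
```

With a hypothetical VCDL range of 100ps to 200ps and no distribution delay, the four ranges are 75–150ps, 100–200ps, 150–300ps, and 200–400ps, so adjacent phases overlap and together span $\tfrac{3}{4}D_{\mathrm{VCDL,min}}$ to $2D_{\mathrm{VCDL,max}}$.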
When the DLL locks through several phase conversions among the four phases (Q, I, /Q, and /I), their delay ranges must remain continuous under the locking condition ($y = T_{\mathrm{ref}}$ in Fig. 4(c)).

Fig. 4. Operating range of the DLL. (c) Operation range of Q, I, /Q, and /I phases. (d) Operation range at harmonic lock.

$D_{\mathrm{dist}}$ in Fig. 4 is the delay of the clock distribution network. In Fig. 1(a), since *CLKref* and *CLKin* have the same phase, $D_{\mathrm{dist}}$ is

$$D_{\mathrm{dist}} = D_{\mathrm{dist,1}} \tag{1}$$

where $D_{\mathrm{dist,1}}$ is the delay of local clock distribution network #1. In Fig. 1(b), however, *CLKref* is a delayed version of *CLKin* through another local clock network, so

$$D_{\mathrm{dist}} = D_{\mathrm{dist,1}} - D_{\mathrm{dist,2}} - D_{\mathrm{VCDL,2}} \tag{2}$$

where $D_{\mathrm{dist,2}}$ is the delay of local clock distribution network #2 and $D_{\mathrm{VCDL,2}}$ is the delay of VCDL #2.

We now consider the continuity of the delay ranges of the Q, I, /Q, and /I phases in Fig. 4(c). The condition for continuity between the Q and I phases in Fig. 4(a), and likewise between the /Q and /I phases in Fig. 4(b), is

$$D_{\mathrm{VCDL,min}} + D_{\mathrm{dist}} < \tfrac{3}{4} D_{\mathrm{VCDL,max}} + D_{\mathrm{dist}} \tag{3}$$

Thus,

$$\tfrac{4}{3} D_{\mathrm{VCDL,min}} < D_{\mathrm{VCDL,max}} \tag{4}$$

For continuity between the I and /Q phases in Fig. 4(c), their delay ranges must overlap under the locking condition:

- maximum delay of I: $y = D_{\mathrm{VCDL,max}} + D_{\mathrm{dist}}$
- minimum delay of /Q: $y = \tfrac{3}{4} D_{\mathrm{VCDL,min}} + T_{\mathrm{ref}}/2 + D_{\mathrm{dist}}$

Setting $y = T_{\mathrm{ref}}$ in each expression and requiring the lowest period at which /Q can lock to lie below the highest period at which I can lock gives

$$\tfrac{3}{2} D_{\mathrm{VCDL,min}} + 2 D_{\mathrm{dist}} < D_{\mathrm{VCDL,max}} + D_{\mathrm{dist}} \tag{5}$$

Thus,

$$\tfrac{3}{2} D_{\mathrm{VCDL,min}} + D_{\mathrm{dist}} < D_{\mathrm{VCDL,max}} \tag{6}$$

For $D_{\mathrm{dist}} \ge 0$, (6) implies (4) because $\tfrac{3}{2} > \tfrac{4}{3}$, so the condition that satisfies both (4) and (6) is

$$1.5 \times D_{\mathrm{VCDL,min}} + D_{\mathrm{dist}} < D_{\mathrm{VCDL,max}} \tag{7}$$

The delay range of the VCDL should therefore be chosen according to the value of $D_{\mathrm{dist}}$. In a synchronous clocking scheme such as Fig. 1(a), $D_{\mathrm{dist}}$ is relatively large because it is the delay of the local clock network.
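Conditions (1)–(7) reduce to a few arithmetic checks. The Python sketch below is hypothetical (the function names and example delays are not from the paper): it evaluates $D_{\mathrm{dist}}$ from (1) or (2), tests (4) and (6), and returns the bound on $D_{\mathrm{VCDL,max}}$ from (7) along with its ratio form. All delays are in the same unit, for example ps.

```python
def d_dist_fig1a(d_dist_1):
    """Eq. (1): CLKref and CLKin share a phase, so Ddist equals network #1's delay."""
    return d_dist_1


def d_dist_fig1b(d_dist_1, d_dist_2, d_vcdl_2):
    """Eq. (2): CLKref is CLKin delayed through VCDL #2 and local network #2."""
    return d_dist_1 - d_dist_2 - d_vcdl_2


def continuity(d_min, d_max, d_dist):
    """Check (4) for Q-I and /Q-/I continuity and (6) for I-/Q continuity."""
    eq4 = 4.0 / 3.0 * d_min < d_max
    eq6 = 1.5 * d_min + d_dist < d_max
    return {"eq4": eq4, "eq6": eq6, "all": eq4 and eq6}


def d_max_bound(d_min, d_dist):
    """Eq. (7): D_VCDL,max must be strictly greater than this value (for d_dist >= 0)."""
    return 1.5 * d_min + d_dist


def required_ratio(d_min, d_dist):
    """Eq. (7) divided by d_min: D_VCDL,max / D_VCDL,min must exceed this ratio."""
    return 1.5 + d_dist / d_min
```

For example, with hypothetical values $D_{\mathrm{dist,1}} = 500$, $D_{\mathrm{dist,2}} = 300$, and $D_{\mathrm{VCDL,2}} = 150$, (2) gives $D_{\mathrm{dist}} = 50$. The required ratio is then 2.0 for $D_{\mathrm{VCDL,min}} = 100$ but only 1.75 for $D_{\mathrm{VCDL,min}} = 200$, which illustrates the effect of raising the minimum delay of the VCDL.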
When $D_{\mathrm{dist}}$ is large, as in Fig. 1(a), increasing the minimum delay of the VCDL reduces the relative delay range the VCDL needs to satisfy (7): dividing (7) by $D_{\mathrm{VCDL,min}}$ gives $D_{\mathrm{VCDL,max}}/D_{\mathrm{VCDL,min}} > 1.5 + D_{\mathrm{dist}}/D_{\mathrm{VCDL,min}}$. However, making $D_{\mathrm{VCDL,min}}$ too large may impede the propagation of a high-frequency clock. When the delay of the local clock network is too large, a synchronization scheme such as Fig. 1(b) can be employed to relax the delay-range requirement on the VCDL. Because $D_{\mathrm{dist}}$ is considerably reduced, as (2) shows, the VCDL can have a narrow delay range while keeping the Q, I, /Q, and /I phases continuous. To set the delay range of the VCDL of each DLL in a scheme such as Fig. 1(b), the delays of all local clock networks must be considered in both the worst case and the best case. $D_{\mathrm{dist}}$ must also remain greater than zero; otherwise, the DLL has difficulty locking over the target frequency range, especially at lower frequencies, and the delay range of the VCDL must be increased even though (7) is satisfied.

Because the clock passes through the large delay of the clock distribution network, the delay from *CLKref* to *CLKout* in lock can be $n \times T_{\mathrm{ref}}$, such as $T_{\mathrm{ref}}$, $2 \times T_{\mathrm{ref}}$, or $3 \times T_{\mathrm{ref}}$. Fig. 4(d) illustrates the operating ranges at harmonic lock. When $y = T_{\mathrm{ref}}$, the delay range of the Q, I, /Q, and /I phases is

$$\tfrac{3}{4} D_{\mathrm{VCDL,min}} < y < D_{\mathrm{VCDL,max}} + T_{\mathrm{ref}}/2 \tag{8}$$

Substituting $y = T_{\mathrm{ref}}$ into (8) gives the range of $T_{\mathrm{ref}}$ covered by the four phases:

$$y = T_{\mathrm{ref}}:\quad \tfrac{3}{4} D_{\mathrm{VCDL,min}} \sim 2 \times D_{\mathrm{VCDL,max}} \tag{9}$$

When $y = 2 \times T_{\mathrm{ref}}$, the range covered by the four phases is

$$y = 2 \times T_{\mathrm{ref}}:\quad \tfrac{3}{8} D_{\mathrm{VCDL,min}} \sim \tfrac{2}{3} D_{\mathrm{VCDL,max}} \tag{10}$$

When $y = 3 \times T_{\mathrm{ref}}$, the range covered by the four phases is

$$y = 3 \times T_{\mathrm{ref}}:\quad \tfrac{3}{12} D_{\mathrm{VCDL,min}} \sim \tfrac{2}{5} D_{\mathrm{VCDL,max}} \tag{11}$$

Harmonic lock thus extends the operating range toward higher frequencies.
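Equations (9)–(11) follow a pattern that is easy to tabulate. The Python sketch below is illustrative, not the authors' code: the lowest lockable period comes from the Q phase at minimum delay ($\tfrac{3}{4} D_{\mathrm{VCDL,min}} = n T_{\mathrm{ref}}$) and the highest from the /I phase at maximum delay ($D_{\mathrm{VCDL,max}} = (n-1)T_{\mathrm{ref}} + T_{\mathrm{ref}}/2$). Like (8)–(11), it contains no $D_{\mathrm{dist}}$ term. It also checks that the range for each harmonic $n+1$ overlaps the range for $n$. The names and numbers are hypothetical, and `Fraction` keeps the boundaries exact.

```python
from fractions import Fraction


def harmonic_range(n, d_min, d_max):
    """(lowest, highest) Tref at which the four phases can lock with y = n * Tref.

    Lowest:  Q phase at minimum delay, 3/4 * d_min = n * Tref.
    Highest: /I phase at maximum delay, d_max = (n - 1) * Tref + Tref / 2.
    """
    lowest = Fraction(3, 4) * d_min / n
    highest = Fraction(2, 2 * n - 1) * d_max
    return lowest, highest


def harmonics_continuous(d_min, d_max, n_max=10):
    """True if the range at n + 1 overlaps the range at n for every n < n_max."""
    return all(
        harmonic_range(n, d_min, d_max)[0] < harmonic_range(n + 1, d_min, d_max)[1]
        for n in range(1, n_max)
    )


if __name__ == "__main__":
    # Hypothetical VCDL range (ps): print the reference-period window for n = 1..3.
    for n in (1, 2, 3):
        lo, hi = harmonic_range(n, 100, 200)
        print(n, float(lo), float(hi))
```

With $D_{\mathrm{VCDL,max}}/D_{\mathrm{VCDL,min}}$ exactly 9/8 (for example 9 and 8), the overlap test fails at $n = 1$, while any larger ratio passes for every $n$, in line with the continuity condition derived from (14).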
From (9), (10), and (11), the range at $y = n \times T_{\mathrm{ref}}$ ($n$ a positive integer) is

$$\tfrac{1}{n} \times \tfrac{3}{4} D_{\mathrm{VCDL,min}} \sim \tfrac{2}{2n-1} D_{\mathrm{VCDL,max}} \tag{12}$$

and the range at $y = (n+1) \times T_{\mathrm{ref}}$ is

$$\tfrac{1}{n+1} \times \tfrac{3}{4} D_{\mathrm{VCDL,min}} \sim \tfrac{2}{2n+1} D_{\mathrm{VCDL,max}} \tag{13}$$

For continuity between the two cases, the maximum boundary at $y = (n+1) \times T_{\mathrm{ref}}$ and the minimum boundary at $y = n \times T_{\mathrm{ref}}$ must overlap:

$$\tfrac{1}{n} \times \tfrac{3}{4} D_{\mathrm{VCDL,min}} < \tfrac{2}{2n+1} D_{\mathrm{VCDL,max}} \tag{14}$$

$$\Rightarrow \left(\tfrac{3}{4} + \tfrac{3}{8} \times \tfrac{1}{n}\right) D_{\mathrm{VCDL,min}} < D_{\mathrm{VCDL,max}} \tag{15}$$

The left-hand side of (15) is largest at $n = 1$, so the condition for continuity at harmonic lock is

$$\tfrac{9}{8} D_{\mathrm{VCDL,min}} < D_{\mathrm{VCDL,max}}, \quad n > 0 \tag{16}$$

For a continuous operating range at harmonic lock, the maximum delay of the VCDL must therefore exceed 9/8 of its minimum delay. Since condition (7) for continuity among the four phases at $y = T_{\mathrm{ref}}$ implies condition (16), the DLL is continuous in all harmonic-lock modes. In theory, the DLL can then operate at arbitrarily high frequencies, because its operating range shifts to higher frequencies as the harmonic-lock mode increases, as shown in Fig. 4(d). In summary, a DLL using four phases has a wider operating range than one using a single phase. It can also achieve low jitter, because the narrow delay range of its VCDL makes it less sensitive to noise in the VCDL or on Vctrl.

### C. DLL Architecture

Fig. 5(a) shows the architecture of the proposed DLL. For simplicity, DLL interfaces such as the level shifters are omitted. Two comparators continuously monitor Vctrl. When the DLL reaches the minimum or maximum delay boundary, reset signals from the comparators reset *CLKref* or *CLKout* and then change the phase of the output clock.
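The comparator-driven phase changes can be sketched as a small state update. This Python model is hypothetical: it follows the rules given for the locking process in Fig. 6 (the minimum delay boundary resets *CLKref* and selects the next later phase; the maximum delay boundary resets *CLKout* and selects the inverted phase), but the circulation order Q → I → /Q → /I → Q is only an assumed reading of Fig. 5(b), and the thresholds `v_tn` and `v_high` are placeholder comparator levels.

```python
# Hypothetical behavioral model of the phase-conversion logic (not the authors' circuit).
# ASSUMPTION: "next later" circulation order read from Fig. 5(b) as Q -> I -> /Q -> /I -> Q.
NEXT_LATER = {"Q": "I", "I": "/Q", "/Q": "/I", "/I": "Q"}
# Each phase is replaced by its inversion at the maximum delay boundary.
INVERSION = {"Q": "/Q", "I": "/I", "/Q": "Q", "/I": "I"}


def phase_update(phase, v_ctrl, v_tn, v_high):
    """Return (new_phase, clock_to_reset) after one comparator decision.

    v_ctrl < v_tn   -> minimum delay boundary: reset CLKref, move to the next later phase.
    v_ctrl > v_high -> maximum delay boundary: reset CLKout, move to the inverted phase.
    Otherwise the loop keeps tracking with the current phase and nothing is reset.
    """
    if v_ctrl < v_tn:
        return NEXT_LATER[phase], "CLKref"
    if v_ctrl > v_high:
        return INVERSION[phase], "CLKout"
    return phase, None
```

Because `NEXT_LATER` forms a single cycle through all four phases in this model, repeated minimum-boundary events visit every phase, which is one way to picture how the circulating phase conversion lets the loop lock from any initial phase.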
The locking process of the DLL is illustrated in Fig. 6. When Vctrl falls to GND, the DLL reaches the minimum delay boundary and fails to lock. The DLL then resets *CLKref* while Vctrl is below Vtn and replaces the current phase of the output clock with the next later phase. When Vctrl rises to VDD, the DLL reaches the maximum delay boundary; *CLKout* is then reset and the current phase of the output clock is replaced by its inverted phase. The DLL eventually locks after several phase conversions. Thanks to the circular sequence of phase conversions in Fig. 5(b), the DLL attains the correct lock from any initial phase of the output clock.

Fig. 5. (a) Architecture of the proposed DLL. (b) Circulation of phase conversion.

Fig. 6. Locking process: (a) at the minimum delay boundary; (b) at the maximum delay boundary.

### D. Phase Detector

In Fig. 7, the proposed PD contains NOR gates that can be reset to 'HIGH'. Its characteristic increases over the input phase-difference range $(-2\pi \sim 2\pi)$, which allows correct locking through phase conversions. However, such a PD can make the locking time longer than that of a DLL whose PD has a phase capture range of $(-\pi \sim \pi)$. To increase the locking speed while keeping the capture range $(-2\pi \sim 2\pi)$, the proposed PD has a linear gain in the range $(-\pi \sim \pi)$ and a large constant gain in the ranges $(-2\pi \sim -\pi \;\&\; \pi \sim 2\pi)$. Its timing diagram and characteristic are shown in Fig. 8. In the range $(-\pi \sim \pi)$, both UP and DN pulses can be reset to 'LOW'. In the ranges $(-2\pi \sim -\pi \;\&\; \pi \sim 2\pi)$, however, only the pulse for the late clock can be reset to 'LOW', while the pulse for the early clock stays 'HIGH'. The PD thus has a linear characteristic in the range $(-\pi \sim \pi)$ and acts as a binary PD in the ranges $(-2\pi \sim -\pi \;\&\; \pi \sim 2\pi)$.
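The piecewise characteristic just described can be written as a function of phase error. The Python sketch below is an idealized, hypothetical model rather than a circuit simulation: it normalizes the linear region so the output reaches ±1 at ±π, and it represents the large constant gain of the binary region by a placeholder magnitude `k_binary`, assumed to be at least 1 so the curve stays monotonic.

```python
import math


def pd_output(phase_error, k_binary=2.0):
    """Idealized normalized average PD output for an input phase error in radians.

    |phase_error| <= pi:      linear region, output = phase_error / pi.
    pi < |phase_error| < 2pi: binary region, constant magnitude k_binary (placeholder value).
    """
    if not -2 * math.pi < phase_error < 2 * math.pi:
        raise ValueError("phase error outside the (-2*pi, 2*pi) capture range")
    if abs(phase_error) <= math.pi:
        return phase_error / math.pi
    return math.copysign(k_binary, phase_error)
```

Far from lock, this model always drives the loop with the full `k_binary` magnitude, which mirrors the faster pull-in the binary region is meant to provide; near lock, the output is proportional to the phase error, as in a linear PD.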
Because input phase differences in the ranges $(-2\pi \sim -\pi \;\&\; \pi \sim 2\pi)$ are far from the locking point, the binary characteristic of the PD improves the locking speed of the DLL in the unlocked state. The locking point of the DLL must lie within the range $(-\pi \sim \pi)$; since the PD has a linear characteristic there, the DLL retains the low-jitter benefit of a linear PD.

Fig. 7. Phase detector.

Fig. 8. Timing diagram and characteristic of the PD.

To synchronize local clocks precisely, the PD should compare input clocks that have not passed through the level shifter. The PD can detect the rising edges of input clocks at the scalable voltage. The circled regions in Fig. 7 are the critical regions for detecting the rising edges of the input clocks; there, NMOS transistors detect the rising edges of input clocks at the scalable VDD. The remaining parts receive input signals at the fixed VDD, because they do not directly provide information about the arrival of the rising clock edge. The PD therefore detects the phase difference between input clocks accurately.

Fig. 9. PD with pulse reshaper and its timing diagram.

As shown in Fig. 9, a small difference between the pulse widths of UP and DN can be masked by the current mismatch of the charge pump (CP), so the DLL locks with a static phase offset. The pulse reshaper gives the UP and DN pulses a variable width and height. Only one pulse (UP or DN) is 'HIGH' in the unlocked state ($|CLK_{\mathrm{ref}} - CLK_{\mathrm{out}}| > T_m$), while both UP and DN pulses are 'HIGH' in the locked state ($|CLK_{\mathrm{ref}} - CLK_{\mathrm{out}}| < T_m$). As *CLKref* and *CLKout* come closer in phase, the pulse (UP or DN) activated by the late clock rises to an increasing voltage, like a glitch. This variable-voltage pulse can be produced by adjusting the slew rate of the inverter. The PD with the pulse reshaper has a slightly higher gain around the locking point.
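A first-order charge-balance estimate shows why a higher gain near lock reduces the static offset. This Python sketch is a textbook-style model assumed for illustration, not the paper's analysis: near lock, both UP and DN stay on for an overlap time `t_on`, so a mismatch between the UP and DN currents leaves a net charge that a phase error must cancel. `gain_factor` is a hypothetical stand-in for the extra gain of the pulse reshaper, and all numbers are placeholders.

```python
def static_offset(i_up, i_dn, t_on, gain_factor=1.0):
    """Phase error (s) at which the net charge-pump charge per cycle is zero.

    Model (assumed): both pulses are on for t_on, and a phase error dt lengthens
    the UP pulse with an effective gain gain_factor, so lock requires
        gain_factor * i_up * dt = (i_dn - i_up) * t_on.
    A positive result means the UP pulse must stay on dt longer than DN.
    """
    if i_up <= 0 or gain_factor <= 0:
        raise ValueError("i_up and gain_factor must be positive")
    return (i_dn - i_up) * t_on / (gain_factor * i_up)
```

With a hypothetical 5% mismatch (100µA versus 105µA) and a 100ps overlap, the model gives a 5ps offset; a gain factor of 5 reduces it to 1ps.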
The higher gain of the pulse reshaper near the locking point helps compensate the static phase offset caused by the inherent mismatch of the CP.

## Results

Fig. 10 shows simulation results for the static phase error over the operating range from 250MHz to 2GHz. The two DLLs used in the simulations have the same structure except for the pulse reshaper. The DLL without the pulse reshaper has a static phase error above 50ps over most of the operating range, whereas the DLL with the pulse reshaper keeps it under 10ps over most of the range. The DLL with the pulse reshaper thus has little static phase error despite the CP mismatch.

Fig. 10. Static phase error.

Fig. 11. Die micrograph.

The proposed DLL chip is fabricated in a 0.18µm CMOS process. Fig. 11 shows a die micrograph of the test chip. The active area of the DLL is 230µm × 200µm. The DLL core dissipates 6.4mW at 2GHz and 1.2mW at 250MHz. The measured jitter histograms are shown in Fig. 12; they show that the DLL achieves low jitter over a wide operating range. The 1GHz input clock from the pattern generator used in the measurement has 0.98ps RMS jitter and 7.1ps peak-to-peak jitter, and the output clock of the DLL has 1.57ps RMS jitter and 10.7ps peak-to-peak jitter. This low jitter results from the narrow delay range of the VCDL and from on-chip supply filtering that suppresses supply noise. Table 1 summarizes the specifications of the DLL.
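The measured RMS values can be used to estimate how much jitter the DLL itself adds. The Python sketch below assumes, as an illustration not stated in the paper, that the jitter added by the DLL is uncorrelated with the input jitter, so RMS values add in quadrature. The function name is hypothetical.

```python
import math


def added_rms_jitter(out_rms, in_rms):
    """Estimate the DLL's own RMS jitter assuming it is uncorrelated with the input jitter."""
    if out_rms < in_rms:
        raise ValueError("output RMS jitter is smaller than input RMS jitter")
    return math.sqrt(out_rms ** 2 - in_rms ** 2)


# Measured RMS jitter at 1GHz from the text, in ps.
print(f"DLL-added RMS jitter ~ {added_rms_jitter(1.57, 0.98):.2f} ps")
```

For the 1GHz measurement, this gives roughly 1.23ps. Peak-to-peak values should not be combined this way, because they depend on the number of samples in the histogram.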
TABLE 1: SPECIFICATIONS OF DLL TEST CHIP

| Parameter | Value |
|---------------------------|------------------------------|
| Technology | 0.18µm CMOS |
| Supply Voltage | 1.8V |
| Operating Frequency Range | 250MHz – 2GHz |
| Active Area | 230µm × 200µm |
| Power Dissipation | 6.4mW (2GHz), 1.2mW (250MHz) |
| Jitter @ 250MHz | 5.25ps (RMS), 31.6ps (p-p) |
| Jitter @ 500MHz | 3.09ps (RMS), 18.7ps (p-p) |
| Jitter @ 1GHz | 1.57ps (RMS), 10.7ps (p-p) |
| Jitter @ 1.5GHz | 2.01ps (RMS), 14.7ps (p-p) |
| Jitter @ 2GHz | 2.81ps (RMS), 20.4ps (p-p) |
| Static Phase Error | < 10ps (250MHz – 2GHz) |

## Conclusions

We have developed a wide-range DLL that extends its operating range toward both higher and lower frequencies. The novel characteristic of the proposed PD enables fast locking, and a pulse reshaper reduces the static phase error to below 10ps over the frequency range from 250MHz to 2GHz. This wide-range DLL with low jitter and low static phase error can contribute to improved performance in processors that support dynamic frequency and voltage scaling.

## Acknowledgments

This work was supported by KOSEF through the MICROS at KAIST and by the IT-SOC Promotion Group through the Ministry of Information and Communication, Korea.

## References

- [1] T. Xanthopoulos, D. W. Bailey, A. K. Gangwar, M. K. Gowan, A. K. Jain, and B. K. Prewitt, "The design and analysis of the clock distribution network for a 1.2GHz Alpha microprocessor," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, Feb. 2001, pp. 402-403.
- [2] T. Yoshimura, Y. Nakase, N. Watanabe, Y. Morooka, Y. Matsuda, M. Kumanoya, and H. Hamano, "A delay-locked loop and 90-degree phase shifter for 800Mbps double data rate memories," in *Symp. VLSI Circuits Dig. Tech. Papers*, June 1998, pp. 66-67.
- [3] V. Gutnik and A. P. Chandrakasan, "Active GHz clock network using distributed PLLs," *IEEE J. Solid-State Circuits*, vol. 35, pp. 1553-1560, Nov. 2000.
- [4] K. J. Nowka, G. D. Carpenter, E. W. MacDonald, H. C. Ngo, B. C. Brock, K. I. Ishii, T. Y. Nguyen, and J. L. Burns, "A 32-bit PowerPC system-on-a-chip with support for dynamic voltage scaling and dynamic frequency scaling," *IEEE J. Solid-State Circuits*, vol. 37, pp. 1441-1447, Nov. 2002.