# Minimizing Skew Across Multiple Clock Trees in Gate Arrays

Application Note AN1553

Prepared by: Thomas Lüdeke, Motorola, Munich

#### **CONTENTS**

1. Overview
2. Multiplexing Clocks
3. Adding Initial Scan Clock Delays
4. Starting the Entire Design Layout
5. Fine Tuning the Scan Clock Delays
6. Completing the Entire Design's Layout
7. Analyzing the Results

#### INTRODUCTION

With gate array designs becoming larger and more complex, there is a growing need for internal scan testing. When performing a scan test, a single clock is most desirable; however, many array designs use several different clocks during normal operation. This presents the problem of balancing all internal clocks during scan mode in order to achieve the minimum overall skew.

#### **OBJECTIVE**

This paper presents a method for achieving the minimum overall clock skew across multiple clocks during scan mode.

## 1. Overview

There are typically two modes of clock operation: normal and scan. During normal operation, each internal clock is driven either directly by an off-chip clock or by internal clock-divider circuitry. During scan mode, all clocks must be driven by a single off-chip clock, so each clock needs a multiplexer to select the proper source.

During normal operation, clock insertion delays are typically more critical than the skew between clocks. Any modification to the clock delays should therefore be made only on the scan clock inputs of the multiplexers, so that the normal clock insertion delays increase by no more than a single multiplexer delay.

The clocks are then balanced in scan mode by tuning a delay at the scan clock input of each multiplexer, so that all clock insertion delays match and the overall skew is minimized.

## 2. Multiplexing Clocks

The first step in modifying the clocks for scan testing is to add a multiplexer in front of each clock tree root. The multiplexer selects the correct clock for normal or scan mode operation. Figure 1 shows a typical clock multiplexer with a non-inverting clock tree. If the clock tree is inverting, the scan clock must also be inverted so that a non-inverted clock reaches the leaves during scan testing; see Figure 2.
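The polarity rule can be checked with a small behavioral model. This is a hypothetical Python sketch: the function `leaf_clock` and its 0/1 signal levels are illustrative only, it ignores all delays, and it assumes that for an inverting tree the extra inversion sits on the scan clock path ahead of the multiplexer.

```python
def leaf_clock(scan_mode: bool, normal_clk: int, scan_clk: int,
               tree_inverting: bool) -> int:
    """Logic level seen at the clock tree leaves (behavioral sketch, no delays)."""
    # For an inverting tree, the scan clock is inverted before the multiplexer,
    # so this inversion and the tree's own inversion cancel at the leaves.
    scan_in = 1 - scan_clk if tree_inverting else scan_clk
    # The multiplexer picks the scan clock in scan mode, the normal clock otherwise.
    mux_out = scan_in if scan_mode else normal_clk
    # The clock tree itself either inverts or passes the clock through.
    return 1 - mux_out if tree_inverting else mux_out


for inverting in (False, True):
    levels = [leaf_clock(True, 0, clk, inverting) for clk in (0, 1)]
    print(f"inverting tree={inverting}: scan clock 0,1 -> leaves {levels}")
```

In both cases the leaves follow the scan clock directly, while in normal mode an inverting tree still inverts its normal clock as before.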

#### Note:

It is critical that the scan test mode select signal remain stable throughout the scan test. Unlike a scan enable signal, it must not toggle between scan and broadside modes.



Figure 1. Clock Multiplexing for Scan Testing with Non-Inverted Clock



Figure 2. Clock Multiplexing for Scan Testing with Inverted Clock

## 3. Adding Initial Scan Clock Delays

To prepare for the later fine tuning of the clock insertion delays in scan test mode, add an initial delay at the scan clock input of each clock multiplexer. It is easier to remove unwanted delay buffers later than to insert needed ones. The delay is typically built from a string of buffers; see Figure 3.





Figure 3. Initial Scan Clock Delay

### 3.1. Choosing the Delay Buffers

The choice of buffer size for the delay buffers depends on the desired resolution for fine tuning and the available extra space on the array.

The resolution of the fine tuning equals the delay of a single stage in the delay line. Table 1 and Table 2 show the typical rise and fall delays of various buffers in each technology. Each value is the delay of a single buffer in an eight-buffer chain, measured with typical case timing derived from a PrediX placement in a group with 70% utilization.

Table 1. Buffer Rise Delay (ns)

| Delay<br>Buffer | HDC | H4C | H4EPlus<br>3.3V | H4EPlus<br>5.0V | M5C |
|-----------------|-----|-----|-----------------|-----------------|-----|
| BUF             | .50 | .26 | .36             | .28             | .23 |
| BUF2            | .50 | .30 | .38             | .30             | .26 |
| BUF3            | .59 | .40 | -               | -               | .28 |
| BUF4            | .53 | .39 | .42             | .32             | .35 |
| BUF6            | -   | -   | -               | -               | .44 |
| BUF8            | .51 | .37 | .37             | .31             | -   |
| BUF2B           | .51 | .34 | .37             | .29             | -   |
| BUF3B           | .61 | .40 | -               | -               | -   |
| BUF4B           | .63 | .45 | .47             | .35             | -   |
| BUF8B           | .56 | .40 | .47             | .35             | -   |

For non-inverting clocks, choose the buffer size based on the rise delay. For inverting clocks, base the choice on the fall delay if the scan clock is inverted after the delay line; otherwise, use the rise delay.

The length of the delay line is determined by the maximum skew expected between clocks before balancing. It is better to overshoot this value, because excess delay is easily corrected later by removing buffers.
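As a rough sizing aid, the line length can be estimated from the expected skew and the per-stage delay. The sketch below is hypothetical: `delay_line_length` and its `margin_stages` parameter are illustrative, and it assumes (following section 5.4) that the first and last buffers of a line are never removed, so only the inner stages provide removable delay.

```python
import math


def delay_line_length(max_expected_skew_ns: float, buffer_delay_ns: float,
                      margin_stages: int = 0) -> int:
    """Buffers needed in an initial scan clock delay line (hypothetical helper).

    The first and last buffers are assumed never to be removed, so only the
    N - 2 inner buffers provide removable delay; that inner part must cover
    the maximum expected skew, plus any extra stages added to overshoot.
    """
    if buffer_delay_ns <= 0:
        raise ValueError("buffer delay must be positive")
    removable = math.ceil(max_expected_skew_ns / buffer_delay_ns)
    return 2 + removable + margin_stages


n = delay_line_length(2.2, 0.40)      # H4C BUF8B rise delay from Table 1
print(n, "buffers; removable range", round((n - 2) * 0.40, 2), "ns")
```

With H4C BUF8Bs (0.40 ns rise delay) and an assumed 2.2 ns expected skew, this gives an eight-buffer line, the same length used in the Figure 5 example; `margin_stages` adds deliberate overshoot.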

When the gate count of the design is close to the maximum utilization for the array size, the physical size of the delay buffers may also matter. Table 3 lists the number of gates used by each buffer.

Table 2. Buffer Fall Delay (ns)

| Delay<br>Buffer | HDC | H4C | H4EPlus<br>3.3V | H4EPlus<br>5.0V | M5C |
|-----------------|-----|-----|-----------------|-----------------|-----|
| BUF             | .53 | .30 | .37             | .27             | .24 |
| BUF2            | .54 | .42 | .45             | .31             | .25 |
| BUF3            | .70 | .60 | -               | -               | .39 |
| BUF4            | .73 | .63 | .58             | .40             | .39 |
| BUF6            | -   | -   | -               | -               | .50 |
| BUF8            | .64 | .59 | .51             | .35             | -   |
| BUF2B           | .53 | .38 | .40             | .28             | -   |
| BUF3B           | .50 | .50 | -               | -               | -   |
| BUF4B           | .64 | .54 | .51             | .38             | -   |
| BUF8B           | .76 | .48 | .49             | .38             | -   |

Table 3. Delay Buffer Gate Count

| Delay<br>Buffer | HDC | H4C | H4EPlus | M5C |
|-----------------|-----|-----|---------|-----|
| BUF             | 1   | 1   | 1       | 1.5 |
| BUF2            | 2   | 2   | 2       | 3.0 |
| BUF3            | 2   | 2   | -       | 3.0 |
| BUF4            | 3   | 3   | 3       | 4.5 |
| BUF6            | -   | -   | -       | 6.0 |
| BUF8            | 5   | 5   | 5       | -   |
| BUF2B           | 3   | 3   | 3       | -   |
| BUF3B           | 4   | 4   | -       | -   |
| BUF4B           | 5   | 5   | 5       | -   |
| BUF8B           | 9   | 10  | 10      | -   |
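The trade-off between resolution and array space can be tabulated programmatically. This sketch hard-codes only the H4C columns of Tables 1, 2, and 3; the helper names and the ranking rule (finest resolution first, then fewest gates) are assumptions for illustration, and `relevant_edge` encodes the rise/fall rule stated earlier in this section.

```python
from typing import List, NamedTuple

# H4C columns of Tables 1, 2 and 3: (rise ns, fall ns, gates), typical case.
H4C = {
    "BUF":   (0.26, 0.30, 1),
    "BUF2":  (0.30, 0.42, 2),
    "BUF3":  (0.40, 0.60, 2),
    "BUF4":  (0.39, 0.63, 3),
    "BUF8":  (0.37, 0.59, 5),
    "BUF2B": (0.34, 0.38, 3),
    "BUF3B": (0.40, 0.50, 4),
    "BUF4B": (0.45, 0.54, 5),
    "BUF8B": (0.40, 0.48, 10),
}


class Option(NamedTuple):
    buffer: str
    resolution_ns: float  # delay of one stage = fine-tuning resolution
    total_gates: int      # gates used by all delay lines together


def relevant_edge(tree_inverting: bool, inverted_after_delay: bool) -> str:
    """Fall delay only for an inverting tree whose scan clock is inverted after the line."""
    return "fall" if tree_inverting and inverted_after_delay else "rise"


def buffer_options(chain_length: int, num_clocks: int, gate_budget: int,
                   edge: str = "rise") -> List[Option]:
    """Buffers whose delay lines fit the gate budget, finest resolution first."""
    column = {"rise": 0, "fall": 1}[edge]
    options = []
    for name, (rise, fall, gates) in H4C.items():
        total = gates * chain_length * num_clocks
        if total <= gate_budget:
            options.append(Option(name, (rise, fall)[column], total))
    # Ties in resolution go to the smaller gate cost.
    return sorted(options, key=lambda o: (o.resolution_ns, o.total_gates))


edge = relevant_edge(tree_inverting=False, inverted_after_delay=False)
for opt in buffer_options(chain_length=8, num_clocks=4, gate_budget=200, edge=edge):
    print(opt)
```

With eight-buffer lines on four clocks and a hypothetical 200-gate budget, BUF8B (10 gates each, 320 gates total) is excluded, and plain BUF ranks first on resolution.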

### 3.2. Placing the Scan Clock Delay and Multiplexer

Wiring delay within the delay line must be minimized to improve fine-tuning accuracy. This is done by placing the buffers of each delay line as close together as possible. A good approach is to define a group with a region of about 70% utilization in PrediX for each delay line. With such short wires, the differences in wiring delay become negligible.

To further reduce the multiplexers' effect on the normal clock insertion delays, place each clock multiplexer near the root buffer of its clock tree. The root buffers themselves should be placed near the center of their destination areas.

## 4. Starting the Entire Design Layout

The design must be placed and the clock trees completed before the scan clock delays are fine tuned. This requires a placement step and possibly clock tree synthesis. Routing is not needed at this point; it takes place after fine tuning.

### 4.1. Placing the Design

The design is placed in the same way as any other design, using either PrediX IPS or Gate Ensemble (GE).

### 4.2. Synthesizing the Clock Trees

Clock tree synthesis (CTS) is performed in Gate Ensemble, as in the normal flow. The CTS commands may be generated with PrediX, but the synthesis itself must be completed in GE. The clocks must be synthesized before any further analysis, since the actual skews on each clock and between clocks can only be measured once the trees exist.

## 5. Fine Tuning the Scan Clock Delays

The overall balancing of the multiple clocks is achieved by fine tuning the scan clock delays. To measure the existing skews on each clock and between clocks, post-placement timing is generated and the circuit is simulated in scan mode. From these measurements, the number of buffers to remove from each delay line to minimize the overall skew can be determined. Removing those buffers leaves the clocks with the minimum overall skew.

### 5.1. Generating Timing Information

After placement and clock tree synthesis, a Cadence Design Exchange Format (DEF) file must be saved and read into PrediX. In PrediX, a PreRoute is executed to generate the predicted parasitics, followed by writing out a resistance-capacitance (RC) or Standard Delay Format (SDF) file, depending on the method of timing analysis.

If clock tree synthesis has been performed, a new netlist must be generated for back-annotation and simulation. This is usually done with DEF2GDS in MaX.

### 5.2. Measuring Clock Skews

The insertion delay and skew of each clock must be measured either with a static timing analyzer or by simulation. The simulation method assumes that the skew contributed at the leaf level is negligible, because the largest part of the skew generally arises in the first stages of the clock tree.

An easy way to measure the skews by simulation is to assign all leaf-level nets of a given clock to a bus. The first transition on this bus gives the minimum delay and the last transition gives the maximum delay. The insertion delay reference is the average of the two (the mean delay of the bus). See Figure 4.
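The bus measurement reduces to a minimum and maximum over the leaf-level transition times. A minimal sketch, assuming the simulator's transition times for one clock's leaf nets have already been collected into a list in nanoseconds; `bus_delays` and the sample times are hypothetical (chosen to match the CLK_1 row of Figure 5).

```python
from typing import Dict, Sequence


def bus_delays(transitions_ns: Sequence[float]) -> Dict[str, float]:
    """Summarize one clock's leaf-level bus from simulated transition times (ns)."""
    if not transitions_ns:
        raise ValueError("no leaf-level transitions recorded for this clock")
    first = min(transitions_ns)   # first transition on the bus = minimum delay
    last = max(transitions_ns)    # last transition on the bus = maximum delay
    return {
        "min": first,
        "max": last,
        "mean": (first + last) / 2,   # insertion delay reference
        "skew": last - first,         # skew within this clock
    }


# Hypothetical scan-mode transition times on the leaf nets of one clock.
clk1 = bus_delays([3.95, 3.82, 4.00, 3.88])
print(clk1)
```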

#### Note:

It is critical that measurements are made in scan mode.



Figure 4. Example Clock Skews

### 5.3. Determining the Number of Buffers to Remove

With the minimum and maximum delay of each clock measured, determining the number of scan clock delay buffers to remove is straightforward. A spreadsheet based on the following equations works well; see Figure 5.

For each clock, calculate the mean delay.

$$\mathrm{Mean} = \frac{\mathrm{Max} + \mathrm{Min}}{2} \tag{EQ 1}$$

For each clock, calculate the difference between its mean delay and the mean delay of the slowest clock.

$$\mathrm{Diff} = \max\{\mathrm{Mean}(\mathrm{CLK}_1), \ldots, \mathrm{Mean}(\mathrm{CLK}_n)\} - \mathrm{Mean} \tag{EQ 2}$$

For each clock, calculate the number of buffer delays equivalent to the difference from (EQ 2), where Buffdelay is the delay of a single delay-line buffer.

$$\mathrm{NumDly} = \mathrm{round}\left(\frac{\mathrm{Diff}}{\mathrm{Buffdelay}}\right) \tag{EQ 3}$$

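For each clock, calculate the number of buffers to remove from its delay line, where Buffnum is the total number of buffers in the initial delay line. Two buffers are subtracted because the first and last buffers of each delay line are never removed (see section 5.4); with eight buffers per line, this gives the "DLYs to Remove" column in Figure 5.

$$\mathrm{RemDly} = \mathrm{Buffnum} - 2 - \mathrm{NumDly} \tag{EQ 4}$$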

For each clock, calculate the new minimum and maximum delays.

$$\mathrm{NewMin} = \mathrm{Min} - (\mathrm{RemDly} \times \mathrm{Buffdelay}) \tag{EQ 5}$$

$$\mathrm{NewMax} = \mathrm{Max} - (\mathrm{RemDly} \times \mathrm{Buffdelay}) \tag{EQ 6}$$

Calculate the new overall minimum delay, maximum delay, and skew.

$$\mathrm{OverallMin} = \min\{\mathrm{NewMin}(\mathrm{CLK}_1), \ldots, \mathrm{NewMin}(\mathrm{CLK}_n)\} \tag{EQ 7}$$

$$\mathrm{OverallMax} = \max\{\mathrm{NewMax}(\mathrm{CLK}_1), \ldots, \mathrm{NewMax}(\mathrm{CLK}_n)\} \tag{EQ 8}$$

$$\mathrm{OverallSkew} = \mathrm{OverallMax} - \mathrm{OverallMin} \tag{EQ 9}$$


| Clock<br>Tree | Min<br>(ns) | Max<br>(ns) | Mean<br>(ns) | Diff<br>(ns) | # DLYs | DLYs to<br>Remove | New Min<br>(ns) | New Max<br>(ns) | Overall<br>Skew (ns) |
|---------------|------|------|------|------|--------|-------------------|------------|------------|------|
| CLK_1         | 3.82 | 4.00 | 3.91 | 0.07 | 0      | 6                 | 1.42       | 1.60       |      |
| CLK_2         | 2.39 | 2.39 | 2.39 | 1.59 | 4      | 2                 | 1.59       | 1.59       |      |
| CLK_3         | 3.87 | 4.09 | 3.98 | 0.00 | 0      | 6                 | 1.47       | 1.69       |      |
| ⋮             |      |      |      |      |        |                   |            |            |      |
| CLK_n         | 2.44 | 2.44 | 2.44 | 1.54 | 4      | 2                 | 1.64       | 1.64       |      |
| Overall       |      |      |      |      |        |                   | 1.42       | 1.69       | 0.27 |

Figure 5. Example Spreadsheet

In Figure 5, each initial scan clock delay line consisted of eight BUF8Bs in H4C, whose typical rise delay is 0.40 ns (Table 1). Using typical case timing, the resulting overall skew across all clocks is 0.27 ns.
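The spreadsheet calculation can be sketched in Python. This is a minimal sketch, not the author's spreadsheet: `balance_clocks` is a hypothetical name, it assumes the eight-buffer H4C BUF8B lines described above (Buffdelay = 0.40 ns), it subtracts two from Buffnum because the first and last buffers of each line stay in place (section 5.4), and it rounds halves up as a spreadsheet ROUND does (Python's built-in `round` would round halves to even).

```python
import math
from typing import Dict, NamedTuple, Tuple


class Row(NamedTuple):
    mean: float
    diff: float
    num_dly: int
    rem_dly: int
    new_min: float
    new_max: float


def balance_clocks(delays: Dict[str, Tuple[float, float]],
                   buff_delay: float, buff_num: int):
    """Apply EQ 1-9 to {clock: (Min, Max)} measured in scan mode (ns)."""
    means = {clk: (lo + hi) / 2 for clk, (lo, hi) in delays.items()}  # EQ 1
    slowest = max(means.values())
    rows: Dict[str, Row] = {}
    for clk, (lo, hi) in delays.items():
        diff = slowest - means[clk]                    # EQ 2
        num_dly = math.floor(diff / buff_delay + 0.5)  # EQ 3, round half up
        rem_dly = buff_num - 2 - num_dly               # EQ 4, first/last kept
        if rem_dly < 0:
            raise ValueError(f"{clk}: delay line too short for {num_dly} buffers")
        shift = rem_dly * buff_delay
        rows[clk] = Row(means[clk], diff, num_dly, rem_dly,
                        lo - shift, hi - shift)        # EQ 5, EQ 6
    overall_min = min(r.new_min for r in rows.values())  # EQ 7
    overall_max = max(r.new_max for r in rows.values())  # EQ 8
    return rows, overall_max - overall_min               # EQ 9


figure5 = {
    "CLK_1": (3.82, 4.00),
    "CLK_2": (2.39, 2.39),
    "CLK_3": (3.87, 4.09),
    "CLK_n": (2.44, 2.44),
}
rows, skew = balance_clocks(figure5, buff_delay=0.40, buff_num=8)
for clk, r in rows.items():
    print(f"{clk}: remove {r.rem_dly}, new min {r.new_min:.2f}, new max {r.new_max:.2f}")
print(f"overall skew {skew:.2f} ns")
```

Run on the four rows shown in Figure 5, the sketch reproduces the "DLYs to Remove" column (6, 2, 6, 2) and the 0.27 ns overall skew.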

### 5.4. Removing the Required Buffers

With the number of extra buffers determined, remove them by editing the post-clock-tree-synthesis netlist. If clock tree synthesis has not taken place, edit the original source schematic or netlist instead.

#### Note:

It is very important that no errors are made during the editing process.

Delete the buffers so that the first and the last buffer of the initial delay line remain untouched. This ensures the accuracy of the estimated buffer delay from section 3. See Figure 6.
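On a netlist, removing buffers means deleting instances and reconnecting the chain around the gap. The sketch below models a delay line as an ordered list of buffers with input and output nets; the `Buf` type and all instance and net names are hypothetical, and a real edit is made in the post-CTS netlist or schematic. It deletes the `rem_dly` buffers immediately after the first one, which keeps the first and last buffers in place.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Buf:
    name: str      # hypothetical instance name
    in_net: str
    out_net: str


def remove_inner_buffers(chain: List[Buf], rem_dly: int) -> List[Buf]:
    """Delete rem_dly buffers right after the first one; first and last stay."""
    if not 0 <= rem_dly <= len(chain) - 2:
        raise ValueError("can only remove buffers strictly between first and last")
    if rem_dly == 0:
        return list(chain)
    kept = [chain[0]] + list(chain[1 + rem_dly:])
    # Reconnect: the first surviving buffer after the gap now takes its input
    # from the first buffer's output net.
    kept[1] = replace(kept[1], in_net=chain[0].out_net)
    return kept


line = [Buf(f"SDLY_CLK2_{i}", f"n{i}", f"n{i + 1}") for i in range(8)]
for b in remove_inner_buffers(line, rem_dly=2):
    print(b)
```

For CLK_2 in Figure 5 (two buffers to remove), six buffers remain, and the fourth original buffer is rewired to the first buffer's output.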





Figure 6. Example Buffer Removal

## 6. Completing the Entire Design's Layout

After the skews across all clocks are minimized in the netlist, the layout flow continues from where it left off. Since placement and clock tree synthesis have already taken place, their results only need to be updated to match the edited netlist. Routing can then begin.

First, generate a new DEF netlist from the edited netlist, usually with EDIF2TANGATE in MaX. Generating a new DEF minimizes the possibility of error.

A simple way to update the placement is to take the components section of the post-placement DEF from section 4 and read it into PrediX as a Fixed Cells file. All other sections, such as the nets section, must first be deleted from the file. When the file is read into PrediX, the cells missing from the edited netlist are dropped automatically with only a warning message, and a new placement file can then be written with no discrepancies.

If PrediX cannot be used, the post-placement DEF can instead be edited as described above, with the buffers that were removed from the netlist also deleted from it. This edit must be made without error; otherwise, Gate Ensemble will crash.
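Because a hand edit must be error-free, the DEF trimming can be scripted. A minimal Python sketch, assuming DEF component statements of the form `- <instance> <cell> ... ;`; the function name `components_only` and the sample DEF text are hypothetical. This note does not say whether PrediX's Fixed Cells reader or Gate Ensemble also needs the DEF header lines (VERSION, DESIGN, UNITS), so the sketch emits only the COMPONENTS section; check what your tool version expects.

```python
import re
from typing import Iterable


def components_only(def_text: str, drop: Iterable[str] = ()) -> str:
    """Return only the COMPONENTS section, minus the instances in `drop`."""
    start = re.search(r"^\s*COMPONENTS\s+\d+\s*;", def_text, re.MULTILINE)
    end = re.search(r"^\s*END\s+COMPONENTS\b", def_text, re.MULTILINE)
    if start is None or end is None or end.start() < start.end():
        raise ValueError("no COMPONENTS ... END COMPONENTS section found")
    removed = set(drop)
    kept = []
    # Every component statement ends with ';' and may span several lines.
    for stmt in def_text[start.end():end.start()].split(";"):
        tokens = stmt.split()
        if not tokens:
            continue
        if tokens[0] != "-" or len(tokens) < 3:
            raise ValueError(f"unexpected statement: {' '.join(tokens)!r}")
        if tokens[1] not in removed:
            kept.append(" ".join(tokens))
    # Rewrite the count so it matches the remaining statements.
    lines = [f"COMPONENTS {len(kept)} ;"] + [f"  {s} ;" for s in kept]
    return "\n".join(lines + ["END COMPONENTS"]) + "\n"


sample = """VERSION 5.3 ;
DESIGN top ;
COMPONENTS 3 ;
- SDLY_CLK2_0 BUF8B + PLACED ( 1000 2000 ) N ;
- SDLY_CLK2_1 BUF8B
    + PLACED ( 3000 2000 ) N ;
- SDLY_CLK2_7 BUF8B + PLACED ( 5000 2000 ) N ;
END COMPONENTS
NETS 1 ;
- n1 ( SDLY_CLK2_0 Z ) ( SDLY_CLK2_1 A ) ;
END NETS
END DESIGN
"""
print(components_only(sample, drop={"SDLY_CLK2_1"}))
```

Passing no `drop` set gives the components-only file for the PrediX route; passing the deleted instance names covers the manual route, with the component count updated to match.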

The layout can now be continued in Gate Ensemble by reading in the appropriate placement file and routing the design.

#### Note:

The power and ground grids must be recreated since they were not kept in the DEF.

## 7. Analyzing the Results

After the layout is completed, the actual results of the clock skew minimization effort can be observed for each clock by rerunning the simulations described in section 5.2.

#### **SUMMARY**

This application note has described a method by which the overall skew across multiple clocks can be minimized. This is of particular interest during scan testing, but the same methods may be applied to any situation where the timing between different clocks is critical.


#### **TRADEMARKS**

HDC, H4C, H4EPlus, M5C, and PrediX are trademarks of Motorola, Inc.

Gate Ensemble is a trademark of Cadence Design Systems, Inc.


