# PATHWAVE

# Designing Leading-Edge Memory Systems

A cohesive workflow from design-to-test

## Introduction: Evolving Standards

Demand for data throughput keeps rising, driven by video streaming, 5G, artificial intelligence, virtual reality, and the internet of things, and next-generation memory technologies are crucial to keeping up. These technologies bring complex design challenges, such as increased crosstalk, accurate PCB model extraction, and jitter tracking for optimum DFE settings. The need to simulate, analyze, and debug in a timely and efficient manner is more important than ever.

With next-generation memory, including DDR4 and beyond, addressing crosstalk, jitter, and JEDEC compliance is key. Signal integrity engineers currently use multiple design and test tools to address these challenges; however, the disconnect between tools results in a slow, inefficient workflow.

Keysight's leading-edge PathWave ADS Memory Designer platform enables you to address these challenges while designing next-generation memory interfaces.



## **Table of Contents**

- Introduction: Evolving Standards
- A Cohesive Memory Design Workflow
- Next Generation Memory Interface Design with Equalization
- Solutions to Equalization – Single-Ended AMI Models
- New Innovative Forwarded Clocking solution
- Phase Interpolator in Forwarded Clocking
- Jitter Tracking with Forwarded Clocking
- Pre and Post Layout Flow – RapidScan Z₀ and PCB EM Extraction
- Design Exploration and Compliance
- Probing and Performing Measurements
- Measurement and Verification
- Conclusion

## A Cohesive Memory Design Workflow

Designing modern memory interfaces can be a challenging and complex task. Figure 1 shows a typical memory design workflow; however, a project may involve additional steps, such as further simulation iterations, setting up and performing measurements, and testing the board once it is manufactured.

The main Memory Designer workflow targets simulation, analysis, and compliance tests. Additionally, there are specific tools for building IBIS-AMI models of drivers and receivers. Finally, there are specific electromagnetic simulation features for pre- and post-layout memory channel characterization.

The Memory Designer workflow begins with technology selection, which includes, but is not limited to, DDR4, LPDDR4, DDR5, LPDDR5, GDDR6, GDDR6X, GDDR7, HBM2E, and HBM3.

Once the technology has been chosen, the memory channel data can be brought in. There are several ways to bring in PCB data for analysis, for example, pre-layout or post-layout designs using EM extraction tools, or simulated or measured S-parameter data. This is all part of the channel modeling workflow. Once you have the channel data (which could be for the data bus or the command/address/control buses), the next step is to set up the memory controller and DRAM. If IBIS models are available, you can easily associate them with the controllers and memories. With the IBIS models, you can choose a different model selector, corner cases, ODT (on-die termination) values, and any other model parameters. If you do not have IBIS models, Memory Designer still supports a non-IBIS flow by creating generic IBIS models automatically for you.

The controllers, memory devices, and PCB structures are schematic components that you can easily wire together using the smart bus wires. The smart bus wires automatically make the connections between components by recognizing the net and node information. The next step is to select the desired measurements, such as strobed eye, eye height and width, BER (bit-error rate) contour, or margins, using the smart memory probe component. The available signals and measurements are listed automatically for easy measurement selection and setup.

Several simulation technologies support memory interface simulations, including transient convolution and DDR bus simulation with bit-by-bit and statistical modes.

The Memory Designer workflow ends with data analysis and compliance testing.

PathWave ADS Memory Designer brings these steps together in one cohesive memory interface design workflow.



#### Seco (A. Pali):

"Before using ADS we used to follow the layout design-guide given by the silicon vendor, and it often wasn't possible to follow their rules; for these reasons our design in some cases failed. We have been using ADS for years in our design-flow now, and we are able to define our design rules that are fit for our custom applications. For us now, it is inconceivable to start designing a complex RAM architecture without first simulating with ADS; now we can understand better our limits and optimize our design in terms of quality and money."



Figure 1: Typical memory interface workflow

Figure 2 shows an example of a complex memory system: a motherboard with a two-slot SO-DIMM DDR channel. Using the Memory Designer workflow, the whole memory system can be easily constructed and analyzed.



Figure 2: Memory Designer schematic of a two-slot edge computing board (UART), by Seco with extracted layout

## Next Generation Memory Interface Design with Equalization

DDR memory systems can be complex to design because of the varied termination choices and their effect on ISI (inter-symbol interference) in the channel. Through virtual prototyping, simulations help designers predict their true design margins, including how fast the memory interface can run within the mask margin, and find where the problems are. Virtual prototyping becomes even more critical with next-generation memory systems such as DDR5, LPDDR5, GDDR6X, and GDDR7, where you may not be able to open the eye without employing channel equalization.

If you run the same DDR4 physical channel at a higher speed grade for DDR5 (6 to 9 Gbps), you may see a completely closed eye, as shown in Figure 3:



Figure 3: 3.2 GHz DDR4 Channel at 2.4 Gbps (Left) and 6.4 Gbps (Right)
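To put the two data rates of Figure 3 in perspective, the following minimal sketch converts a data rate into its unit interval (UI, the duration of one bit). The function name is illustrative and not part of any Keysight tool.

```python
def unit_interval_ps(data_rate_gbps: float) -> float:
    """Return the unit interval (one bit time) in picoseconds."""
    # 1 Gbps corresponds to a 1 ns (1000 ps) bit time.
    return 1000.0 / data_rate_gbps


for rate in (2.4, 6.4):
    print(f"{rate} Gbps -> UI = {unit_interval_ps(rate):.2f} ps")
```

At 6.4 Gbps the bit time is well under half of what it is at 2.4 Gbps, so the same channel reflections and losses occupy a much larger share of each bit.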

## Solutions to Equalization – Single-Ended AMI Models

IBIS models represent the analog electrical behavior of transmitters and receivers. Many advanced serializer-deserializer (SerDes) chips employ equalization such as CTLE, FFE, DFE, and AGC, along with CDR, to compensate for channel loss, ISI, and crosstalk. How do IBIS models handle this?

AMI (Algorithmic Modeling Interface) is the modeling interface for SerDes behavioral models, which simulate SerDes functions such as equalization and CDR. One example of a time-domain simulation workflow is shown in Figure 4.

The AMI flow was added alongside the traditional (SPICE-based) IBIS flow in IBIS version 5.0. The AMI portion is specified in a section of the IBIS file under the [Algorithmic Model] keyword. It acts as a DSP block that takes an input signal waveform and/or impulse response and outputs a modified waveform and/or impulse response. AMI models are developed by SerDes vendors to match and represent the actual chip behavior. To protect their IP, vendors deliver the models as a DLL and/or shared object, together with the plain-text .ami and .ibs files, which also provides interoperability between EDA vendors.



Figure 4: IBIS-AMI time-domain simulation flow

As far as applications for IBIS models are concerned, some of the most complex IBIS models have been created for memory interfaces (DDR). This is due to the large number of signal pins, packages and configurations available (especially thinking about multiple DRAM dice stacked inside a single package of LPDDR4). Up until DDR4/LPDDR4, IBIS models have covered all the simulation needs of the typical SI engineer.

As we move forward to next-generation memories (DDR5/LPDDR5), the on-chip technology has evolved, and so must the modeling and simulation technology. In DDR5 and LPDDR5, equalization is available on commodity DRAM and controller devices for the first time, in the form of variable gain, CTLE (continuous-time linear equalization), and DFE (decision feedback equalization).

The speed of DDR5 and LPDDR5 systems is increased to up to 6400 MT/s, which worsens ISI impairment. Equalization techniques including de-emphasis, CTLE, and DFE are used in the memory controller and DRAM to mitigate ISI. Higher speeds also lead to shrinking voltage and timing margins, which are specified at extremely low BER levels. As a result, jitter and noise become critical factors that impact system performance.

To produce reliable margin predictions for DDR5 and LPDDR5 systems, we need to account for the effects of ISI, equalization, jitter, and noise, and millions of bits need to be processed to yield accurate results at the specified low BER levels. AMI is the best technical solution for DDR5/LPDDR5 simulation because of its flexibility in I/O behavioral modeling and its superior simulation speed. However, the unique architecture of DDR channels presents new challenges to AMI when applied to DDR5 and LPDDR5 systems. Recent developments in the AMI methodology have addressed these issues, including single-ended signals in DDR channels, asymmetric rise and fall edges in single-ended signals, and clock forwarding.
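As a rough illustration of how a DFE mitigates ISI, the sketch below applies a one-tap decision feedback equalizer to a toy channel in which each received sample contains 60% of the previous symbol. The channel coefficient, the tap count, and the function name are assumptions chosen for illustration; they do not represent an actual DDR5 receiver or its specified taps.

```python
def one_tap_dfe(received, tap):
    """Subtract tap * (previous decision) from each sample before slicing."""
    corrected, decisions = [], []
    previous = 0.0  # no prior decision exists before the first symbol
    for sample in received:
        y = sample - tap * previous       # cancel post-cursor ISI
        d = 1.0 if y >= 0 else -1.0       # slicer decision
        corrected.append(y)
        decisions.append(d)
        previous = d
    return corrected, decisions


# Toy channel (assumed): r[n] = b[n] + 0.6 * b[n-1]
bits = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, 1.0]
received = [b + 0.6 * (bits[i - 1] if i else 0.0) for i, b in enumerate(bits)]

corrected, decisions = one_tap_dfe(received, tap=0.6)
print("worst-case level before DFE:", min(abs(v) for v in received))
print("worst-case level after DFE: ", min(abs(v) for v in corrected))
```

In this toy case the worst-case sample level at the slicer rises once the feedback tap matches the channel's post-cursor, which is the effect described as "opening the eye".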

Specifically, AMI assumes that all channels are differential. In a DDR channel, DQ and CAC (address/command/control) signals are single-ended and have both common and differential components. To resolve this issue, the single-ended input signal to the Rx model is decomposed into common and differential components. The differential component remains the input waveform to the Rx AMI\_GetWave function, same as in the current specification. The common component, which is assumed to be a constant, is characterized by the simulation tool as the mean value of the steady state high and low voltages at the Rx pad. The value is passed to the Rx model by the EDA tool in the AMI\_Init call through a new DC\_Offset parameter. In the AMI\_GetWave function the Rx model can choose to internally recover the single-ended input signal by adding DC\_Offset to the differential input waveform.



Figure 5: Single-ended IBIS-AMI flow for DDR5/LPDDR5
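The decomposition described above can be sketched in a few lines. This is only the arithmetic, written in Python for readability; real AMI models are compiled libraries, and the function names and voltage levels below are hypothetical.

```python
def decompose_single_ended(waveform, v_high, v_low):
    """Split a single-ended waveform into a constant common component
    (the mean of the steady-state high and low levels) and the remainder."""
    dc_offset = 0.5 * (v_high + v_low)
    differential = [v - dc_offset for v in waveform]
    return dc_offset, differential


def recover_single_ended(differential, dc_offset):
    """What an Rx model may do internally: add the offset back."""
    return [v + dc_offset for v in differential]


# Assumed steady-state levels at the Rx pad, in volts (illustrative only).
pad_waveform = [0.6, 1.1, 1.1, 0.6]
dc_offset, diff = decompose_single_ended(pad_waveform, v_high=1.1, v_low=0.6)
restored = recover_single_ended(diff, dc_offset)
print(dc_offset, diff, restored)
```

The waveform handed to the receiver swings symmetrically around zero, while the offset travels separately as a single number, mirroring the roles of the input waveform and DC\_Offset in the text.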

## New Innovative Forwarded Clocking solution

The AMI specification assumes that every Rx has its own CDR circuitry to recover the clock from the data, and the AMI\_GetWave function has only one input waveform, which is the data signal. However, DDR channels employ a clock forwarding architecture in which, instead of using an internal CDR, the DQ Rx uses the DQS signal as a forwarded clock for the DQ Rx DFE slicer and data sampling. In practice, the DQ Rx device has two input signals: one is data, and the other is clock. To enable modeling of clock forwarding, a new Rx AMI\_GetWave API, originally known as GetWave2, was established in IBIS BIRD 204 and approved for a future release of the IBIS specification. The API defines two input waveforms, for the data and clock signals respectively. The DQ Rx clocking behavior can be physically modeled in the new AMI\_GetWave function.
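The idea of a receiver with two inputs can be sketched as follows: the data waveform is sampled at the threshold crossings of a separate clock waveform rather than at instants recovered from the data itself. This is a conceptual Python sketch with hypothetical function names and toy waveforms; it is not the BIRD 204 function signature.

```python
def clock_edge_indices(clock, threshold=0.0):
    """Indices at which the clock crosses the threshold in either direction."""
    return [i for i in range(1, len(clock))
            if (clock[i - 1] < threshold) != (clock[i] < threshold)]


def sample_data_on_clock(data, clock, threshold=0.0):
    """Slice the data waveform at every forwarded-clock edge."""
    return [1 if data[i] > 0 else 0 for i in clock_edge_indices(clock, threshold)]


# Toy waveforms, 4 samples per bit. Data carries bits 1, 0, 1, 1.
data = [1.0] * 4 + [-1.0] * 4 + [1.0] * 4 + [1.0] * 4
# Clock edges are placed in the middle of each bit (assumed alignment).
clock = [-1.0] * 2 + [1.0] * 4 + [-1.0] * 4 + [1.0] * 4 + [-1.0] * 2

print(sample_data_on_clock(data, clock))
```

Because both clock edges are used, one data bit is captured per half clock period, as in a double data rate interface.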

#### Phase Interpolator in Forwarded Clocking

Besides clock forwarding, another key clocking function that can be modeled with the new AMI\_GetWave API is the phase interpolator in the controller DQ Rx. During READ cycles, the controller DQ Rx applies a 90° phase shift to the forwarded DQS signal and mixes the shifted signal with the original one. The result is a delayed DQS signal whose delay depends on the mixing weights. During system training, the controller tunes the weights, and therefore the delay, to adjust the DQ-DQS skew for optimal DQ Rx DFE clocking in READ operations. Figure 6 shows a READ cycle controller DQ post-DFE eye with and without PI (phase interpolator) training, modeled by the new AMI\_GetWave API. The objective of the training is to align DFE switching with the data bit edges to help open the eye.



Figure 6: Before and after phase interpolator training with forwarded clocking model
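The relationship between mixing weights and delay can be made concrete under one simplifying assumption: treating DQS as an ideal sinusoid. Mixing the original with a copy delayed by 90° then gives $w_0 \sin\theta + w_1 \sin(\theta - 90°) = R \sin(\theta - \varphi)$ with $\varphi = \operatorname{atan2}(w_1, w_0)$. The sketch below uses hypothetical weights and a hypothetical 3.2 GHz clock; a real DQS is closer to a square wave, so this is an idealization.

```python
import math


def pi_delay_seconds(w_orig, w_shifted, clock_hz):
    """Delay of the mixed clock, assuming an ideal sinusoidal DQS."""
    phase_lag = math.atan2(w_shifted, w_orig)   # radians, 0 .. pi/2 for weights >= 0
    return phase_lag / (2.0 * math.pi * clock_hz)


clock_hz = 3.2e9  # assumed clock frequency
for w0, w1 in ((1.0, 0.0), (1.0, 1.0), (0.0, 1.0)):
    delay_ps = pi_delay_seconds(w0, w1, clock_hz) * 1e12
    print(f"weights ({w0}, {w1}) -> delay {delay_ps:.3f} ps")
```

Sweeping the weights between the two extremes moves the delay continuously between zero and a quarter of the clock period, which is the range the training has available for adjusting DQ-DQS skew in this idealized picture.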

#### Jitter Tracking with Forwarded Clocking

One advantage of the clock forwarding architecture is jitter tracking. Since the DQS signal is used to clock the DQ Rx, correlated jitter between DQ and DQS is cancelled at the instant the DQ is sampled. On the other hand, the DDR5 specification allows a certain amount of electrical path mismatch between the DQ and DQS Rx. The mismatch reduces the DQ-DQS jitter correlation and adversely impacts the effectiveness of jitter tracking and DFE. With the new AMI\_GetWave API, both jitter tracking and the effect of an unmatched Rx can be captured naturally in AMI simulations. Figure 7 shows simulated eyes of a DQ signal at the Rx package pin and at the Rx DFE output. With no jitter injected at the Tx, the eye is almost closed by ISI at the package but opened by the DFE at the Rx output. When SJ (sinusoidal jitter) is injected at the DQ and DQS transmitters, the eye is completely closed at the package. In the case of a matched Rx (with zero DQS-to-DQ delay), DQ and DQS jitter is correlated and tracked by the DQ sampling times, leaving the DQ post-DFE eye almost unchanged from the case without Tx SJ. In the case of an unmatched Rx (with a 5 UI DQS-to-DQ delay), the DQ-DQS jitter correlation is reduced and jitter tracking becomes less effective, leading to a worse post-DFE eye.
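A simplified model shows why a DQS-to-DQ delay weakens jitter tracking. Assume the same sinusoidal jitter $j(t) = A \sin(2\pi f_j t)$ is applied to DQ and DQS, and that the DQS edge used for sampling arrives with an extra delay $\tau$. The timing error left after tracking is $j(t) - j(t - \tau)$, whose peak value is $2A\,|\sin(\pi f_j \tau)|$. The amplitude, jitter frequency, and data rate below are assumptions for illustration, not values from the simulation in Figure 7.

```python
import math


def residual_sj_peak(amplitude_ui, jitter_hz, dqs_to_dq_delay_s):
    """Peak untracked jitter when DQ and DQS carry the same sinusoidal jitter
    but the sampling clock is delayed relative to the data."""
    return 2.0 * amplitude_ui * abs(math.sin(math.pi * jitter_hz * dqs_to_dq_delay_s))


ui_s = 1.0 / 6.4e9            # assumed 6400 MT/s data rate
amplitude_ui = 0.1            # assumed SJ amplitude
jitter_hz = 50e6              # assumed SJ frequency

matched = residual_sj_peak(amplitude_ui, jitter_hz, 0.0)
unmatched = residual_sj_peak(amplitude_ui, jitter_hz, 5 * ui_s)
print(f"matched Rx:   {matched:.4f} UI of residual jitter")
print(f"unmatched Rx: {unmatched:.4f} UI of residual jitter")
```

With zero delay the jitter cancels completely in this model, and the residual grows with both the delay and the jitter frequency, which is consistent with the qualitative behavior described above.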



## Pre and Post Layout Flow – RapidScan Z<sub>0</sub> and PCB EM Extraction

If you need to quickly check your channels against target impedance and delay, PathWave ADS RapidScan $Z_0$ offers a solution. It processes any layout quickly, reports the impedance and time delay in graphical or numerical table format, and compares the results against a given specification. This information can be used to optimize the layout before fabrication.



Figure 8: Memory interface layout with RapidScan Z<sub>0</sub> results

## Design Exploration and Compliance

Due to the complexity of memory systems, one common practice in the design community is to optimize the system performance for various design parameters.

Memory Designer's Design Exploration solution lets you investigate variations easily with built-in parametric analysis and batch simulation capabilities, for instance, to determine the highest possible bus speed across variations in PVT (process, voltage, temperature) and/or ODT (on-die termination) schemes. You can insert your own specification so that a pass/fail report is generated per sweep or batch (sample results are shown in Figure 9). Design Exploration offers extensive design space exploration to find the system configuration that is optimized for cost, performance, and yield. Additionally, the tool reports your design margin and helps protect against later problems, such as manufacturing variations and silicon changes.

| Sweep Index | Sweep Setting | Signal | Measurement | Value | Deviation (%) | Range | Pass or Fail |
|---|---|---|---|---|---|---|---|
| 1 | Controller_DQ_Signal_Corner_Cases=1.0 | DQ_Ch0_U8_RxOutput_DQ43 | Eye Height | 0.373 V | 9.70588 | 0.34 V to inf V | Pass |
| | | | Eye Width | 5.31250e-10 s (0.85 UI) | 1.19048 | 0.84 UI to inf UI | Pass |
| | | DQ_Ch0_U8_RxOutput_DQ47 | Eye Height | 0.364 V | 7.05882 | 0.34 V to inf V | Pass |
| | | | Eye Width | 5.84375e-10 s (0.935 UI) | 11.3095 | 0.84 UI to inf UI | Pass |
| | | DQ_Ch0_U8_RxOutput_DQ46 | Eye Height | 0.409 V | 20.2941 | 0.34 V to inf V | Pass |
| | | | Eye Width | 5.68750e-10 s (0.91 UI) | 8.33333 | 0.84 UI to inf UI | Pass |
| | | DQ_Ch0_U8_RxOutput_DQ42 | Eye Height | 0.404 V | | 0.34 V to inf V | Pass |
| | | | Eye Width | 5.25000e-10 s (0.84 UI) | -1.32E-14 | 0.84 UI to inf UI | Fail |
| | | DQ_Ch0_U8_RxOutput_DQ40 | Eye Height | 0.398 V | | 0.34 V to inf V | Pass |
| | | | Eye Width | 5.40625e-10 s (0.865 UI) | 2.97619 | 0.84 UI to inf UI | Pass |
| | | DQ_Ch0_U8_RxOutput_DQ45 | Eye Height | 0.346 V | 1.76471 | 0.34 V to inf V | Pass |
| | | | Eye Width | 5.78125e-10 s (0.925 UI) | 10.119 | 0.84 UI to inf UI | Pass |
| | | DQ_Ch0_U8_RxOutput_DQ41 | Eye Height | 0.351 V | 3.23529 | 0.34 V to inf V | Pass |
| | | | Eye Width | 5.40625e-10 s (0.865 UI) | 2.97619 | 0.84 UI to inf UI | Pass |
| | | DQ_Ch0_U8_RxOutput_DQ44 | Eye Height | 0.409 V | 20.2941 | 0.34 V to inf V | Pass |
| | | | Eye Width | 5.78125e-10 s (0.925 UI) | 10.119 | 0.84 UI to inf UI | Pass |

| Sweep Index | Sweep Setting | Signal | Measurement | Value | Margin (%) | Range | Pass or Fail |
|---|---|---|---|---|---|---|---|
| 2 | Controller_DQ_Signal_Corner_Cases=4.0 | DQ_Ch0_U8_RxOutput_DQ43 | Eye Height | 0.389 V | 14.4118 | 0.34 V to inf V | Pass |
| | | | Eye Width | 5.31250e-10 s (0.85 UI) | 1.19048 | 0.84 UI to inf UI | Pass |

Figure 9: Sample data from Design Exploration
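The percentage column in Figure 9 appears to be the distance of each value from the lower limit of its range, relative to that limit. The sketch below reproduces the reported figures for signal DQ43 under that assumption; the function name is illustrative and the formula is inferred from the sample data, not taken from tool documentation.

```python
def deviation_percent(value, lower_limit):
    """Distance above the lower specification limit, in percent of the limit."""
    return (value - lower_limit) / lower_limit * 100.0


# Values for DQ43 in sweep 1 of Figure 9.
eye_height = deviation_percent(0.373, 0.34)   # volts
eye_width = deviation_percent(0.85, 0.84)     # unit intervals

print(f"eye height deviation: {eye_height:.5f} %")
print(f"eye width deviation:  {eye_width:.5f} %")
```

Under the same assumption, a value sitting exactly on the limit, such as the 0.84 UI eye width of DQ42, leaves essentially no margin; its reported deviation of -1.32E-14 is on the order of floating-point rounding, which would explain the Fail verdict for a value that prints as equal to the limit.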

Use the generated waveforms to predict final compliance sign-off: the simulation results can be transferred to Keysight's Infiniium compliance test application software, which measures against the crucial JEDEC standards and generates a final report. The design-to-test process is completely integrated and automated within Memory Designer, and the stored data can be compared against the measured results from physical prototypes.



All simulation settings specified in the design phase are handed over to the oscilloscope-based compliance application. The standalone software evaluates conformity to the specification. Test information such as speed grade, required voltage levels, and timing-related information is transferred automatically and can be saved as a prerequisite template file for later use with physical hardware measurements on an oscilloscope. This eliminates the error-prone manual setup of measurement conditions at different locations. Using the same measurement science for both simulated and physically measured waveforms reduces uncertainty when correlating simulation to measurement.

## **Probing and Performing Measurements**

Another challenge in measuring signal integrity on complex computer platforms is probing. Large LGA-based (land grid array) SoCs (systems on chip) used as controllers and double-sided assembled SO-DIMMs (small outline dual in-line memory modules) usually do not allow direct probing at the vias because of the positions of the mounted decoupling capacitors.

If we can't probe directly, how then do we get access to the signals we want? An additional challenge is that if you probe at a location other than the desired spot, closest to the receiver, the signal may look very different, as seen in the figure below. A hardware measurement-based validation without de-embedding or compensation will yield incorrect results.

How does this fit with physical measurements? Firstly, de-embedding becomes important, because probing changes the physical layer, introduces parasitic effects, and reduces the usable bandwidth of the oscilloscope.

A DSO series 33 GHz mixed signal oscilloscope was used for the measurement in combination with the N5425A ZIF probe head, which reduces the bandwidth to 12 GHz. This probe head was connected to ZIF tips soldered onto an N2114A DDR4 BGA interposer, as shown in the following pictures.



Figure 10: ZIF tips soldered onto the interposer, on the SO-DIMM



Figure 11: Without ZIF tips connected

As shown in Figure 11, a riser was used to overcome size limitations on the board. In the process, the DRAM was desoldered and then mounted again after the riser and the interposer had been soldered down.

An S-parameter description was extracted from the interposer and riser layout. The resulting model can be used in the Infiniium oscilloscope measurement software to compensate for the change in S21 transmission, and therefore in signal integrity, caused by the probing hardware.



Figure 12: Infiniium Oscilloscope de-embedding software menu
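Conceptually, compensating for the transmission of the probing hardware amounts to dividing the measured spectrum by the fixture's S21. The following sketch shows only that one step on a toy waveform, using a small hand-written DFT so that it needs no external libraries. It ignores reflections, noise amplification, and bandwidth limits that a real de-embedding function has to handle, and the fixture response is invented for illustration.

```python
import cmath


def dft(x):
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
            for m in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[m] * cmath.exp(2j * cmath.pi * m * k / n)
                for m in range(n)).real / n
            for k in range(n)]


def deembed(measured, s21):
    """Divide the measured spectrum by the fixture S21, bin by bin."""
    return idft([v / h for v, h in zip(dft(measured), s21)])


# Toy signal at the point of interest and an assumed fixture impulse response.
original = [0.0, 1.0, 1.0, 0.0, -1.0, -1.0, 0.0, 1.0]
fixture_impulse = [0.8, 0.15, 0.05, 0.0, 0.0, 0.0, 0.0, 0.0]
s21 = dft(fixture_impulse)

# What the scope would see through the fixture (circular convolution).
measured = idft([a * b for a, b in zip(dft(original), s21)])
recovered = deembed(measured, s21)
print([round(v, 6) for v in recovered])
```

The recovered samples match the original waveform in this noise-free example; in practice the quality of the result depends on how accurately the extracted S-parameter model describes the interposer and riser.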

The pictures below take a closer look at the different tooling, because new probing techniques need to be applied for LPDDR5:



Figure 13: DDR5 BGA interposer logic analyzer installed directly onto PCB

The interposer pictured can be connected for logic analysis. This is still an important debugging and validation tool today, particularly when the system contains your own ASIC/SoC as the memory controller and is expected to be interoperable with DRAM from a variety of providers.



Figure 14: Functional compliance validation against JEDEC specification with violation detection (right) across speed changes and CK termination changes

To ensure interoperability and identify the root causes of system failures quickly, the protocol compliance application lets you debug with a complete view of the DDR traffic, using powerful analysis capabilities and scope visualization per signal lane. You see the flow of traffic between the memory controller and the memory device, and to get to the root cause of system issues you can correlate the memory traffic with scope captures of signal integrity for specific signals.

Whole test systems can be realized in test hardware, even for a DIMM (dual in-line memory module):



Figure 15: Calibration connection diagram with RDIMM

Keysight offers a broad range of oscilloscopes and BERT configurations for building an end-to-end test system.

## Measurement and Verification

Control your expectations. An old piece of engineering wisdom says to ask yourself what you expect to find, regardless of whether you press the "simulate" or the "measure" button. A 100% agreement between measured and simulated data is a wish that can only be realized by putting tremendous effort into freezing every boundary condition needed for an apples-to-apples comparison. The following list explains why deviations are usual in the case shown here, and which boundary conditions we could not influence.

1. Controller operating firmware, determining the speed and its potential dynamic changes and calibration sequences
2. Influenced by 1: known ODT (on-die termination) used
3. Silicon hardware process corner (fast, typical, slow)
4. PCB hardware process corner (high, low impedance)
5. Accuracy of given IBIS models for controller and DRAM, ODT and driver settings unknown and determined by 1



Figure 16: Infiniium oscilloscope capture of a READ cycle measured on data line DQ44 at the DRAM probe point at a speed of ~1866 Mbps, yielding an eye height of ~545 mV



Figure 17: Simulation results corresponding to Figure 16, DQ44 READ at the DRAM probe point at 1866 Mbps with assumed controller ODT and DRAM driver strength, yielding an eye height of 515 mV

The eye shape is very similar between measurement and simulation, and the compared eye heights agree within a 10% tolerance, which was the expectation. With more effort spent on sweeping ODT schemes and other parameters in the simulation setup, 5% would be achievable.

The same holds for the eye width. The scope offers different measurement options (crossing level or statistical), which introduce a difference of less than 5% (15 ps); in a 2060 Mbps measurement we determined the eye width to be 455 ps, versus 460 ps simulated.
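The agreement quoted above can be checked with simple arithmetic. The sketch below computes the difference between simulated and measured values relative to the measurement, using the eye heights from Figures 16 and 17 and the eye widths quoted for the 2060 Mbps case. Taking the measurement as the reference is an assumption; the function name is illustrative.

```python
def relative_difference_percent(measured, simulated):
    """Absolute difference between simulation and measurement,
    in percent of the measured value."""
    return abs(simulated - measured) / measured * 100.0


eye_height_diff = relative_difference_percent(measured=545.0, simulated=515.0)  # mV
eye_width_diff = relative_difference_percent(measured=455.0, simulated=460.0)   # ps

print(f"eye height: {eye_height_diff:.1f} % difference")
print(f"eye width:  {eye_width_diff:.1f} % difference")
```

Both figures fall inside the 10% expectation stated in the text, with the eye width agreeing considerably more closely than the eye height.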

## Conclusion

Keysight offers a full design-to-test solution for memory interfaces, including a variety of Tx, Rx, and protocol solutions that enable fast and accurate product-to-market cycles. This full solution is the only one in the industry that carries accuracy from simulation through to measurement by using the same measurement science in both the hardware and the simulation tools within the workflow.

- Simulate, analyze, and test for compliance with the PathWave ADS Memory Designer platform for design
- Probe and perform measurements, verifying your simulation against measurement with Keysight's consistent measurement science across all products
- Utilize Keysight's support team to help you with your full design-to-test journey

For additional information / resources:

- www.keysight.com/find/oscilloscope
- www.keysight.com/find/pathwave-hsd-design

## Learn more at: www.keysight.com

For more information on Keysight Technologies' products, applications, or services, please contact your local Keysight office. The complete list is available at: www.keysight.com/find/contactus

