As semiconductor applications in automotive, data center, and high-performance computing grow increasingly mission-critical, the industry faces mounting pressure to achieve near-perfect manufacturing test coverage—often exceeding 99%. Yet, meeting stringent zero-defect defective parts per million (DPPM) targets remains a formidable challenge. Traditional structural testing methods frequently miss subtle, hard-to-detect faults, leaving a critical coverage gap that can compromise reliability. Enhancing existing design-for-test (DFT) architectures to close this gap typically incurs high costs, increased silicon area, and potential performance trade-offs.
This paper introduces an integrated methodology that bridges this gap by combining functional fault grading with conventional structural testing. The approach enables DFT and design verification (DV) teams to operate in parallel, accelerating development cycles while improving fault coverage. This cohesive strategy not only boosts test efficiency but also ensures robust fault detection across the full spectrum of potential failure modes.
Challenge
Chip developers face challenges in achieving high defect coverage due to complex design architectures, limited observability and controllability, the presence of analog/mixed-signal and third-party IP blocks, constraints on scan insertion, gaps between functional and structural testing, limitations of ATPG tools, and trade-offs between test time, cost, and coverage.
Functional fault grading helps address the challenges of achieving high defect coverage by identifying faults that are incidentally detected during functional testing, especially in areas where structural ATPG struggles—such as in analog/mixed-signal blocks, third-party IPs, or logic with limited observability. It complements structural testing by revealing hidden coverage gaps and improving overall fault detection without requiring additional scan logic or test patterns. A typical flow that includes functional fault grading is shown in Figure 1.

Fig. 1: Typical fault grading flow.
Things to consider for DFT methodology
When selecting a DFT methodology, there are several factors to consider. What fault models are supported by your ATPG scan tools? Do you need functional fault grading as part of that flow, and if so, are you using formal verification, simulation-based fault injection, or hardware emulation? Fault grading is primarily used to complement the coverage achieved by structural scan by analyzing the additional fault coverage provided by functional patterns, effectively topping up ATPG coverage.
To decide whether fault grading should be part of your DFT flow, begin by examining your test coverage objectives. If ATPG coverage falls short of your target—typically 99% or higher—fault grading can help identify and close the gaps using functional or infrastructure patterns.
Next, evaluate your design complexity. Designs such as SoCs, hierarchical architectures, or those with custom or graybox IP often include logic that ATPG struggles to fully test; in these scenarios, fault grading becomes crucial for validating those less accessible areas.
Consider the availability of functional patterns in your validation environment. If you’re already running boot sequences, integrity checks, loopback tests, or other system-level simulations, fault grading can leverage these to extract fault coverage with minimal overhead.
Think about your quality and reliability needs. Industries like automotive, aerospace, and medical often require ultra-low DPPM rates, and fault grading supports this by catching defects that ATPG might miss.
Also factor in your tools and resources. If you have access to fault simulation tools and the computational power to run them efficiently, integrating fault grading is more practical.
Finally, review your silicon history. If previous chips experienced fault escapes or poor correlation between ATPG coverage and actual defect detection, fault grading can help bridge that gap and improve future test effectiveness.
See Table 1 for a checklist to help determine whether you need fault grading as part of your DFT methodology.
| Question | If Yes → Consider Fault Grading |
|---|---|
| Is ATPG coverage below target? | ✅ |
| Are functional patterns already available? | ✅ |
| Is the design complex or hierarchical? | ✅ |
| Are you targeting low DPM or high quality? | ✅ |
| Do you have scan-excluded or graybox IPs? | ✅ |
| Are you seeing fault escapes in silicon? | ✅ |
Table 1: Reasons for including fault grading in your methodology.
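As a quick illustration of how the Table 1 checklist might be applied, the sketch below encodes it as a simple helper. The criterion names and the decision rule (any "yes" answer suggests fault grading) are illustrative assumptions, not part of any tool flow.

```python
# Hypothetical helper encoding the Table 1 checklist; any "yes" answer
# suggests adding fault grading to the DFT methodology.
CHECKLIST = [
    "atpg_coverage_below_target",
    "functional_patterns_available",
    "design_complex_or_hierarchical",
    "targeting_low_dppm",
    "scan_excluded_or_graybox_ips",
    "fault_escapes_seen_in_silicon",
]

def recommend_fault_grading(answers: dict) -> bool:
    """Return True if any checklist question is answered yes."""
    return any(answers.get(q, False) for q in CHECKLIST)

# Example: ATPG coverage is below target and the design contains graybox IP.
print(recommend_fault_grading({
    "atpg_coverage_below_target": True,
    "scan_excluded_or_graybox_ips": True,
}))  # -> True
```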
Choosing the right tools for fault grading
The choice between formal verification, simulation, and emulation for fault grading depends on the design size, test goals, and available resources. Formal fault grading is best suited for small to medium-sized blocks, especially when verifying control logic or safety-critical paths. It allows for exhaustive analysis without requiring test vectors and can mathematically prove whether faults are always detected or never activated. However, it doesn’t scale well to large designs and is limited in scope.
Simulation-based fault grading is ideal when functional patterns—such as boot tests, loopback tests, or integrity checks—are already part of the validation flow. It enables realistic fault activation and observation using actual stimuli, making it effective for analyzing non-capture patterns and graybox IPs. While it offers flexibility and good fault model support, it can be time-consuming for large designs or extensive pattern sets.
Emulation-based fault grading is the preferred method for large-scale designs or long-running functional tests that are impractical to simulate. It provides high-speed fault analysis in real-time or near real-time environments, making it suitable for system-level validation and software-driven tests. Although it requires specialized infrastructure and more complex setup, it significantly accelerates fault grading for full-chip scenarios. Table 2 provides a summary for deciding the best method/tool type for fault grading.
| Method | Best For | Scale | Speed | Setup Complexity |
|---|---|---|---|---|
| Formal | Small blocks, control logic | Low | Fast | Low |
| Simulation | Functional pattern analysis | Medium | Moderate | Moderate |
| Emulation | Full-chip, long tests | High | Fast | High |
Table 2: Decision guide for choosing fault grading tool.
Choosing the right simulator for fault grading
When choosing a fault simulator, it’s crucial to consider several key capabilities that can significantly improve your chances of success. The simulator should be flexible enough to work with various testbench types, whether you’re using native Verilog/SystemVerilog or UVM-based testbenches, and support multiple input waveform file formats to easily integrate with your existing workflow.
The flexibility to choose stimulus from RTL or gate-level simulations improves the chances of finding the right kind of stimulus. Finally, a simulator that supports a distributed computing framework capable of handling multiple fault scenarios simultaneously is important, as this will be key to maintaining efficiency as your testing needs grow.
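To make the parallelism point concrete, here is a minimal sketch of distributing independent fault scenarios across worker processes. `simulate_fault` is a hypothetical stand-in for invoking a real fault simulator; the fault list and classification values are made up.

```python
# Minimal sketch of parallel fault grading: each fault scenario is independent,
# so the work distributes naturally across processes.
from collections import Counter
from concurrent.futures import ProcessPoolExecutor

def simulate_fault(fault):
    """Hypothetical stand-in: run one faulty simulation, return its class."""
    site, stuck_value = fault
    # A real implementation would launch the fault simulator with this fault
    # injected and compare the outputs against the golden trace.
    return "DS" if (len(site) + stuck_value) % 3 else "UO"   # dummy result

def grade_faults(fault_list, max_workers=8):
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return Counter(pool.map(simulate_fault, fault_list))

if __name__ == "__main__":
    faults = [(f"u_core/n{i}", v) for i in range(100) for v in (0, 1)]
    print(grade_faults(faults))
```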
The ability to perform fault grading using both RTL and GL stimulus files enables DV and DFT teams to work in parallel, significantly accelerating IC development. Instead of waiting for GL models, DFT engineers can begin fault analysis early using RTL stimulus, while DV engineers continue their functional verification. By overlapping these traditionally sequential activities, teams can identify and resolve both functional and testability issues sooner, ultimately shortening the overall IC development cycle and improving design quality.
As DV engineers complete their verification cycles and achieve code coverage closure, they can perform stimulus grading against specific fault lists. The simulator performs comprehensive analysis by verifying fault controllability and observability, determining optimal fault injection timing, and tracing fault propagation through the design structure. The simulator monitors fault propagation to observation points and classifies faults according to specific categories.
The framework/methodology should support an extensive range of fault types, from basic stuck-at faults to more sophisticated timing delay faults and cell-aware and user-defined fault models. The right fault models depend on the design characteristics, the technology node, and the test goals. Some design characteristics to consider are listed below, with a small selection sketch following the list:
- Digital versus Analog/Mixed-Signal: Digital designs typically use stuck-at, transition, and path delay faults. Analog designs may require parametric fault models.
- Technology Node: Advanced nodes (e.g., 7 nm and below) are more susceptible to timing-related and cell-internal faults.
- Design Style: Full-custom, standard-cell, or FPGA-based designs may influence fault model applicability.
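As a rough illustration of how these characteristics might drive fault model selection, the sketch below maps them to candidate model names. The rules, thresholds, and categories are simplified assumptions, not tool settings.

```python
# Illustrative mapping from design characteristics to candidate fault models.
def select_fault_models(design_style: str, node_nm: int, has_analog: bool):
    models = ["stuck-at"]                                     # baseline for digital logic
    if node_nm <= 7:
        models += ["transition", "path-delay", "cell-aware"]  # advanced-node effects
    elif node_nm <= 28:
        models += ["transition"]
    if has_analog:
        models += ["parametric"]                              # AMS blocks need parametric models
    if design_style == "fpga" and "cell-aware" in models:
        models.remove("cell-aware")     # cell-internal faults less relevant for FPGA fabric
    return models

print(select_fault_models("standard-cell", node_nm=5, has_analog=True))
# ['stuck-at', 'transition', 'path-delay', 'cell-aware', 'parametric']
```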
Fault grading overview
Fault grading involves injecting faults into the RTL or gate-level design and observing whether these faults propagate to a primary output or other observable points. This process helps quantify the fault coverage provided by functional patterns, which are typically created for design validation rather than manufacturing test. For example, in the schematic shown in Figure 2, a fault, represented by a red arrow, is injected on a signal line between a NOT gate and an AND gate, simulating conditions such as stuck-at-0 or stuck-at-1. The teal dotted line illustrates how the fault effect travels through the logic and reaches the detection point at Primary Out_1. For a fault to be considered detectable, it must be both activated and propagated to an observable point.
The presence of an analog/mixed-signal (AMS) module, depicted as a black box, introduces a challenge. Since AMS blocks cannot be directly controlled or observed using digital ATPG tools, coverage must be ensured through surrounding digital logic or by applying specialized mixed-signal test strategies. In this context, functional fault grading complements structural ATPG by identifying faults that are incidentally covered by functional patterns, helping to close coverage gaps.
This approach offers dual benefits: it leverages existing functional validation patterns for manufacturing test purposes and highlights portions of the design that remain untested.

Fig. 2: Schematic.
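The activate-and-propagate requirement can be shown with a self-contained toy example loosely modeled on Figure 2, where Primary Out_1 = AND(NOT(a), b). The netlist, signal names, and patterns here are invented for illustration.

```python
# Toy netlist loosely modeled on Figure 2: Out_1 = AND(NOT(a), b).
# A stuck-at fault on the NOT->AND net is detected only if the applied
# pattern both activates it and propagates the difference to Out_1.
def simulate(a, b, fault=None):
    n1 = int(not a)                 # NOT gate output (the fault site)
    if fault is not None:
        n1 = fault                  # force stuck-at-0 or stuck-at-1
    return n1 & b                   # AND gate drives Primary Out_1

def detects(pattern, stuck_value):
    a, b = pattern
    return simulate(a, b) != simulate(a, b, fault=stuck_value)

# Stuck-at-0 on the internal net: needs a=0 (activation) and b=1 (propagation).
print(detects((0, 1), stuck_value=0))  # True  - fault observed at Out_1
print(detects((0, 0), stuck_value=0))  # False - activated but blocked by b=0
print(detects((1, 1), stuck_value=0))  # False - never activated (n1 already 0)
```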
Figures 3 and 4 highlight a critical distinction in how we simulate faults for grading: port-level injection versus node-level injection. Let’s start with port fault injection, which is shown in Figure 3. Here, faults are injected at module ports, which aligns closely with the real silicon behavior, because port-level signals are what actual test hardware can control and observe. In this figure, the red arrow represents a fault injected at an output port of a logic block. Since this is a visible point in the design hierarchy, it is detectable by the system-level test and reflects a realistic fault detection scenario.

Fig. 3: Port fault injection.
Node fault injection, shown in Figure 4, is the default mode in many simulators. Here, faults are injected arbitrarily at internal nodes, wires inside modules, often below the module port boundary. The issue is that these internal nodes might not be controllable or observable at the system level. As a result, the simulation may indicate that a fault is detectable when, in reality, no test path exists in silicon to activate or observe that fault. This leads to overestimation of fault coverage. In other words, node injection can result in misleading fault grading because it includes fault sites that functional or scan-based patterns can’t realistically reach.

Fig. 4: Node fault injection (default mode).
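One way to avoid the overestimation described above is to restrict the fault universe to port-level sites before grading. The sketch below assumes a simple hierarchical path naming convention and a port list extracted from the netlist; both are illustrative.

```python
# Illustrative fault-site filter: keep only faults whose site is a module
# port, and drop internal-node sites that system-level patterns cannot
# realistically control or observe.
faults = [
    ("u_alu/op_a[3]", "SA0"),          # input port  -> keep
    ("u_alu/result[0]", "SA1"),        # output port -> keep
    ("u_alu/adder/carry_n12", "SA0"),  # internal node -> drop
]

ports = {"u_alu/op_a[3]", "u_alu/result[0]"}   # extracted from the netlist

def is_port_level(site: str, port_list: set) -> bool:
    return site in port_list

port_faults = [f for f in faults if is_port_level(f[0], ports)]
print(port_faults)   # the internal carry_n12 site is excluded from grading
```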
In the context of fault grading, the primary duty of a simulator is to evaluate the controllability and observability of each fault as it is injected and propagated through the design. Using a provided set of functional patterns, the simulator identifies a time step in the golden simulation where the fault can be inserted and will cause a deviation. Once identified, the fault is injected and the simulation engine takes over, propagating the fault through the design and continually comparing its effects against the golden reference. If the fault reaches an observation point, it is considered detected by that pattern (see Figure 5). This process repeats over the entire fault list, and a fault classification report is produced at the end.

Fig. 5: Fault propagation to observe point.
In summary, the simulator:
- Identifies time in the simulation to inject a fault
- Analyzes if a fault propagates through the design structure
- Analyzes if a fault causes a functional deviation
- Evaluates if a fault propagates to an observe point
- Classifies faults based on ATPG tool terminology
An example of classification terminology is:
- UO – Controlled_Unobserved
- UC – Uncontrolled_Unobserved
- DS – Controlled_Observed (Detected by Simulation)
- PT – Potential Detected (X on observe point; partial credit)
- UU – Unused fault
- TI – Tied hi/low fault
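A compact sketch of this grading loop, producing the classes listed above, might look as follows. The golden trace, the `propagate` callback, and the signal names are simplified assumptions rather than any simulator's actual API.

```python
from collections import namedtuple

Fault = namedtuple("Fault", "site value")    # e.g. Fault("u_core/n42", 0) = stuck-at-0

def classify(fault, golden_trace, propagate):
    """Return an ATPG-style class for one fault under the given patterns."""
    # Controllability: find a time step where the golden value differs from
    # the stuck value, i.e. injecting the fault would cause a deviation.
    times = [t for t, v in golden_trace[fault.site] if v != fault.value]
    if not times:
        return "UC"                          # never driven opposite the stuck value
    # Observability: does the deviation reach an observe point?
    result = propagate(fault, times[0])      # stand-in for the simulation engine
    if result == "X":
        return "PT"                          # unknown at observe point: partial credit
    return "DS" if result else "UO"          # detected vs. controlled but unobserved

# Tiny illustration: golden (time, value) samples for one net.
golden = {"u_core/n42": [(0, 1), (1, 0), (2, 1)]}
print(classify(Fault("u_core/n42", 0), golden, propagate=lambda f, t: True))   # DS
```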
Fault grading use cases
There are several use cases for fault grading. A typical use case for fault grading integrated with ATPG scan is shown in Figure 6. The ATPG scan tool generates structural test patterns based on the gate-level netlist (GLS). These patterns can target stuck-at, transition, and other structural fault models.
These patterns, along with functional patterns generated by a simulator, are then passed to the fault simulator, which performs functional fault grading: fault injection, propagation analysis, and fault classification.
A comprehensive defect coverage report, quantifying how well the functional patterns detect modeled faults, is produced.

Fig. 6: Typical fault grading use case.
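The coverage roll-up implied by Figure 6 can be sketched as a simple set union: a fault counts toward combined coverage if either the structural patterns or the graded functional patterns detect it. The fault identifiers and counts below are made up for illustration.

```python
# Combined coverage roll-up: a fault is detected if either the structural
# (ATPG) patterns or the graded functional patterns observe it.
fault_universe   = {f"F{i}" for i in range(1000)}
detected_by_atpg = {f"F{i}" for i in range(0, 950)}        # 95.0% structural
detected_by_func = {f"F{i}" for i in range(930, 975)}      # functional top-up

combined = detected_by_atpg | detected_by_func
print(f"ATPG-only coverage: {len(detected_by_atpg) / len(fault_universe):.1%}")
print(f"Functional top-up:  {len(detected_by_func - detected_by_atpg)} extra faults")
print(f"Combined coverage:  {len(combined) / len(fault_universe):.1%}")
```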
The next use case shows a fault grading flow tailored for burn-in applications. The ATPG scan tool or a simulator generates a stuck-at fault list. This list defines the fault universe to be evaluated during burn-in and represents the potential defects that burn-in is intended to expose. In parallel, a functional simulator generates realistic functional patterns that mimic system-level activity during burn-in. These patterns can come from existing design verification testbenches and represent real-world operating conditions (see Figure 7).

Fig. 7: Burn-in fault grading use case.
The next use case outlines how software test library (STL) patterns can be used to enhance fault coverage in digital design testing. The process begins with the gate-level design (GLS), which is analyzed using ATPG scan to generate a fault list. During this analysis, faults that cannot be activated or observed using structural patterns are flagged as untestable. These faults are then targeted using STL patterns, which are software-driven functional sequences designed to exercise the design in ways that structural ATPG cannot. STL patterns are typically developed at the software or firmware level and are commonly used in embedded environments for in-system testing.
Next, the STL patterns are simulated using a fault simulator, where fault injection, propagation, and classification are performed. This simulation determines which faults are activated and observed under STL stimulus. Faults are then classified as covered, uncovered, or still untestable.
Integration between the fault simulator and the ATPG scan tool allows STL patterns to be evaluated with the same rigor as structural tests, ensuring that no overlap is missed between software-level and scan-based testing.
By closing the coverage gap left by ATPG, especially for complex logic or system-level interactions that are only exercised during software execution, this approach significantly improves overall test quality (see Figure 8).

Fig. 8: STL fault grading use case.
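A minimal sketch of the reclassification step in this flow: faults that ATPG marked AU (untestable by structural patterns) are re-graded under STL stimulus and promoted to detected when the software-driven sequence observes them. The fault IDs and classes below are made up.

```python
# Re-grade ATPG-untestable (AU) faults under STL stimulus: any AU fault
# observed by the software test library patterns is promoted to detected.
atpg_classes = {"F1": "DS", "F2": "AU", "F3": "AU", "F4": "UO"}   # from ATPG run
stl_detected = {"F2"}            # result of fault-simulating the STL patterns

final = {f: "DS" if (c == "AU" and f in stl_detected) else c
         for f, c in atpg_classes.items()}
print(final)    # {'F1': 'DS', 'F2': 'DS', 'F3': 'AU', 'F4': 'UO'}
```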
Useful optimizations for fault grading methodologies
- Stimulus grading and fault list refinement: Stimulus grading is the process of evaluating how effectively a set of input patterns activates and propagates faults within a digital design during fault simulation. It measures the quality of the stimuli by determining whether they can sensitize faults and drive them to observable points such as primary outputs or scan cells. This technique is important because it helps quantify the fault coverage achieved by functional patterns, identifies untested areas of the design, and improves the overall effectiveness of the test patterns. Stimulus grading also supports mixed-mode testing by validating coverage from software-driven or system-level tests and plays a critical role in improving test quality.
- Fault list optimization: Fault list optimization in DFT refers to the process of refining the list of faults generated during ATPG or fault simulation to focus on those that are meaningful, testable, and relevant to the design and manufacturing goals. This involves filtering out redundant, untestable, or low-impact faults and prioritizing faults that contribute most to coverage and quality metrics. This process is valuable because it improves the efficiency and effectiveness of the test flow. By reducing the number of faults to simulate or target, it lowers simulation time and computational overhead. It also helps focus test resources on faults that are more likely to occur in silicon, improving defect detection and reducing false positives. Additionally, fault list optimization supports better fault classification, enhances fault grading accuracy, and contributes to achieving higher fault coverage with fewer patterns. A small sketch combining both optimizations follows Table 3.
| Fault class / metric | | | | |
|---|---|---|---|---|
| BL (blocked) | 82 (0.03%) | same (0.03%) | 74 (0.02%) | same (0.02%) |
| RE (redundant) | 1348 (0.45%) | same (0.46%) | 1348 (0.45%) | same (0.45%) |
| AU (atpg_untestable) | 5817 (1.95%) | 1386 (0.47%) | 2638 (0.88%) | 1268 (0.43%) |
| test_coverage | 97.98% | 99.48% | 98.92% | 99.39% |
| fault_coverage | 96.08% | 97.52% | 97.03% | 97.48% |
Table 3: Combined test coverage results.
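The two optimizations above can be combined in a small greedy sketch: grade each pattern by the still-undetected faults it observes (stimulus grading), keep the most useful patterns, and drop detected and untestable faults from the active list between passes (fault list optimization). The pattern names and detection map are invented for illustration.

```python
# Greedy stimulus grading with fault dropping: repeatedly pick the pattern
# that detects the most still-undetected faults, then remove those faults
# from the active list so later passes do less work.
detects = {                      # pattern -> faults it observes (illustrative)
    "boot_seq":  {"F1", "F2", "F3"},
    "loopback":  {"F3", "F4"},
    "mem_bist":  {"F5"},
}
untestable = {"F9"}              # pre-filtered: redundant/blocked faults
remaining  = {"F1", "F2", "F3", "F4", "F5", "F6"} - untestable

ranked = []
while remaining:
    best = max(detects, key=lambda p: len(detects[p] & remaining))
    gained = detects[best] & remaining
    if not gained:
        break                    # no remaining pattern covers what is left
    ranked.append((best, len(gained)))
    remaining -= gained          # fault dropping
print(ranked)       # [('boot_seq', 3), ('loopback', 1), ('mem_bist', 1)]
print(remaining)    # {'F6'} -> candidates for new patterns or ATPG top-up
```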
Conclusion
Fault grading is a critical component of DFT as it directly measures test quality through fault coverage, identifies gaps in test patterns, and guides improvements in scan architecture and logic design. By enhancing fault detection capabilities, it supports higher product reliability and yield, ultimately ensuring robust and efficient manufacturing outcomes. To learn more, please download the paper, Beyond Traditional Testing: Integrating DV and DFT for Zero-Defect Goals.