Cerebras WSE-3: Wafer-Scale Yield Analysis

A Monte-Carlo reconstruction of how fine-grained core redundancy and a route-around mesh make a single 46,225 mm² monolithic chip economically yieldable — and what that redundancy costs in silicon.

By Abhishek Lal | Source: Cerebras — "How Cerebras Solved the Wafer-Scale Yield Challenge" (white paper) | WSE-3: 215×215 mm wafer · 84 dies (12×7) · 17×30 mm die · ~970k physical / ~900k logical cores

The Yield Thesis

Cerebras claims a whole-wafer processor can yield economically because defects are routed around, not discarded.

A conventional chip treats a fatal defect as terminal: a single killer defect can scrap the whole die, so yield falls steeply as die area grows. The WSE instead tiles the wafer with ~970,000 tiny physical cores in a 2-D mesh, of which ~900,000 are exposed as logical cores and the rest sit in spare rows. When a core is defective, redundancy links re-route the logical fabric around it, and only a handful of spare cores are spent. Because each core is minuscule (~0.05 mm²), the defects on a wafer land as isolated points that the mesh absorbs.
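To put a number on "yield falls steeply as die area grows", here is a minimal sketch that assumes the simple Poisson yield model Y = e^(−D·A) for a chip with no redundancy. That model and the function name `poisson_die_yield` are assumptions made for illustration; the source only states the qualitative trend.

```python
import math

def poisson_die_yield(d_per_mm2, area_mm2):
    """Probability that a chip of the given area carries zero killer defects
    (simple Poisson yield model, no redundancy; an assumption of this sketch)."""
    return math.exp(-d_per_mm2 * area_mm2)

D = 0.001  # defects per mm^2, the dashboard's default density
for label, area in [("one 17x30 mm die", 510.0),
                    ("whole 215x215 mm wafer", 46225.0)]:
    y = poisson_die_yield(D, area)
    print(f"{label}: yield without redundancy = {y:.3g}")
```

Under this assumption a single 510 mm² die already loses roughly 40% of its yield, and an unprotected whole-wafer chip would essentially never yield, which is why the route-around scheme matters.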

Premises of this model:
(a) Defects are independent per core, with probability set by defect density and core area.
(b) Recovery is local: a column can bridge a short vertical run of dead cores and shift ±1 row between neighbours.
(c) The wafer is one chip — a single unrecoverable die scraps the whole wafer.

The question this dashboard answers: how do defect density and core size actually drive wafer yield under that route-around scheme — and how much silicon does the redundancy consume?

▦ — WSE-3 Fixed Geometry

Wafer: 215×215 mm (square)
Die: 17×30 mm = 510 mm²
Dies: 12 × 7
Wafer area: 46,225 mm² (waste denominator)
Physical cores: ~970k (84 × 11,560)
Logical cores: ~900k (~93% active)

∿ — Parameters

Adjust & Re-run

Defect density: 0.0010 / mm² (drives the defect map and live metrics)
Redundancy rows: 12 → 816 spare cores per die (rows turned off for recovery)
Manhattan reach R: vertical reach in the current column; needs RR ≥ R − 1
Monte-Carlo trials: 1000 wafers per point (more trials → smoother curves)
Cores per die, X: physical cores in the X direction
Cores per die, Y: physical cores in the Y direction

▸ — Current Configuration — Computed

Sampled wafer defect map

One random draw of the 84 die regions and their defect sites at the current settings; the live dashboard redraws it every second.

Wafer yield vs. core size — by defect density

Core area swept from WSE-3-scale (~0.044 mm²) up to 1 mm², one curve per defect density (reach R and redundancy rows held at the current values). The dashed black line is your current configuration swept across core sizes; the diamond marks the core size set in the inputs. Smaller cores → finer-grained redundancy → defects routed around → higher yield.

Wafer yield vs. core size — by Manhattan reach R

Defect density held constant (current D); reach R varied over 2, 3, 4. Larger reach bridges longer dead-core runs. The dashed black line is your current configuration swept across core sizes; the diamond marks the core size set in the inputs.

Wafer yield vs. core size — by redundancy rows

Defect density and reach held constant (current D, R); redundancy rows varied over 10, 20, 30, 40. A curve simply ends where the grid would have fewer rows than RR (an invalid configuration). The dashed black line is your current configuration swept across core sizes; the diamond marks the core size set in the inputs.

∑ — Mathematics

Notation & geometry

# symbols
D: defect density (mm⁻²)
A_core: area of one core (mm²)
n_x, n_y: cores per die (X, Y)
RR: redundancy rows
R: Manhattan reach
p: per-core defect probability
λ: mean defects in a core
N: Monte-Carlo trials
k: defects drawn on a die

A_die = 17 × 30 = 510 mm² (used for core sizing)
A_wafer = 215 × 215 = 46,225 mm² (used as the waste denominator)
dies = 12 × 7 = 84
A_core = A_die / (n_x · n_y)
physical cores / die = n_x · n_y
logical cores / die = n_x · (n_y − RR)
% cores active = (n_y − RR) / n_y × 100

The wafer is one monolithic chip. The leftover 215 − 12·17 = 11 mm and 215 − 7·30 = 5 mm are scribe lines — excluded from core sizing but kept inside A_wafer. Defaults: 68×170 → A_core = 0.0441 mm² (≈ the paper's 0.05 mm²); 68×158 = 10,744 logical/die (×84 ≈ 900k); 68×170 = 11,560 physical/die (×84 ≈ 970k).
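The derived quantities above can be checked with a few lines of arithmetic. This is a minimal sketch using only the figures stated in this section; the function name `geometry` and the dictionary keys are illustrative, not part of the dashboard.

```python
# Fixed WSE-3 geometry as stated in this section (all lengths in mm).
DIE_W, DIE_H = 17.0, 30.0
WAFER_SIDE = 215.0
DIES_X, DIES_Y = 12, 7

def geometry(n_x=68, n_y=170, rr=12):
    """Return derived per-die and per-wafer quantities for an n_x by n_y core grid."""
    a_die = DIE_W * DIE_H            # 510 mm^2, used for core sizing
    dies = DIES_X * DIES_Y           # 84
    physical = n_x * n_y             # physical cores per die
    logical = n_x * (n_y - rr)       # logical cores per die after spare rows
    return {
        "a_core_mm2": a_die / physical,
        "physical_per_die": physical,
        "logical_per_die": logical,
        "physical_total": physical * dies,
        "logical_total": logical * dies,
        "pct_active": 100.0 * (n_y - rr) / n_y,
        "leftover_x_mm": WAFER_SIDE - DIES_X * DIE_W,
        "leftover_y_mm": WAFER_SIDE - DIES_Y * DIE_H,
    }

g = geometry()
for key, value in g.items():
    print(f"{key}: {value}")
```

At the defaults this reproduces 11,560 physical and 10,744 logical cores per die, and a core area of about 0.0441 mm².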

Defect model & sampling

Every core is an independent Bernoulli trial. Its expected number of defects is λ = D · A_core, and under a Poisson defect model its probability of carrying at least one defect is:

p = 1 − e^(−λ) (λ = D · A_core)

Defects are placed independently, so they are not spatially clustered. For a mature node λ is tiny (≈ 4.4e−5 per core at D = 0.001 mm⁻² and A_core = 0.0441 mm²), so a core is overwhelmingly likely to be defect-free.
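As a quick numerical check of p = 1 − e^(−λ), the sketch below evaluates λ and p at the default density and core size. The function name `core_defect_prob` is illustrative; `math.expm1` is used only to keep precision when λ is very small.

```python
import math

def core_defect_prob(d_per_mm2, a_core_mm2):
    """Return (lambda, p) for one core: mean defect count and P(at least one defect)."""
    lam = d_per_mm2 * a_core_mm2
    p = -math.expm1(-lam)  # equals 1 - exp(-lam), accurate for tiny lam
    return lam, p

a_core = 510.0 / (68 * 170)          # default core area, about 0.0441 mm^2
lam, p = core_defect_prob(0.001, a_core)
print(f"lambda = {lam:.3e}, p = {p:.3e}, lambda - p = {lam - p:.1e}")
```

Because λ is so small, p and λ agree to many digits, which is why the per-die mean can be written as either n_x · n_y · p or D · A_die.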

The expected number of defects on a die, and two sampling schemes that agree when defects are sparse:

μ = n_x · n_y · p ≈ n_x · n_y · λ = D · A_die ≈ 0.51 (at D = 0.001)

# implemented (fast, default: defects are sparse, p ≈ 4.4e−5 at the default core size)
k ~ Poisson(μ); scatter k defects on uniform-random cells (with replacement)

# exact equivalent (available on request)
k ~ Binomial(n_x·n_y, p); place k defects without replacement (distinct cores)

The per-die count is essentially independent of how finely the die is partitioned — core size instead changes how those defects map onto the routing grid. Dies are independent; the wafer map is the union of the 84 die maps.
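The default Poisson, with-replacement scheme can be sketched as follows. This is an illustration under stated assumptions, not the dashboard's code: the helper names `poisson` and `sample_die` are hypothetical, μ is taken as D · A_die, and the Poisson draw uses Knuth's multiplication method, which is adequate for small μ such as 0.51.

```python
import math
import random

def poisson(mu, rng):
    """Draw k ~ Poisson(mu) with Knuth's multiplication method (fine for small mu)."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def sample_die(d_per_mm2, n_x, n_y, rng, a_die=510.0):
    """Return the set of defective (col, row) cells on one die."""
    k = poisson(d_per_mm2 * a_die, rng)
    # With replacement: two defects may land on the same cell,
    # so the returned set can hold fewer than k cells.
    return {(rng.randrange(n_x), rng.randrange(n_y)) for _ in range(k)}

rng = random.Random(0)
counts = [len(sample_die(0.001, 68, 170, rng)) for _ in range(20000)]
print("mean defective cores per die:", sum(counts) / len(counts))
```

Changing `n_x` and `n_y` leaves the mean count near 0.51 but changes how likely two defects are to fall in the same column, which is what the recovery rule is sensitive to.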

Route-around recovery & wafer yield

Cores form a 2-D mesh (normal links = Manhattan-1). Redundancy links add reach R, but the full Manhattan-R neighbourhood is not wired: links cover only the current column (vertically) and its two immediate neighbours. Raising R lengthens the vertical reach within a column; the horizontal span stays ±1. For each column c:

r_c = longest run of consecutive defective cores in column c (vertical)
d_c = total defective cores in column c
column c passes ⇔ r_c < R AND d_c ≤ RR
die recoverable ⇔ every column passes
constraint: RR ≥ R − 1 (flagged with a warning if violated)

Reach (r_c < R): links bridge a gap of R−1 dead cores, so a solid run of R cannot be jumped. Capacity (d_c ≤ RR): enough working cores must remain to host n_y − RR logical rows. Columns are evaluated independently — no multi-column horizontal detours are simulated.
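The per-column rule can be written out directly. The sketch below is one possible reading of the two conditions, with hypothetical names `longest_run` and `die_recoverable`; it takes a die's defects as (col, row) pairs and treats columns independently, as the text describes.

```python
def longest_run(rows):
    """Length of the longest run of consecutive row indices in `rows`."""
    best = run = 0
    prev = None
    for r in sorted(set(rows)):
        run = run + 1 if prev is not None and r == prev + 1 else 1
        best = max(best, run)
        prev = r
    return best

def die_recoverable(defects, reach, rr):
    """defects: iterable of (col, row) defective cells on one die.

    A column passes when its longest dead run is shorter than `reach`
    (r_c < R) and its dead-core count does not exceed `rr` (d_c <= RR).
    Columns without defects pass trivially.
    """
    by_col = {}
    for col, row in defects:
        by_col.setdefault(col, set()).add(row)
    return all(
        longest_run(rows) < reach and len(rows) <= rr
        for rows in by_col.values()
    )

print(die_recoverable([(0, 5), (0, 6)], reach=2, rr=12))  # two stacked dead cores
print(die_recoverable([(0, 5), (1, 6)], reach=2, rr=12))  # same rows, different columns
```

With R = 2 a vertical pair of dead cores in one column fails the reach test, while the same two defects in neighbouring columns pass, since each column is judged on its own.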

The WSE ships as a single chip, so one unrecoverable die discards the whole wafer. The true yield has no closed form, so it is estimated by Monte Carlo over N independent wafers:

wafer good ⇔ all 84 dies recoverable
Ŷ = (1/N) · Σ G_t over t = 1 … N, with G_t = 1 if wafer t is good, else 0
standard error: SE(Ŷ) = √( Ŷ(1−Ŷ) / N )
example: Ŷ = 0.90, N = 1000 → SE ≈ 0.0095 (±1.0 pp at 1σ, ±1.9 pp at 95%)

This sampling noise is the slight jitter in the curves; quadrupling trials halves it. Each (core-area, density) point on the yield chart is its own independent N-wafer estimate.
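Putting the pieces together, a single point on a yield curve could be estimated along these lines. This is a self-contained sketch rather than the dashboard's implementation: the names `poisson`, `die_ok` and `wafer_yield` are hypothetical, the sampler is the sparse Poisson, with-replacement scheme, and R = 2 in the usage line is an assumed value because this section does not state the default reach.

```python
import math
import random

def poisson(mu, rng):
    """Draw k ~ Poisson(mu) with Knuth's multiplication method (small mu)."""
    limit, k, prod = math.exp(-mu), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def die_ok(n_x, n_y, mu, reach, rr, rng):
    """Sample one die and apply the per-column reach and capacity tests."""
    cols = {}
    for _ in range(poisson(mu, rng)):
        cols.setdefault(rng.randrange(n_x), set()).add(rng.randrange(n_y))
    for rows in cols.values():
        if len(rows) > rr:                      # capacity: d_c <= RR
            return False
        run = 0
        for r in sorted(rows):
            run = run + 1 if (r - 1) in rows else 1
            if run >= reach:                    # reach: r_c < R
                return False
    return True

def wafer_yield(d, n_x, n_y, reach, rr, trials, seed=0, dies=84, a_die=510.0):
    """Return (Y_hat, SE) over `trials` independent wafers."""
    rng = random.Random(seed)
    mu = d * a_die
    good = sum(
        all(die_ok(n_x, n_y, mu, reach, rr, rng) for _ in range(dies))
        for _ in range(trials)
    )
    y = good / trials
    return y, math.sqrt(y * (1.0 - y) / trials)

# reach=2 is an assumed value for illustration.
y, se = wafer_yield(0.001, 68, 170, reach=2, rr=12, trials=200)
print(f"Y_hat = {y:.3f} +/- {se:.3f}")
```

Each call is an independent N-wafer estimate, so sweeping core size or density simply means calling `wafer_yield` once per point and accepting the resulting jitter.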

Silicon wasted on redundancy (geometric)

wasted % = (RR · n_x · 84 · A_core) / A_wafer × 100
equivalently = (disabled cores · A_core) / A_wafer × 100
at defaults (RR = 12): 6.54 %
disabled-core fraction of total core area = RR / n_y (12 / 170 ≈ 7.06 % at defaults)

Purely geometric — the silicon designed out as spare rows across all dies, independent of the simulation.
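The geometric waste figure follows from the formula alone. Below is a minimal sketch with an illustrative function name, `wasted_pct`, using the fixed geometry from this section as default arguments.

```python
def wasted_pct(rr, n_x, n_y, dies=84, a_die=510.0, a_wafer=215.0 * 215.0):
    """Percent of total wafer area occupied by disabled spare-row cores."""
    a_core = a_die / (n_x * n_y)
    return 100.0 * rr * n_x * dies * a_core / a_wafer

print(f"wasted at defaults: {wasted_pct(12, 68, 170):.2f} % of wafer area")
print(f"disabled fraction of core area: {100.0 * 12 / 170:.2f} %")
```

The two percentages differ because the first divides by the full 215×215 mm wafer, scribe area included, while the second divides by core area only.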

⚐ — Assumptions & Limitations

What this model assumes

Scribe lines absorb the wafer/die size mismatch: excluded from core sizing, included in the waste denominator (full 215×215).
Core-count-per-die figures are estimates, anchored to WSE-2's 66×154 logical/die and the ~900k logical / ~970k physical totals; the paper quotes ~0.05 mm² cores.
Defects are independent across cores and dies; no spatial clustering is modelled.
Recovery is evaluated per column independently — logical rows are not routed horizontally across multiple columns.
A defect disables exactly its own core; recovered cores are fully functional; the RR redundancy rows are sacrificed proactively (a geometric cost), not consumed one-per-defect.
The default sampler (Poisson + with-replacement) is a sparse-defect approximation; switch to Binomial + without-replacement for exactness at high defect densities.