Cerebras claims a whole-wafer processor can yield economically because defects are routed around, not discarded.
A conventional chip treats a fatal defect as terminal: a single killer defect can scrap the whole die, so yield falls steeply as die area grows. The WSE instead tiles the wafer with ~900,000 tiny cores in a 2-D mesh and bakes in spare rows. When a core is defective, redundancy links re-route the logical fabric around it, and only a handful of spare cores are spent. Because each core is minuscule (~0.05 mm²), the fixed crop of defects on a wafer lands as isolated points that the mesh absorbs.
Premises of this model:
(a) Defects are independent per core, with probability set by defect density and core area.
(b) Recovery is local: a column can bridge a short vertical run of dead cores and shift ±1 row between neighbours.
(c) The wafer is one chip — a single unrecoverable die scraps the whole wafer.
The question this dashboard answers: how do defect density and core size actually drive wafer yield under that route-around scheme — and how much silicon does the redundancy consume?
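The wafer is one monolithic chip: a 12 × 7 array of 84 dies, each 17 × 30 mm, inside a 215 × 215 mm square. The leftover 215 − 12·17 = 11 mm and 215 − 7·30 = 5 mm are scribe lines, excluded from core sizing but kept inside A_wafer. Defaults: a 68×170 core grid per die gives A_core = (17/68)·(30/170) = 0.0441 mm² (≈ the paper's 0.05 mm²); 68×158 = 10,744 logical cores/die (×84 ≈ 900k); 68×170 = 11,560 physical cores/die (×84 ≈ 970k), so RR = 170 − 158 = 12 spare rows per die.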
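The default geometry can be checked with a few lines of arithmetic. This is only a sketch: the constant names are made up for illustration, and it assumes the 68 core columns span the 17 mm die edge and the 170 rows span the 30 mm edge, which is the orientation that reproduces the quoted 0.0441 mm².

```python
# Geometry quoted in the text; all lengths in mm. Names are illustrative.
WAFER_SIDE = 215.0
DIES_X, DIES_Y = 12, 7            # 12 x 7 = 84 dies
DIE_W, DIE_H = 17.0, 30.0         # die footprint
CORES_X, CORES_Y = 68, 170        # physical core grid per die (assumed orientation)
LOGICAL_ROWS = 158                # rows exposed to software after removing spares

n_dies = DIES_X * DIES_Y
a_core = (DIE_W / CORES_X) * (DIE_H / CORES_Y)   # mm^2 per core
scribe_x = WAFER_SIDE - DIES_X * DIE_W           # leftover along the 12-die axis
scribe_y = WAFER_SIDE - DIES_Y * DIE_H           # leftover along the 7-die axis
physical_per_die = CORES_X * CORES_Y
logical_per_die = CORES_X * LOGICAL_ROWS
spare_rows = CORES_Y - LOGICAL_ROWS

print(f"dies={n_dies}  A_core={a_core:.4f} mm^2  scribe={scribe_x:.0f}/{scribe_y:.0f} mm")
print(f"physical cores: {physical_per_die * n_dies:,}  logical cores: {logical_per_die * n_dies:,}")
print(f"spare rows per die: {spare_rows}")
```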
Every core is an independent Bernoulli trial. Its expected number of defects is λ = D · A_core, and its probability of carrying at least one defect follows from the Poisson model: p = 1 − e^(−λ), which is ≈ λ when λ is small.
Defects are placed independently, so they are not spatially clustered; for a mature node λ is tiny (≈ 1e−5 per core), so a core is overwhelmingly likely to be defect-free.
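A minimal sketch of the per-core probability, assuming D and A_core are given in matching units (defects per mm² and mm² here). The function name and the example density are illustrative, not taken from the dashboard.

```python
import math

def core_defect_probability(defect_density, core_area):
    """P(core has at least one defect) under a Poisson model.

    lam = D * A_core is the expected defect count per core;
    1 - exp(-lam) is computed via expm1 to stay accurate for tiny lam.
    """
    lam = defect_density * core_area
    return -math.expm1(-lam)

# Illustrative values: D = 0.001 defects/mm^2 is an assumption,
# A_core = 0.0441 mm^2 is the default core area from the text.
lam = 0.001 * 0.0441
p = core_defect_probability(0.001, 0.0441)
print(f"lambda = {lam:.3e}, p = {p:.3e}")
```

For λ this small, p and λ agree to several significant digits, which is why the per-core probability is often approximated by λ directly.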
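The expected number of defects on a die is μ_die = D · A_die = n_cores · λ. Two sampling schemes are equivalent: drawing each core independently with probability p, or drawing a Poisson(μ_die) defect count and scattering those defects uniformly over the die's cores.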
The per-die count is essentially independent of how finely the die is partitioned — core size instead changes how those defects map onto the routing grid. Dies are independent; the wafer map is the union of the 84 die maps.
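The equivalence can be sketched with the standard library alone. This assumes scheme A is one Bernoulli draw per core and scheme B is a Poisson defect count scattered uniformly over the cores; the function names are hypothetical, and the Poisson sampler is Knuth's simple multiplicative method, which is only suitable for small means.

```python
import math
import random

def sample_bernoulli(n_cores, lam, rng):
    """Scheme A: each core is defective with p = 1 - exp(-lam)."""
    p = -math.expm1(-lam)
    return {i for i in range(n_cores) if rng.random() < p}

def poisson_draw(mean, rng):
    """Knuth's multiplicative Poisson sampler (fine for small means)."""
    threshold = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def sample_poisson_scatter(n_cores, lam, rng):
    """Scheme B: draw the die's defect count, then place defects uniformly.

    Two defects landing on the same core collapse into one defective core.
    """
    n_defects = poisson_draw(n_cores * lam, rng)
    return {rng.randrange(n_cores) for _ in range(n_defects)}

rng = random.Random(0)
dead_a = sample_bernoulli(11_560, 4.41e-5, rng)
dead_b = sample_poisson_scatter(11_560, 4.41e-5, rng)
print(len(dead_a), len(dead_b))  # both are usually 0 or 1 at this density
```

Both functions return the set of defective core indices, so either can feed the same routing check. Scheme B is much cheaper when λ is tiny, because it draws only a handful of random numbers per die instead of one per core.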
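Cores form a 2-D mesh (normal links = Manhattan-1). Redundancy links add reach R, but the full Manhattan-R neighbourhood is not wired: links run only vertically within the current column and to its two immediate neighbours. Raising R lengthens the vertical reach within a column; the horizontal span stays ±1. For each column c, let r_c be the longest vertical run of dead cores and d_c the total number of dead cores; the column is recoverable only if both conditions hold: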
Reach (r_c < R): links bridge a gap of R−1 dead cores, so a solid run of R cannot be jumped. Capacity (d_c ≤ RR): enough working cores must remain to host n_y − RR logical rows. Columns are evaluated independently — no multi-column horizontal detours are simulated.
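The two per-column tests can be sketched as follows, taking r_c as the longest vertical run of dead cores in a column and d_c as the column's dead-core count. The function names and the boolean-list representation (True = dead core) are assumptions made for illustration; the ±1 row shift between neighbouring columns is not modelled, matching the statement that columns are evaluated independently.

```python
def longest_dead_run(column):
    """Length of the longest consecutive run of dead cores (r_c)."""
    longest = current = 0
    for dead in column:
        current = current + 1 if dead else 0
        longest = max(longest, current)
    return longest

def column_recoverable(column, reach, spare_rows):
    """Reach: r_c < R.  Capacity: d_c <= RR."""
    return longest_dead_run(column) < reach and sum(column) <= spare_rows

def die_recoverable(columns, reach, spare_rows):
    """A die is recoverable only if every column passes both tests."""
    return all(column_recoverable(col, reach, spare_rows) for col in columns)

# Toy column of 8 cores with a run of two dead cores in the middle.
col = [False, False, True, True, False, False, False, False]
print(column_recoverable(col, reach=3, spare_rows=2))  # run of 2 fits under R = 3
print(column_recoverable(col, reach=2, spare_rows=2))  # run of 2 cannot be jumped at R = 2
```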
The WSE ships as a single chip, so one unrecoverable die discards the whole wafer. The true yield has no closed form, so it is estimated by Monte Carlo over N independent wafers: Ŷ = (number of fully recoverable wafers) / N, with standard error SE ≈ √(Ŷ(1 − Ŷ)/N).
This sampling noise is the slight jitter in the curves; quadrupling trials halves it. Each (core-area, density) point on the yield chart is its own independent N-wafer estimate.
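A sketch of the estimator and its sampling noise, assuming the usual binomial standard error for a proportion. `toy_wafer_ok` is a hypothetical stand-in for the real defect-and-routing simulation: it simply treats each of the 84 dies as recoverable with a fixed probability.

```python
import math
import random

def estimate_yield(wafer_ok, n_trials, rng):
    """Monte Carlo yield: fraction of simulated wafers that are fully recoverable."""
    good = sum(1 for _ in range(n_trials) if wafer_ok(rng))
    y_hat = good / n_trials
    std_err = math.sqrt(y_hat * (1.0 - y_hat) / n_trials)  # binomial standard error
    return y_hat, std_err

def toy_wafer_ok(rng, n_dies=84, die_ok_prob=0.999):
    """Hypothetical wafer model: the wafer survives only if every die does."""
    return all(rng.random() < die_ok_prob for _ in range(n_dies))

rng = random.Random(0)
for n in (1_000, 4_000):
    y_hat, se = estimate_yield(toy_wafer_ok, n, rng)
    print(f"N={n}: yield ~ {y_hat:.3f} +/- {se:.4f}")
```

Because the standard error scales as 1/√N, going from 1,000 to 4,000 trials should roughly halve the reported uncertainty.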
The redundancy overhead is purely geometric: it is the silicon designed out as spare rows across all dies, independent of the simulation, measured against the full wafer area (A_wafer = 215×215 mm²). The RR redundancy rows are sacrificed proactively (a geometric cost), not consumed one-per-defect.
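The geometric cost of the spare rows can be sketched in two ways. Both normalisations below are assumptions rather than the dashboard's own definition: one expresses the spares as a fraction of the physical rows, the other as spare-row area over the full 215×215 mm wafer area. The function name is illustrative.

```python
def spare_row_overhead(n_dies, cores_x, cores_y, spare_rows, core_area, wafer_area):
    """Return (fraction of physical rows spent on spares,
               fraction of wafer area spent on spares)."""
    spare_area = n_dies * cores_x * spare_rows * core_area  # mm^2 of spare cores
    return spare_rows / cores_y, spare_area / wafer_area

a_core = (17 / 68) * (30 / 170)  # default core area from the text, mm^2
frac_rows, frac_wafer = spare_row_overhead(
    n_dies=84, cores_x=68, cores_y=170, spare_rows=12,
    core_area=a_core, wafer_area=215 * 215,
)
print(f"spares / physical rows: {frac_rows:.2%}")
print(f"spare area / wafer area: {frac_wafer:.2%}")
```

Neither figure depends on the defect density or on any simulated wafer, which is the sense in which the overhead is fixed at design time.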