80386 Multiplication and Division

nand2mario.github.io

When Intel released the 80386 in October 1985, it marked a watershed moment for personal computing. The 386 was the first 32-bit x86 processor, increasing the register width from 16 to 32 bits and vastly expanding the address space compared to its predecessors. This wasn’t just an incremental upgrade—it was the foundation that would carry the PC architecture for decades to come.

The timing was significant. By the mid-1980s, the IBM PC had established x86 as the dominant PC architecture, but the 16-bit 8086/286 processors were hitting their limits. Memory was constrained to 1MB (or 16MB with the 286’s limited protected mode). Competing 32-bit architectures like the Motorola 68020 threatened Intel’s dominance. The 386 was Intel’s answer: full 32-bit computing with backward compatibility for the massive library of existing DOS software.

The 386 introduced important and long-lasting x86 features: a flat 4GB address space, virtual memory with paging, and a protected mode that actually worked. It would go on to run Windows 3.0, Windows 95, early Linux, and countless other operating systems that shaped modern computing.

Faster arithmetic

In addition to its architectural advances, the 386 delivered a major jump in arithmetic performance. On the earlier 8086, multiplication and division were slow: a 16-bit multiplication typically required 120–130 cycles, and a 16-bit division took even longer, at over 150 cycles. The 286 significantly improved on this by introducing faster microcode routines and modest hardware enhancements.

The 386 pushed performance further with dedicated hardware that processes multiplication and division at the rate of one bit per cycle, combined with a native 32-bit datapath width. The microcode still orchestrates the operation, but the heavy lifting happens in specialized datapath logic that advances every cycle.

Here are the actual cycle counts from the Intel 386 Programmer’s Reference Manual:

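| Instruction | 8-bit | 16-bit | 32-bit |
|-------------|-------|--------|--------|
| MUL         | 9-14  | 9-22   | 9-38   |
| IMUL        | 9-14  | 9-22   | 9-38   |
| DIV         | 14    | 22     | 38     |
| IDIV        | 19    | 27     | 43     |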

The ranges for MUL/IMUL reflect an "early-out" optimization: the loop exits early when the remaining multiplier bits are all zeros (or all ones for a negative signed multiplier). Division has no early-out, so its cycle counts are fixed at the operand width plus a constant overhead, which works out to 6 cycles for DIV and 11 for IDIV in the table above.
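To make the early-out idea concrete, here is a minimal Python sketch of an unsigned shift-and-add multiply that stops once no set bits remain in the multiplier. It is an illustration only, not the 386's microcode: the function name `mul_early_out` and its `width` parameter are hypothetical, the signed (all-ones) case is left out, and the returned iteration count is a loop count, not a cycle count.

```python
def mul_early_out(multiplicand: int, multiplier: int, width: int = 32):
    """Unsigned shift-and-add multiply with early-out.

    Returns (product, iterations). Illustrative model only; the names
    and structure are assumptions, not taken from the 386 microcode.
    """
    mask = (1 << width) - 1
    a = multiplicand & mask
    m = multiplier & mask
    product = 0
    shift = 0
    iterations = 0
    # Early-out: leave the loop as soon as the remaining multiplier
    # bits are all zero, instead of always running `width` times.
    while m != 0:
        if m & 1:
            product += a << shift  # conditional add of the shifted multiplicand
        m >>= 1                    # consume one multiplier bit per iteration
        shift += 1
        iterations += 1
    return product, iterations


if __name__ == "__main__":
    print(mul_early_out(1000, 3))     # small multiplier: few iterations
    print(mul_early_out(3, 1000))     # same product, more iterations
    print(mul_early_out(0xFFFFFFFF, 0xFFFFFFFF))  # worst case: all 32 bits
```

In this model the loop count depends only on the position of the highest set bit in the multiplier, which is why the table gives a range for MUL/IMUL but a single number for DIV/IDIV.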

To save silicon, the 386 reuses the main ALU for the per-iteration add/subtract work rather than having a separate multiplier unit. The microcode controls the iteration, while dedicated datapath logic handles the shifting and loop termination. Let’s look at how these algorithms work.
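The fixed, one-bit-per-iteration shape of division can be sketched the same way. The example below uses textbook restoring division, chosen here for clarity; the text above does not say which division variant the 386 implements, so treat this as an assumption. The name `div_restoring` is hypothetical, and the sketch divides a `width`-bit dividend, whereas the real DIV instruction takes a double-width dividend.

```python
def div_restoring(dividend: int, divisor: int, width: int = 32):
    """Unsigned restoring division, one quotient bit per iteration.

    Returns (quotient, remainder, iterations). Illustrative model only;
    not a description of the 386's actual datapath.
    """
    mask = (1 << width) - 1
    dividend &= mask
    divisor &= mask
    if divisor == 0:
        raise ZeroDivisionError("divisor is zero")
    quotient = 0
    remainder = 0
    iterations = 0
    # No early-out: the loop always runs exactly `width` times.
    for i in range(width - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        trial = remainder - divisor  # the per-iteration subtract
        if trial >= 0:
            remainder = trial        # subtraction succeeded: quotient bit is 1
            quotient |= 1 << i
        # Otherwise keep the old remainder ("restore"); quotient bit stays 0.
        iterations += 1
    return quotient, remainder, iterations


if __name__ == "__main__":
    print(div_restoring(100, 7, width=8))
```

Because every quotient bit needs its own trial subtraction, the iteration count equals the operand width regardless of the operand values, which matches the fixed DIV and IDIV timings in the table.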

Add-and-shift multiplication
