PowerPC 620: Case Study
The PPC620 is an implementation of the 64-bit version of the
PowerPC architecture.
Other recent processors with similar features include the MIPS R10000
and the HPPA 8000. The DEC Alpha 21164 and UltraSPARC are also
multi-issue designs, but not quite as aggressive as the PPC620.
Features
Four-way superscalar architecture
- can fetch, issue, and complete up to 4 instructions per cycle.
Six independent functional units:
Two Simple Integer units, XSU0 and XSU1.
- handle simple integer operations (add, subtract, logical)
with a single-cycle latency.
One Complex Integer unit, MCFXU.
- handles integer multiply and divide with latency of 3 to 20
cycles. Multiply is fully pipelined; divide is unpipelined.
One Load/Store unit, LSU.
- handles all load and store instructions and includes its
own effective address (EA) adder. This unit contains both the
load and store buffers and disambiguates memory references
internally. The store buffer is really two buffers: one for
stores waiting for EA operands, and one for stores waiting for
commit. The load buffer allows one outstanding cache miss to be
processed while other loads and stores proceed. Subsequent cache
misses are returned to the reservation station, allowing up to
3 misses to occur before the unit completely stalls.
The cache is dual ported (two banks) to allow up to two
operations to proceed in parallel.
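
The two-part store buffer described above can be sketched as a small
Python model. This is only an illustration, not the actual PPC620
logic: the class name, method names, and the dictionary standing in
for the cache are all hypothetical, and memory disambiguation and
miss handling are left out.

```python
class StoreBuffer:
    """Toy model of a store buffer split into two parts (names are assumptions)."""

    def __init__(self):
        self.waiting_ea = {}      # tag -> (offset, value): EA operand not ready yet
        self.waiting_commit = {}  # tag -> (address, value): EA known, not committed
        self.cache = {}           # stand-in for the data cache: address -> value

    def issue_store(self, tag, offset, value):
        # The store enters the first buffer until its base operand arrives.
        self.waiting_ea[tag] = (offset, value)

    def operand_ready(self, tag, base):
        # The EA adder computes base + offset; the store moves to the second buffer.
        offset, value = self.waiting_ea.pop(tag)
        self.waiting_commit[tag] = (base + offset, value)

    def commit(self, tag):
        # Only at commit does the store become visible in the cache.
        address, value = self.waiting_commit.pop(tag)
        self.cache[address] = value


sb = StoreBuffer()
sb.issue_store("s1", offset=8, value=42)
sb.operand_ready("s1", base=0x100)
print(sb.cache)   # still empty: the store is speculative
sb.commit("s1")
print(sb.cache)   # the store has now reached the cache at address 0x108
```

The point of the split is that a speculative store never touches the
cache; if the store is squashed, its buffer entry is simply dropped.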
One Floating Point unit, FPU.
- handles all FP operations, with a latency of 2 cycles
for multiply, add, and multiply-add (3-stage pipeline) and
31 cycles for divide (unpipelined).
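
To see why the pipelined/unpipelined distinction matters, here is a
rough cycle count for a run of independent operations. It uses the
latencies quoted above and assumes (this is not stated in these
notes) that a pipelined unit can start one new operation every
cycle. The function name is made up for this sketch.

```python
def cycles_for(n_ops, latency, pipelined):
    """Cycles to finish n_ops independent operations on one functional unit."""
    if n_ops == 0:
        return 0
    if pipelined:
        # First result after `latency` cycles, then one result per cycle.
        return latency + (n_ops - 1)
    # Unpipelined: each operation must finish before the next can start.
    return n_ops * latency


print(cycles_for(10, 2, pipelined=True))    # 10 FP multiplies -> 11
print(cycles_for(10, 31, pipelined=False))  # 10 FP divides    -> 310
```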
One Branch Prediction unit, BPU.
- predicts and completes branch instructions. Includes
the condition code register used in the PPC architecture.
Supports hardware speculative execution
- similar to what we have seen except that uncommitted
results are stored in the store buffer or a set of renaming registers
instead of the reorder buffer.
Five-stage pipeline
Fetch
- Fetches 4 instructions per cycle and updates the PC.
Includes a 256-entry two-way set-associative Branch Target Buffer
and a 2048-entry Branch Prediction Buffer, both updated by the BPU.
Also includes a return address stack.
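
The table sizes above determine how a branch address selects an
entry. The sketch below assumes simple indexing by the low-order
bits of the word address and a 2-bit saturating counter per
prediction entry; neither detail is given in these notes, so treat
both as assumptions. The only fixed facts used are the table sizes
and the 4-byte PowerPC instruction size.

```python
BTB_ENTRIES = 256
BTB_WAYS = 2
BTB_SETS = BTB_ENTRIES // BTB_WAYS  # 256 entries, two-way -> 128 sets
BPB_ENTRIES = 2048
INSTR_BYTES = 4                     # PowerPC instructions are 4 bytes long


def btb_set_index(pc):
    # Assumed indexing: drop the byte offset, keep the low-order set bits.
    return (pc // INSTR_BYTES) % BTB_SETS


def bpb_index(pc):
    # Assumed indexing: same idea, with more index bits for the larger table.
    return (pc // INSTR_BYTES) % BPB_ENTRIES


def update_counter(counter, taken):
    # Assumed 2-bit saturating counter: 0..1 predict not taken, 2..3 predict taken.
    return min(counter + 1, 3) if taken else max(counter - 1, 0)


def predict_taken(counter):
    return counter >= 2


print(BTB_SETS)               # 128
print(btb_set_index(0x1004))  # 1
print(bpb_index(0x1004))      # 1025
```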
Decode
- Decodes 4 instructions and prepares them for issue.
Issue
- Issues up to 4 instructions to reservation stations, allocating
a rename register for the result and a reorder buffer entry.
Execute
- When its operands are available, an instruction is sent from
its reservation station to the functional unit for computation.
When the result is available, it is broadcast on one of the
result busses (CDB) and written into any reservation station
awaiting it and into the rename buffer.
The completion unit is always informed. If the instruction is a
mispredicted branch, the fetch unit is informed as well.
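
The result broadcast in the Execute stage can be sketched as
follows. This is a simplified illustration: the class and function
names are hypothetical, a waiting operand is represented as a
("tag", rename_register) pair, and a ready operand as a
("value", v) pair.

```python
class ReservationEntry:
    """One waiting instruction; each source is ("value", v) or ("tag", rename_reg)."""

    def __init__(self, op, sources):
        self.op = op
        self.sources = list(sources)

    def ready(self):
        # The instruction may execute once no source is still a tag.
        return all(kind == "value" for kind, _ in self.sources)


def broadcast(tag, value, stations, rename_buffer):
    """Put one result on the result bus: update the rename buffer and all waiters."""
    rename_buffer[tag] = value
    for entry in stations:
        entry.sources = [
            ("value", value) if src == ("tag", tag) else src
            for src in entry.sources
        ]


add = ReservationEntry("add", [("value", 1), ("tag", "rr5")])
rename_buffer = {}
print(add.ready())   # False: still waiting for rename register rr5
broadcast("rr5", 7, [add], rename_buffer)
print(add.ready())   # True: the operand was captured from the bus
```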
Commit
- When all previous instructions have completed, up to
4 instructions can complete in one cycle by updating
the register file from the rename buffer and freeing
their rename and reorder buffer entries. For store instructions,
the LSU is told to send the store results to the cache.
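
In-order commit of up to 4 instructions per cycle can be sketched
like this. The data layout (a deque of dictionaries with "done",
"dest", and "rename" fields) is invented for the example, and store
handling is omitted.

```python
from collections import deque

COMMIT_WIDTH = 4  # up to 4 instructions complete per cycle


def commit_cycle(reorder_buffer, rename_buffer, register_file):
    """Commit finished instructions from the head of the reorder buffer, in order."""
    committed = 0
    while reorder_buffer and committed < COMMIT_WIDTH:
        entry = reorder_buffer[0]
        if not entry["done"]:
            break  # an unfinished older instruction blocks everything behind it
        reorder_buffer.popleft()  # free the reorder buffer entry
        if entry["dest"] is not None:
            # Copy the result to the register file and free the rename register.
            register_file[entry["dest"]] = rename_buffer.pop(entry["rename"])
        committed += 1
    return committed


rob = deque([
    {"done": True,  "dest": "r3", "rename": "rr0"},
    {"done": False, "dest": "r4", "rename": "rr1"},
    {"done": True,  "dest": "r5", "rename": "rr2"},
])
rename = {"rr0": 10, "rr2": 30}
regs = {}
print(commit_cycle(rob, rename, regs))  # 1: the second instruction is not done
print(regs)                             # {'r3': 10}
```

Note that the third instruction has finished executing but cannot
commit until the second one does; this is what keeps the register
file in a precise state.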