Tomasulo Algorithm

The Tomasulo algorithm was first implemented in the floating-point unit of the IBM 360/91, which came out three years after the CDC 6600. The scheme was intended to address several issues:
A small number of available floating-point registers
The 360/91 had 4 double-precision registers.
Long memory latency
This was just prior to the introduction of caches as a standard part of the memory hierarchy.
The cost-effectiveness of functional unit hardware
With multiple copies of the same functional unit, some units were often underutilized.
The performance penalties of name dependencies
These lead to WAW and WAR hazards.

Hardware Organization

The FPU consists of:

Instruction Buffer
Load and Store Buffer
Entries in these buffers consist of:
FP Register File
Entries in these registers consist of:
FP Functional Units with associated Reservation Stations
Entries in these buffers consist of:
Common Data Bus (CDB)
For writing results.

Instruction Execution

As in the 6600, instructions are executed in four stages:
Fetch
Fetch instructions from memory, align and insert into the FP instruction buffer (queue).
Issue
Execute
Functional units monitor the status of instructions in their reservation stations. When all operands are available, the instruction is given to the functional unit hardware for execution.
Write back
When the result from a functional unit is available, it is written on the CDB.
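The issue, execute, and write-back stages can be illustrated with a small simulation. This is only a sketch under simplifying assumptions, not a model of the actual 360/91 hardware: it uses a single shared pool of reservation stations instead of per-unit stations, gives every operation a one-step latency, merges execute and write back into one step, and leaves out the load/store buffers. All names (Station, Fpu, issue, step, the RS1..RS3 tags) are hypothetical and chosen for illustration.

```python
import operator
from dataclasses import dataclass
from typing import Optional

# Hypothetical operation table; real latencies are ignored in this sketch.
OPS = {"add": operator.add, "sub": operator.sub,
       "mul": operator.mul, "div": operator.truediv}


@dataclass
class Station:
    tag: str                      # name of this reservation station
    op: str = ""
    vj: Optional[float] = None    # operand values, once available
    vk: Optional[float] = None
    qj: Optional[str] = None      # tags of the stations producing missing operands
    qk: Optional[str] = None
    busy: bool = False

    def ready(self) -> bool:
        return self.busy and self.qj is None and self.qk is None


class Fpu:
    def __init__(self, regs, station_tags):
        self.regs = dict(regs)    # register name -> current value
        self.reg_tag = {}         # register name -> tag of its pending producer
        self.stations = [Station(t) for t in station_tags]

    def _read(self, reg):
        """Return (value, tag); exactly one of the two is None."""
        if reg in self.reg_tag:
            return None, self.reg_tag[reg]
        return self.regs[reg], None

    def issue(self, op, dst, src1, src2) -> bool:
        free = next((s for s in self.stations if not s.busy), None)
        if free is None:
            return False          # no free station: the instruction stalls
        free.op, free.busy = op, True
        free.vj, free.qj = self._read(src1)
        free.vk, free.qk = self._read(src2)
        self.reg_tag[dst] = free.tag   # rename dst to this station's tag
        return True

    def step(self):
        """Run one ready instruction and broadcast its result on the CDB."""
        s = next((s for s in self.stations if s.ready()), None)
        if s is None:
            return None
        result = OPS[s.op](s.vj, s.vk)
        for other in self.stations:    # waiting stations capture the value
            if other.qj == s.tag:
                other.vj, other.qj = result, None
            if other.qk == s.tag:
                other.vk, other.qk = result, None
        for reg, tag in list(self.reg_tag.items()):
            if tag == s.tag:           # only the latest writer updates the register
                self.regs[reg] = result
                del self.reg_tag[reg]
        s.busy = False
        return s.tag, result


fpu = Fpu({"F0": 1.0, "F2": 2.0, "F4": 4.0, "F6": 6.0}, ["RS1", "RS2", "RS3"])
fpu.issue("add", "F0", "F2", "F4")   # RS1: F0 = F2 + F4
fpu.issue("mul", "F6", "F0", "F2")   # RS2: waits for RS1's result (true dependence)
fpu.issue("add", "F0", "F4", "F4")   # RS3: writes F0 again (WAW with RS1)
while fpu.step() is not None:
    pass
print(fpu.regs)
```

In this sketch the second instruction records the tag RS1 instead of a value for F0, so it picks up the first add's result from the broadcast. The third instruction retags F0 to RS3, so the first add's result never reaches the register file and the WAW hazard disappears without a stall.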

Other points

"Virtual" resources
Loop unrolling
Load/Store buffers
CDB bottleneck
Hardware cost

Comparing Tomasulo and the Scoreboard.

