Tomasulo Algorithm
The Tomasulo algorithm was first implemented in the IBM 360/91 Floating Point
Unit, which came out three years after the CDC 6600. The scheme was intended
to address several issues:
A small number of floating point registers available
- the 360/91 had 4 double precision registers.
Long memory latency
- this was just prior to the introduction of caches as a standard
part of the memory hierarchy.
The cost effectiveness of functional unit hardware
- with multiple copies of the same functional unit, some units were often
underutilized.
The performance penalties of name dependencies
- these lead to WAW and WAR hazards.
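The distinction between true data dependencies and name dependencies can be made concrete with a small sketch. The `Instr` tuple, the `hazards` and `rename_dest` helpers, the opcode names, and the temporary name `T1` are all hypothetical, chosen only for illustration; the register names follow the four FP registers mentioned above. It suggests how giving a result a fresh name removes the WAR hazard while leaving the RAW dependence in place.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]


def hazards(first: Instr, second: Instr) -> set:
    """Hazards that arise when `second` follows `first` in program order."""
    found = set()
    if first.dest in second.srcs:
        found.add("RAW")  # true data dependence: second reads first's result
    if second.dest in first.srcs:
        found.add("WAR")  # name dependence: second overwrites a source of first
    if second.dest == first.dest:
        found.add("WAW")  # name dependence: both write the same register
    return found


def rename_dest(instr: Instr, new_name: str) -> Instr:
    """Give the result a fresh name (later readers would need renaming too)."""
    return instr._replace(dest=new_name)


div = Instr("DIV", "F0", ("F2", "F4"))
add = Instr("ADD", "F2", ("F6", "F0"))  # reads F0 (RAW), overwrites F2 (WAR)

before = sorted(hazards(div, add))
after = sorted(hazards(div, rename_dest(add, "T1")))
print(before, after)
```

Only the name dependence disappears after renaming; the second instruction still has to wait for the value produced by the first.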
Hardware Organization
The FPU consists of:
Instruction Buffer
Load and Store Buffer
- Entries in these buffers consist of:
  - Busy bit - indicating the buffer element contains an outstanding
    load or store operation.
  - Tag - indicating the destination (or source for store)
    of the data for the operation.
  - Address (not shown) provided by the integer unit.
  - Data.
FP Register File
- Entries in these registers consist of:
  - Valid bit - indicating that the register holds its current value.
  - Tag - indicating the current source of the register value if not present.
  - Value - the register value, if present.
FP Functional Units with associated Reservation Stations
- Entries in these buffers consist of:
  - Busy bit - indicating the reservation station is occupied with
    an outstanding instruction.
  - Result Tag - the "name" of the result to be produced by this instruction.
  - Source Operands
    - Tag - the "name" of the source operand if not yet available.
    - Value - the value of the source operand once available.
Common Data Bus (CDB)
- For writing results.
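As a rough illustration, the entry formats listed above could be modelled as plain records. This is only a sketch: the class and field names, the use of strings for tags, and the `ready` helper are assumptions made for readability, not a description of the 360/91 hardware.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BufferEntry:
    """One load or store buffer element."""
    busy: bool = False              # outstanding load or store operation
    tag: Optional[str] = None       # destination (or source for a store) of the data
    address: Optional[int] = None   # provided by the integer unit
    data: Optional[float] = None


@dataclass
class FPRegister:
    valid: bool = True              # register holds its current value
    tag: Optional[str] = None       # current source of the value if not present
    value: Optional[float] = 0.0    # the value, if present


@dataclass
class Operand:
    tag: Optional[str] = None       # "name" of the operand if not yet available
    value: Optional[float] = None   # value once available

    @property
    def ready(self) -> bool:
        return self.value is not None


@dataclass
class ReservationStation:
    busy: bool = False              # occupied by an outstanding instruction
    result_tag: Optional[str] = None  # "name" of the result to be produced
    src1: Operand = field(default_factory=Operand)
    src2: Operand = field(default_factory=Operand)


# A station whose first operand is present and whose second is still
# awaited from a (hypothetical) producer named "MUL1".
rs = ReservationStation(busy=True, result_tag="ADD1",
                        src1=Operand(value=2.0), src2=Operand(tag="MUL1"))
waiting_reg = FPRegister(valid=False, tag="ADD1", value=None)
print(rs.src1.ready, rs.src2.ready)
```

Each source operand holds either a tag or a value, which mirrors the valid bit / tag / value arrangement in the register file.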
Instruction Execution
As in the 6600, instructions are executed in four stages:
Fetch
- Fetch instructions from memory, align them, and insert them into the FP
instruction buffer (queue).
Issue
- Get an instruction from the head of the queue.
- Get the operand value or status from the FP register file.
- Issue the instruction to an available reservation station
for the appropriate functional unit.
Execute
- Functional units monitor the status of instructions in their
reservation stations. When all operands are available, the instruction
is given to the functional unit hardware for execution.
Write back
- When the result from a functional unit is available, it is
written on the CDB.
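The issue, execute and write back stages can be sketched as a toy model. Everything here is an assumption made for illustration: the `Station` and `FPU` classes, the station names and counts, the two operations, and the simplification that one ready instruction completes per `step` with no latency. Fetch and the instruction queue are left out. The station name doubles as the result tag.

```python
class Station:
    """One reservation station; its name doubles as the result tag."""

    def __init__(self, name):
        self.name = name
        self.busy = False
        self.op = None
        self.tags = [None, None]   # producer tag per source; None once the value is held
        self.vals = [None, None]


class FPU:
    OPS = {"ADD": lambda a, b: a + b, "MUL": lambda a, b: a * b}

    def __init__(self, regs):
        self.regs = {r: {"valid": True, "tag": None, "value": v}
                     for r, v in regs.items()}
        self.stations = {"ADD": [Station("ADD1"), Station("ADD2")],
                         "MUL": [Station("MUL1")]}

    def issue(self, op, dest, src1, src2):
        """Issue to a free station of the right unit; False means stall."""
        rs = next((s for s in self.stations[op] if not s.busy), None)
        if rs is None:
            return False
        rs.busy, rs.op = True, op
        for i, src in enumerate((src1, src2)):   # read sources first
            reg = self.regs[src]
            if reg["valid"]:
                rs.tags[i], rs.vals[i] = None, reg["value"]
            else:
                rs.tags[i], rs.vals[i] = reg["tag"], None
        # The destination register now waits on this station's tag.
        self.regs[dest].update(valid=False, tag=rs.name)
        return True

    def step(self):
        """Execute one ready station and write its result on the CDB."""
        for group in self.stations.values():
            for rs in group:
                if rs.busy and rs.tags == [None, None]:
                    result = self.OPS[rs.op](*rs.vals)
                    rs.busy = False
                    self.write_back(rs.name, result)
                    return rs.name
        return None

    def write_back(self, tag, value):
        """CDB broadcast: every waiting station and register takes the value."""
        for group in self.stations.values():
            for rs in group:
                for i in range(2):
                    if rs.busy and rs.tags[i] == tag:
                        rs.tags[i], rs.vals[i] = None, value
        for reg in self.regs.values():
            if not reg["valid"] and reg["tag"] == tag:
                reg.update(valid=True, tag=None, value=value)


fpu = FPU({"F0": 1.0, "F2": 2.0, "F4": 3.0, "F6": 0.0})
fpu.issue("MUL", "F0", "F2", "F4")   # F0 <- F2 * F4
fpu.issue("ADD", "F6", "F0", "F2")   # F6 <- F0 + F2 (waits on MUL1)
fpu.issue("ADD", "F0", "F2", "F4")   # F0 <- F2 + F4 (WAW with the MUL)

order = []
tag = fpu.step()
while tag is not None:
    order.append(tag)
    tag = fpu.step()
print(order, fpu.regs["F0"]["value"], fpu.regs["F6"]["value"])
```

In this run the second ADD completes before the MUL, yet F0 ends up with the ADD result: by the time the MUL writes back, the register's tag no longer matches MUL1, so only the waiting reservation station picks up the product.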
Other points
"Virtual" resources
Loop unrolling
Load/Store buffers
CDB bottleneck
Hardware cost