VLIW
At the extreme of static scheduling are the Very Long Instruction Word
(VLIW) architectures.
Here a single instruction, often more than 100 bits long, specifies the operations
to be performed by 5 to 16 functional units.
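As a rough illustration of what such a long instruction contains, the sketch below packs one operation per functional unit into a fixed tuple of slots and fills unused slots with no-ops. The five slot names mirror the columns used in the example that follows; the helper names (`make_bundle`, `bundle_width_bits`) and the 32-bit slot size are assumptions made for illustration, not the encoding of any particular machine.

```python
# Hypothetical five-slot VLIW bundle: two memory units, two FP units, one integer unit.
SLOTS = ("mem1", "mem2", "fp1", "fp2", "integer")
SLOT_BITS = 32  # assumption: each slot holds one 32-bit RISC-style operation
NOP = "nop"


def make_bundle(**ops):
    """Return one long instruction as a tuple in slot order; empty slots become no-ops."""
    unknown = set(ops) - set(SLOTS)
    if unknown:
        raise ValueError(f"unknown slot(s): {sorted(unknown)}")
    return tuple(ops.get(slot, NOP) for slot in SLOTS)


def bundle_width_bits(n_slots=len(SLOTS), slot_bits=SLOT_BITS):
    """Width of one long instruction under the fixed-slot assumption."""
    return n_slots * slot_bits


bundle = make_bundle(
    mem1="ld f18,-32(r1)",
    mem2="ld f22,-40(r1)",
    fp1="addd f4,f0,f2",
    fp2="addd f8,f6,f2",
)
print(bundle)               # the integer slot is filled with "nop"
print(bundle_width_bits())  # 160
```

Under these assumptions a five-unit machine already needs 160 bits per instruction, which is consistent with the "over 100 bits" figure above. The no-op in the integer slot still occupies space in the instruction stream.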
Our loop unrolled 7 times:
| Memory ref 1 | Memory ref 2 | FP operation 1 | FP operation 2 | Integer |
|---|---|---|---|---|
| ld f0,0(r1) | ld f6,-8(r1) | | | |
| ld f10,-16(r1) | ld f14,-24(r1) | | | |
| ld f18,-32(r1) | ld f22,-40(r1) | addd f4,f0,f2 | addd f8,f6,f2 | |
| ld f26,-48(r1) | | addd f12,f10,f2 | addd f16,f14,f2 | |
| | | addd f20,f18,f2 | addd f24,f22,f2 | |
| sd 0(r1),f4 | sd -8(r1),f8 | addd f28,f26,f2 | | |
| sd -16(r1),f12 | sd -24(r1),f16 | | | |
| sd -32(r1),f20 | sd -40(r1),f24 | | | subi r1,r1,#56 |
| sd 8(r1),f28 | | | | bnez r1,Loop |
This produces 7 results in 9 clocks, or about 1.3 clocks per iteration.
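The clocks-per-iteration figure can be checked by counting rows and filled slots in the schedule. The sketch below records only the operation mnemonic in each slot (registers and offsets do not affect the count). The function name `schedule_stats` and the dictionary keys are chosen here for illustration.

```python
# One tuple per long instruction: (mem1, mem2, fp1, fp2, integer); None marks an empty slot.
SCHEDULE = [
    ("ld", "ld", None,   None,   None),
    ("ld", "ld", None,   None,   None),
    ("ld", "ld", "addd", "addd", None),
    ("ld", None, "addd", "addd", None),
    (None, None, "addd", "addd", None),
    ("sd", "sd", "addd", None,   None),
    ("sd", "sd", None,   None,   None),
    ("sd", "sd", None,   None,   "subi"),
    ("sd", None, None,   None,   "bnez"),
]
UNROLL = 7  # loop iterations covered by one pass through the schedule


def schedule_stats(schedule, iterations):
    """Summarize a VLIW schedule: clocks, operations issued, and slot usage."""
    clocks = len(schedule)
    slots = sum(len(row) for row in schedule)
    ops = sum(op is not None for row in schedule for op in row)
    return {
        "clocks": clocks,
        "ops": ops,
        "clocks_per_iteration": clocks / iterations,
        "ops_per_clock": ops / clocks,
        "slot_utilization": ops / slots,
    }


stats = schedule_stats(SCHEDULE, UNROLL)
print(stats["clocks"], stats["ops"])            # 9 23
print(round(stats["clocks_per_iteration"], 2))  # 1.29
print(round(stats["ops_per_clock"], 2))         # 2.56
print(round(stats["slot_utilization"], 2))      # 0.51
```

The schedule issues 23 operations in 9 clocks, so roughly half of the 45 available slots are empty.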
Advantages/Disadvantages
Simpler issue logic.
Higher instruction bandwidth needed
Lower instruction density (empty slots still take up space)
More registers needed
More ports to register file (6 read, 3 write for integer; 6 read, 4 write for FP).
Not tolerant of variations in latency
Not binary compatible
Lock-step execution means one hazard stalls all FUs
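The cost of lock-step execution and of unexpected latency can be seen in a deliberately simplified model: every long instruction takes one clock plus the longest stall among its slots, because no functional unit can run ahead of the others. The function name `lockstep_cycles` and the 10-cycle miss penalty are assumptions chosen for illustration.

```python
def lockstep_cycles(bundle_stalls):
    """Total clocks when each bundle waits for its slowest slot.

    bundle_stalls holds one list per long instruction, giving the extra
    stall cycles seen by each functional unit (0 means no hazard).
    """
    return sum(1 + max(stalls, default=0) for stalls in bundle_stalls)


# Nine bundles of five slots, as in the unrolled-loop schedule, with no hazards.
no_stalls = [[0, 0, 0, 0, 0] for _ in range(9)]

# Same schedule, but the first load misses in the cache (hypothetical 10-cycle penalty).
one_miss = [row[:] for row in no_stalls]
one_miss[0][0] = 10

print(lockstep_cycles(no_stalls))  # 9
print(lockstep_cycles(one_miss))   # 19
```

In this model a single miss in one memory slot more than doubles the time for the seven iterations, even though the other four units had nothing to wait for.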