VLIW

At the extreme of static scheduling are the Very Long Instruction Word (VLIW) architectures.

Here instructions more than 100 bits long describe the operations to be performed by anywhere from 5 to 16 functional units.
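As a rough illustration of how such a word might be organized, the Python sketch below models one long instruction as a fixed set of slots, one per functional unit. The five slots mirror the unit mix used in the example schedule in this section; the 24-bit slot width and the names VLIWInstruction, SLOT_BITS, and width_bits are assumptions made for the sketch, not details from the text.

```python
from dataclasses import dataclass, fields
from typing import Optional

SLOT_BITS = 24  # assumed width of one operation field, for illustration only


@dataclass(frozen=True)
class VLIWInstruction:
    """One long instruction: a fixed slot per functional unit (None = no-op)."""
    mem1: Optional[str] = None
    mem2: Optional[str] = None
    fp1: Optional[str] = None
    fp2: Optional[str] = None
    integer: Optional[str] = None

    @classmethod
    def width_bits(cls) -> int:
        # Every slot is encoded whether or not it holds an operation.
        return SLOT_BITS * len(fields(cls))


# First instruction of the example schedule: two loads, three empty slots.
word = VLIWInstruction(mem1="ld f0,0(r1)", mem2="ld f6,-8(r1)")
print(VLIWInstruction.width_bits())  # 5 slots * 24 bits
```

Under these assumed widths, even the smallest configuration (5 units) already exceeds 100 bits per instruction.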

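Our loop, unrolled 7 times and scheduled for two memory-reference units, two FP units, and one integer/branch unit (one row per clock):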


Memory ref 1      Memory ref 2      FP operation 1    FP operation 2    Integer/branch
ld f0,0(r1)       ld f6,-8(r1)      -                 -                 -
ld f10,-16(r1)    ld f14,-24(r1)    -                 -                 -
ld f18,-32(r1)    ld f22,-40(r1)    addd f4,f0,f2     addd f8,f6,f2     -
ld f26,-48(r1)    -                 addd f12,f10,f2   addd f16,f14,f2   -
-                 -                 addd f20,f18,f2   addd f24,f22,f2   -
sd 0(r1),f4       sd -8(r1),f8      addd f28,f26,f2   -                 -
sd -16(r1),f12    sd -24(r1),f16    -                 -                 -
sd -32(r1),f20    sd -40(r1),f24    -                 -                 subi r1,r1,#56
sd 8(r1),f28      -                 -                 -                 bnez r1,Loop

(- marks an empty slot.)

This yields 7 results in 9 clocks, or 9/7 ≈ 1.3 clocks per iteration.
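The figure above can be checked with a short tally. The Python sketch below is only an illustration: it transcribes the schedule as one row of five slots per clock, keeping just the opcode of each operation, and counts operations, clocks, and filled slots. The names SCHEDULE and ITERATIONS are ours, not the text's.

```python
# Each row is one VLIW instruction (one clock); each entry is one slot:
# (mem ref 1, mem ref 2, FP op 1, FP op 2, integer/branch). None = empty slot.
SCHEDULE = [
    ("ld", "ld", None, None, None),
    ("ld", "ld", None, None, None),
    ("ld", "ld", "addd", "addd", None),
    ("ld", None, "addd", "addd", None),
    (None, None, "addd", "addd", None),
    ("sd", "sd", "addd", None, None),
    ("sd", "sd", None, None, None),
    ("sd", "sd", None, None, "subi"),
    ("sd", None, None, None, "bnez"),
]
ITERATIONS = 7  # the loop body was unrolled 7 times

clocks = len(SCHEDULE)
slots = sum(len(row) for row in SCHEDULE)
ops = sum(op is not None for row in SCHEDULE for op in row)

clocks_per_iteration = clocks / ITERATIONS
ops_per_clock = ops / clocks
slot_utilization = ops / slots

print(f"{clocks} clocks, {ops} operations, {slots} slots")
print(f"clocks per iteration: {clocks_per_iteration:.2f}")
print(f"operations per clock: {ops_per_clock:.2f}")
print(f"slot utilization:     {slot_utilization:.0%}")
```

By this count 23 operations fill 45 available slots, so roughly half of the instruction bits carry no work.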

Advantages/Disadvantages

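Simpler issue logic: the compiler, not the hardware, decides which operations issue together. This is the main advantage; the remaining items are costs.
Instruction bandwidth: each fetch must deliver a full long instruction word.
Instruction density: empty slots still take up bits, and unrolling enlarges the code.
More registers needed: the unrolled loop above uses FP registers up through f28.
More ports to register file (6 read, 3 write for integer; 6 read, 4 write for FP).
Not tolerant of variations in latency: the schedule is fixed at compile time, so an operation that takes longer than the compiler assumed (a cache miss, for example) forces a stall.
Not binary compatible: code scheduled for one implementation's functional units and latencies must be recompiled for another.
Lock-step execution means one hazard stalls all functional units.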
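To make the latency point concrete, the Python sketch below checks the issue clocks of the seven ld, addd, sd chains in the schedule against a set of latencies. The latencies used (a load result usable by an FP add 2 clocks after the load issues, an FP add result usable by a store 3 clocks after the add issues) are assumptions chosen to match the spacing in the schedule, not values stated in the text; CHAINS and violations are hypothetical names.

```python
# Issue clock (1-based row of the schedule) of the ld, addd, and sd in each
# dependence chain, keyed by the register the addd produces.
CHAINS = {
    "f4": (1, 3, 6),
    "f8": (1, 3, 6),
    "f12": (2, 4, 7),
    "f16": (2, 4, 7),
    "f20": (3, 5, 8),
    "f24": (3, 5, 8),
    "f28": (4, 6, 9),
}


def violations(chains, load_latency, add_latency):
    """Return the chains whose consumer issues before its operand is ready.

    A latency of N means a result is usable N clocks after the producer issues.
    """
    bad = []
    for name, (ld, addd, sd) in chains.items():
        if addd - ld < load_latency or sd - addd < add_latency:
            bad.append(name)
    return bad


print(violations(CHAINS, load_latency=2, add_latency=3))  # assumed latencies
print(violations(CHAINS, load_latency=3, add_latency=3))  # loads one clock slower
```

With the assumed latencies no chain is violated; if loads take one clock longer, every chain is. Because the schedule is fixed at compile time, the code would have to be rescheduled, or the whole machine stalled, to stay correct.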
