Software Pipelining

Observation:

If the iterations of a loop are independent (no loop-carried dependences), then we can get ILP by taking instructions from different iterations and interleaving them in a single loop body:

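            ld     f0,0(r1)       ; prologue: load the first element
            addd   f4,f0,f2       ; prologue: add for the first element
            ld     f0,-8(r1)      ; prologue: load the second element
     Loop:  sd     0(r1),f4       ; store M[i]
            addd   f4,f0,f2       ; add to M[i-1]
            ld     f0,-16(r1)     ; load M[i-2]
            subi   r1,r1,#8
            bnez   r1,Loop
            sd     0(r1),f4       ; epilogue: store the sum left over from the loop
            addd   f4,f0,f2       ; epilogue: add for the last element loaded
            sd     -8(r1),f4      ; epilogue: store the last element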
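
The same restructuring can be sketched at source level. The Python below is only an illustration, not part of the original notes: the names plain_loop and pipelined_loop are hypothetical, and the variables f0 and f4 are chosen to mirror the registers in the listing above. It assumes the loop adds a constant to every element of an array, walking downward the way r1 does.

```python
def plain_loop(m, c):
    """Reference loop: load, add, and store one element per iteration."""
    out = list(m)
    for i in range(len(out) - 1, -1, -1):   # walk downward, like r1
        f0 = out[i]        # ld
        f4 = f0 + c        # addd
        out[i] = f4        # sd
    return out


def pipelined_loop(m, c):
    """Same result, but each pass of the main loop stores element i,
    adds for element i-1, and loads element i-2."""
    out = list(m)
    n = len(out)
    if n < 2:
        return plain_loop(out, c)   # too short to pipeline (assumption of this sketch)

    # Prologue: start the first two iterations.
    f0 = out[n - 1]        # ld   for element n-1
    f4 = f0 + c            # addd for element n-1
    f0 = out[n - 2]        # ld   for element n-2

    # Steady state: one instruction from each of three iterations.
    for i in range(n - 1, 1, -1):
        out[i] = f4        # sd   for element i
        f4 = f0 + c        # addd for element i-1
        f0 = out[i - 2]    # ld   for element i-2

    # Epilogue: finish the two iterations still in flight.
    out[1] = f4            # sd   for element 1
    f4 = f0 + c            # addd for element 0
    out[0] = f4            # sd   for element 0
    return out
```

In this sketch the main loop runs n-2 times, because the prologue and epilogue together do the work of the other two iterations. Within one pass the store, add, and load belong to three different iterations, so none of them waits on a value produced in the same pass.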
