>>51034765
Pipelining is a requirement for even acceptably fast general-purpose CPUs. Even basic operations break down into several instructions that depend on each other. An ADD, for example:
LW <addr>
LW <addr2>
ADD <addr> <addr2>
SW <addr3>
Without pipelining this takes 4 clock cycles (assuming one instruction per cycle).
Translated into register form:
LW <addr> R1 (n1)
LW <addr2> R2 (n2)
ADD R1 R2 R3 (n3)
SW R3 <addr3> (n4)
Now, we can construct a directed acyclic graph based on these instructions:
n1->n3, n2->n3, n3->n4 (the two loads don't depend on each other, only the ADD needs both)
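A rough sketch of how those edges could be derived from the register form. The tuple encoding and the function name are made up for illustration, and it only tracks read-after-write dependencies through registers, so plain program order between the two loads does not show up as an edge:

```python
# Hypothetical encoding: (name, registers read, registers written).
# Memory addresses are ignored; no two instructions here touch the same one.
program = [
    ("n1", [], ["R1"]),            # LW <addr> R1
    ("n2", [], ["R2"]),            # LW <addr2> R2
    ("n3", ["R1", "R2"], ["R3"]),  # ADD R1 R2 R3
    ("n4", ["R3"], []),            # SW R3 <addr3>
]

def raw_edges(program):
    """Return (producer, consumer) pairs for read-after-write dependencies."""
    last_writer = {}  # register -> name of the most recent instruction that wrote it
    edges = []
    for name, reads, writes in program:
        for reg in reads:
            if reg in last_writer:
                edges.append((last_writer[reg], name))
        for reg in writes:
            last_writer[reg] = name
    return edges

print(raw_edges(program))  # [('n1', 'n3'), ('n2', 'n3'), ('n3', 'n4')]
```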
But now let's consider:
x = a+b
y = a+c
LW <addr1> R1
LW <addr2> R2
ADD R1 R2 R3
LW <addr3> R4
ADD R1 R4 R5
SW R3 <addr4>
SW R5 <addr5>
This is the primary benefit of pipelining. Instead of sitting idle until an operation with a data dependency (SW R3, which needs the result of ADD R1 R2 R3) is able to complete, you can work on another operation (ADD R1 R4 R5). This becomes even more beneficial with out-of-order execution, which in this example would let you begin the LW <addr3> R4 before doing the ADD R1 R2 R3.
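To see which of the seven instructions are actually free to run early, here is a small sketch that computes how deep each one sits in the dependency graph. The labels and the encoding are hypothetical, and it only looks at read-after-write dependencies through registers (no register is reused here, so that is enough for this program):

```python
# Hypothetical encoding: (label, registers read, registers written).
program = [
    ("LW R1", [], ["R1"]),
    ("LW R2", [], ["R2"]),
    ("ADD R3", ["R1", "R2"], ["R3"]),
    ("LW R4", [], ["R4"]),
    ("ADD R5", ["R1", "R4"], ["R5"]),
    ("SW R3", ["R3"], []),
    ("SW R5", ["R5"], []),
]

def depth(program):
    """Map each label to the number of instructions it must wait behind."""
    writer_depth = {}  # register -> depth of the instruction that last wrote it
    result = {}
    for label, reads, writes in program:
        # One level below the deepest producer; 0 if nothing has to come first.
        d = 1 + max((writer_depth[r] for r in reads if r in writer_depth), default=-1)
        result[label] = d
        for r in writes:
            writer_depth[r] = d
    return result

for label, d in depth(program).items():
    print(d, label)
```

All three loads come out at depth 0, both ADDs at depth 1, and both stores at depth 2, which is why the LW into R4 can be hoisted above the first ADD.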
Another way of thinking about it is that the clock period of the processor is roughly the time it takes the shortest operation to complete, and slower operations take several cycles. LW and SW are slow because memory access is slow. ADD is fast because ALU operations can be fast. If you can find a way to keep doing ADDs while waiting on LWs, you increase the throughput of the processor.
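A toy cycle count for the x = a+b, y = a+c program, to put numbers on this. The latencies (3 cycles for LW/SW, 1 for ADD) are invented, and the model assumes one instruction can issue per cycle and that several memory accesses can be in flight at once. It is a sketch of the idea, not a model of any real CPU:

```python
# Assumed latencies in cycles; the numbers are made up for illustration.
LATENCY = {"LW": 3, "SW": 3, "ADD": 1}

# (opcode, registers read, registers written)
in_order = [
    ("LW", [], ["R1"]),
    ("LW", [], ["R2"]),
    ("ADD", ["R1", "R2"], ["R3"]),
    ("LW", [], ["R4"]),
    ("ADD", ["R1", "R4"], ["R5"]),
    ("SW", ["R3"], []),
    ("SW", ["R5"], []),
]
# Same program with the third load hoisted above the first ADD.
hoisted = [in_order[0], in_order[1], in_order[3], in_order[2]] + in_order[4:]

def serial_cycles(program):
    """Each instruction starts only after the previous one has finished."""
    return sum(LATENCY[op] for op, _, _ in program)

def overlapped_cycles(program):
    """Issue at most one instruction per cycle, in list order, stalling until inputs are ready."""
    ready = {}       # register -> cycle at which its value becomes available
    next_issue = 0   # earliest cycle the next instruction may issue
    last_finish = 0
    for op, reads, writes in program:
        start = max([next_issue] + [ready[r] for r in reads])
        finish = start + LATENCY[op]
        for r in writes:
            ready[r] = finish
        next_issue = start + 1
        last_finish = max(last_finish, finish)
    return last_finish

print(serial_cycles(in_order))      # 17
print(overlapped_cycles(in_order))  # 13
print(overlapped_cycles(hoisted))   # 10
```

Under these assumptions the same seven instructions go from 17 cycles with no overlap, to 13 with overlap in program order, to 10 once the third load is moved ahead of the first ADD.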