Part of the excitement of working in the computer industry is the rate of progress. In November 1995, Intel released the P6, the Pentium Pro microprocessor. None of the library books consulted had any reference to this advanced CPU. Information was gathered from the Intel World Wide Web site and [HEN96].
The P6 (Pentium Pro) is a superscalar level 3 processor. It has three pipelines and is capable of executing three integer instructions per clock cycle. Furthermore, it employs speculative execution: it predicts program flow and executes instructions ahead of their normal execution sequence.
The results of this speculative execution are stored in the Re-Order Buffer (ROB). From there they may be discarded because of a change in program flow, or they may be retired, that is, committed, once the instruction results in the ROB are accepted.
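The retire-or-discard behaviour of the ROB can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not a description of Intel's hardware: the class `ReorderBuffer`, its method names, and the register names are all hypothetical, and each instruction is identified by its position in program order.

```python
from collections import deque


class ReorderBuffer:
    """Toy reorder buffer: results wait here until retired or discarded."""

    def __init__(self):
        self.entries = deque()   # oldest instruction at the left
        self.registers = {}      # committed (architectural) state

    def record(self, tag, dest, value):
        """Store a speculative result; it is not yet visible in registers."""
        self.entries.append((tag, dest, value))

    def retire(self):
        """Commit the oldest result to the registers, in program order."""
        tag, dest, value = self.entries.popleft()
        self.registers[dest] = value
        return tag

    def discard_from(self, tag):
        """Drop the entry with this tag and every younger entry."""
        while self.entries and self.entries[-1][0] >= tag:
            self.entries.pop()


rob = ReorderBuffer()
rob.record(1, "r1", 10)
rob.record(2, "r2", 20)   # speculative: follows a predicted branch
rob.record(3, "r3", 30)   # speculative
rob.retire()              # instruction 1 is accepted and committed
rob.discard_from(2)       # the prediction was wrong: 2 and 3 are dropped
```

After this sequence only the result of instruction 1 has reached the committed register state. The speculative results of instructions 2 and 3 never become visible.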
If the P6 encounters a data dependency, i.e. a data hazard of the kind previously discussed, it will speculatively execute the next instruction that does not have a data dependency. For example, assume the next four instructions to be executed are at addresses 1, 2, 3, and 4, and furthermore that instruction 3 is dependent on the result of instruction 2. The P6 will execute instructions 1, 2, and 4 in a single clock cycle. The result of instruction 4 will not be retired until after instruction 3 has executed.
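The four-instruction example can be traced with a small scheduling sketch. It rests on assumptions made purely for illustration: instruction 3 is taken to wait on the result of instruction 2, a result becomes usable the cycle after it is produced, and an instruction may retire in the cycle it executes. The function `schedule` and its parameters are hypothetical names, not part of any Intel documentation.

```python
def schedule(deps, width=3):
    """Return (issue, retire) cycle numbers for each instruction.

    deps maps an instruction number to the set of earlier instructions
    whose results it needs. Up to `width` ready instructions issue per
    cycle; a result is usable from the cycle after it is produced.
    """
    issue = {}
    cycle = 0
    while len(issue) < len(deps):
        cycle += 1
        # An instruction is ready once all its inputs were produced earlier.
        ready = [i for i in sorted(deps)
                 if i not in issue
                 and all(d in issue and issue[d] < cycle for d in deps[i])]
        for i in ready[:width]:
            issue[i] = cycle

    # Retirement is in program order: an instruction cannot retire before
    # it has executed, nor before any older instruction has retired.
    retire = {}
    last = 0
    for i in sorted(deps):
        last = max(last, issue[i])
        retire[i] = last
    return issue, retire


# Assumed dependency: instruction 3 needs the result of instruction 2.
deps = {1: set(), 2: set(), 3: {2}, 4: set()}
issue, retire = schedule(deps)
```

In this model instructions 1, 2, and 4 issue in cycle 1 and instruction 3 issues in cycle 2. Instruction 4 executes early but retires in cycle 2, after instruction 3 has executed.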
Dynamic Execution is the process of utilizing branch prediction, analyzing for data dependencies, and utilizing speculative execution.
What are the results of this advancement in pipelining? In one example shown by Intel, the Pentium completes 17 instructions in 19 cycles, less than one instruction per cycle. When the Pentium has the necessary data, it does very well, but it often stalls. The Pentium Pro, in contrast, completes the same 17 instructions in 9 clock cycles, slightly less than half the time. In addition, another 10 instructions have been speculatively executed and are awaiting retirement.
In this example, the Pentium Pro is over twice as fast as the Pentium (both CPUs at the same clock rate).
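The figures from Intel's example can be checked with a few lines of arithmetic. The variable names below are chosen only for illustration; the cycle and instruction counts are the ones quoted above.

```python
instructions = 17
pentium_cycles = 19
pentium_pro_cycles = 9

# Instructions completed per clock cycle on each processor.
pentium_ipc = instructions / pentium_cycles          # about 0.89
pentium_pro_ipc = instructions / pentium_pro_cycles  # about 1.89

# Speedup at equal clock rate is the ratio of cycle counts.
speedup = pentium_cycles / pentium_pro_cycles        # about 2.11
```

The speedup of roughly 2.11 matches the claim that the Pentium Pro is over twice as fast in this example. The count excludes the 10 speculatively executed instructions that have not yet been retired.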
What does the future hold? Increased instruction-level parallelism (ILP). ILP is already found in the 68060 and Pentium chips. Other processors have even more. For instance, the IBM RS/6000 series has four pipelines.
The gap between CPU speed and memory speed will continue to widen. Some memory designs are becoming pipelined themselves ([KUS91], [SCH90]). The sooner the CPU can notify the memory of an anticipated access, the quicker the data can be retrieved. Increased pipelining will be the major component in making computers ever faster.
Tony Wesley
Comments to author: tony@tonywesley.com
Last Updated: November 26, 1995