Read Inside the Machine: An Illustrated Introduction to Microprocessors and Computer Architecture Online

Authors: Jon Stokes

Tags: #Computers, #Systems Architecture, #General, #Microprocessors


instructions over the first 8 ns of that program’s execution than it would have

had the pipeline been full for the entire 8 ns.

When the processor is executing programs that consist of thousands of

instructions, then as the number of nanoseconds stretches into the thousands,

the impact on program execution time of those four initial nanoseconds,

during which only one instruction was completed, begins to vanish and the

pipelined processor’s advantage begins to approach the fourfold mark. For

example, after 1,000 ns, the non-pipelined processor will have completed 250

instructions (1,000 ns × 0.25 instructions/ns = 250 instructions), while the

pipelined processor will have completed 996 instructions [(1,000 ns – 4 ns) × 1

instruction/ns], a 3.984-fold improvement.
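
The arithmetic in this example can be checked with a short Python sketch. The rates and the 4 ns fill period come from the text; the function and variable names are only illustrative.

```python
# Figures from the worked example in the text.
NON_PIPELINED_RATE = 0.25  # instructions/ns
PIPELINED_RATE = 1.0       # instructions/ns once the pipeline is full
FILL_NS = 4                # nanoseconds spent filling the four-stage pipeline


def completed(elapsed_ns, rate, fill_ns=0):
    """Instructions completed after elapsed_ns, discounting any fill period."""
    return (elapsed_ns - fill_ns) * rate


def speedup(elapsed_ns):
    """Ratio of pipelined to non-pipelined completions after elapsed_ns."""
    pipelined = completed(elapsed_ns, PIPELINED_RATE, FILL_NS)
    non_pipelined = completed(elapsed_ns, NON_PIPELINED_RATE)
    return pipelined / non_pipelined


print(completed(1000, NON_PIPELINED_RATE))       # non-pipelined count at 1,000 ns
print(completed(1000, PIPELINED_RATE, FILL_NS))  # pipelined count at 1,000 ns
print(speedup(1000))                             # improvement at 1,000 ns
print(speedup(1_000_000))                        # closer still to fourfold
```

Running the sketch with ever larger values of elapsed_ns shows the ratio creeping toward, but never reaching, four.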

What I’ve just described using this concrete example is the difference

between a pipeline’s maximum theoretical completion rate and its real-world

average completion rate. In the previous example, the four-stage processor’s

maximum theoretical completion rate, i.e., its completion rate on cycles

when its entire pipeline is full, is one instruction/ns. However, the processor’s average completion rate during its first 8 ns is 5 instructions/8 ns = 0.625

instructions/ns. The processor’s average completion rate improves as it

passes more clock cycles with its pipeline full, until at 1,000 ns, its average

completion rate is 996 instructions/1000 ns = 0.996 instructions/ns.

Chapter 3

At this point, it might help to look at a graph of the four-stage pipeline’s

average completion rate as the number of nanoseconds increases, illustrated

in Figure 3-9.

[Graph omitted. Vertical axis: average instruction throughput (instructions/clock); horizontal axis: clock cycles.]

Figure 3-9: Average completion rate of a four-stage pipeline

You can see how the processor’s average completion rate stays at zero until

the 4 ns mark, after which point the pipeline is full and the processor can

begin completing a new instruction on each nanosecond, causing the average

completion rate for the entire program to curve upward and eventually to

approach the maximum completion rate of one instruction/ns.

So in conclusion, a pipelined processor can only approach its ideal

completion rate if it can go for long stretches with its pipeline full on every

clock cycle.
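
The shape of that curve can be reproduced with a small Python sketch. It assumes a stall-free four-stage pipeline with a 1 ns clock, one new instruction entering per cycle, and the first instruction completing at the end of cycle 4. The function name is hypothetical, and a different convention for counting the fill period shifts each count by one instruction.

```python
def average_completion_rate(elapsed_cycles, stages=4):
    """Average instructions completed per cycle after elapsed_cycles cycles.

    Assumption: no stalls, and the first instruction leaves the last stage
    at the end of cycle number `stages`.
    """
    completed = max(0, elapsed_cycles - (stages - 1))
    return completed / elapsed_cycles


for cycles in (1, 3, 4, 8, 20, 100, 1000):
    rate = average_completion_rate(cycles)
    print(f"{cycles:>5} cycles: {rate:.3f} instructions/cycle")
```

The printed values stay at zero while the pipeline fills, then climb quickly at first and ever more slowly as they close in on one instruction per cycle.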

Instruction Throughput and Pipeline Stalls

Pipelining isn’t totally “free,” however. Pipelining adds some complexity to

the microprocessor’s control logic, because all of these stages have to be kept

in sync. Even more important for the present discussion, though, is the fact

that pipelining adds some complexity to the ways in which you assess the

processor’s performance.

Instruction Throughput

Up until now, we’ve talked about microprocessor performance mainly in

terms of instruction completion rate, or the number of instructions that the

processor’s pipeline can complete each nanosecond. A more common

performance metric in the real world is a pipeline’s instruction throughput,

or the number of instructions that the processor completes each clock cycle.

You might be thinking that a pipeline’s instruction throughput should always be one

instruction/clock, because I stated previously that a pipelined processor

completes a new instruction at the end of each clock cycle in which the write

stage has been active. But notice how the final clause of that statement, “in

which the write stage has been active,” qualifies it a bit; you’ve already seen

that the write stage is inactive during

clock cycles in which the pipeline is being filled, so on those clock cycles,

the processor’s instruction throughput is 0 instructions/clock. In contrast, when

the processor’s pipeline is full and the write stage is active, the pipelined

processor has an instruction throughput of 1 instruction/clock.

So just like there was a difference between a processor’s maximum

theoretical completion rate and its average completion rate, there’s also

a difference between a processor’s maximum theoretical instruction

throughput and its average instruction throughput:

Instruction throughput

The number of instructions that the processor finishes executing on

each clock cycle. You’ll also see instruction throughput referred to as

instructions per clock (IPC).

Maximum theoretical instruction throughput

The theoretical maximum number of instructions that the processor can

finish executing on each clock cycle. For the simple kinds of pipelined

and non-pipelined processors described so far, this number is always one

instruction per cycle (one instruction/clock or one IPC).

Average instruction throughput

The average number of instructions per clock (IPC) that the processor

has actually completed over a certain number of cycles.
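
To make the three definitions concrete, here is a minimal Python sketch of the first eight cycles of the four-stage pipeline discussed earlier. The per-cycle trace is an assumption chosen to match the text's figure of five instructions in 8 ns: the write stage is idle for the first three cycles and active from the fourth onward.

```python
# Hypothetical trace: True on cycles where the write stage is active.
write_stage_active = [False, False, False, True, True, True, True, True]

# Instruction throughput on each individual cycle (instructions/clock).
throughput_per_cycle = [1 if active else 0 for active in write_stage_active]

# Maximum throughput seen on any single cycle; for this simple pipeline it
# equals the maximum theoretical instruction throughput.
maximum_throughput = max(throughput_per_cycle)

# Average instruction throughput over the whole trace.
average_throughput = sum(throughput_per_cycle) / len(throughput_per_cycle)

print(throughput_per_cycle)
print(maximum_throughput)
print(average_throughput)
```

The per-cycle list only ever holds 0 or 1, while the average lands between the two and rises as more full-pipeline cycles are added to the trace.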

A processor’s instruction throughput is closely tied to its instruction

completion rate—the more instructions that the processor completes each

clock cycle (instructions/clock), the more instructions it also completes over

a given period of time (instructions/ns).

We’ll talk more about the relationship between these two metrics in a

moment, but for now just remember that a higher instruction throughput

translates into a higher instruction completion rate, and hence better

performance.
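
As a rough illustration of how the two metrics connect, the completion rate in instructions/ns is the throughput in instructions/clock divided by the length of one clock cycle in nanoseconds. The Python sketch below applies that to this chapter's two example processors. The clock periods (4 ns non-pipelined, 1 ns pipelined) are assumptions inferred from the completion rates of 0.25 and 1 instructions/ns quoted earlier, and the function name is hypothetical.

```python
def completion_rate(ipc, clock_period_ns):
    """Instructions per nanosecond, given instructions per clock and cycle length."""
    return ipc / clock_period_ns


# Both processors have a maximum throughput of one instruction per clock;
# only the assumed length of the clock cycle differs.
non_pipelined = completion_rate(ipc=1, clock_period_ns=4)
pipelined = completion_rate(ipc=1, clock_period_ns=1)

print(non_pipelined)  # instructions/ns for the non-pipelined processor
print(pipelined)      # instructions/ns for the pipelined processor
```

With the clock period held fixed, raising the throughput raises the completion rate by the same factor.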

Pipeline Stalls

In the real world, a processor’s pipeline can be found in more conditions

than just the two described so far: a full pipeline or a pipeline that’s being

filled. Sometimes, instructions get hung up in one pipeline stage for multiple

cycles. There are a number of reasons why this might happen—we’ll discuss

many of them throughout this book—but when it happens, the pipeline is

said to stall. When the pipeline stalls, or gets hung in a certain stage, all of

the instructions in the stages below the one where the stall happened continue

advancing normally, while the stalled instruction just sits in its stage, and all

the instructions behind it back up.

In Figure 3-10, the orange instruction is stalled for two extra cycles in the

fetch stage. Because the instruction is stalled, a new gap opens ahead of it in

the pipeline for each cycle that it stalls. Once the instruction starts advancing through the pipeline again, the gaps in the pipeline that were created by the

stall—gaps that are commonly called “pipeline bubbles”—travel down the

pipeline ahead of the formerly stalled instruction until they eventually leave

the pipeline.
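
The effect of a stall on throughput can be sketched in Python. This is a simplified model under stated assumptions, not a reconstruction of Figure 3-10: a four-stage pipeline, six instructions, and the fourth instruction (index 3) held in the fetch stage for two extra cycles. All names and the choice of which instruction stalls are hypothetical.

```python
STAGES = ("fetch", "decode", "execute", "write")


def completion_cycles(num_instructions, stalled_index=None, stall_cycles=0):
    """Return the 1-based cycle in which each instruction finishes the write stage.

    Without stalls, instruction i (0-based) finishes in cycle i + len(STAGES).
    A stall delays the stalled instruction and every instruction behind it.
    """
    delay = 0
    cycles = []
    for i in range(num_instructions):
        if i == stalled_index:
            delay = stall_cycles
        cycles.append(i + len(STAGES) + delay)
    return cycles


def per_cycle_throughput(cycles_done):
    """Instructions completed on each cycle, from cycle 1 to the last completion."""
    return [cycles_done.count(c) for c in range(1, max(cycles_done) + 1)]


smooth = per_cycle_throughput(completion_cycles(6))
stalled = per_cycle_throughput(completion_cycles(6, stalled_index=3, stall_cycles=2))

print(smooth)   # no stall: zeros only while the pipeline fills
print(stalled)  # with stall: two extra zeros where the bubbles leave the pipeline
print(sum(smooth) / len(smooth), sum(stalled) / len(stalled))
```

In the stalled trace, the two zeros in the middle are the pipeline bubbles reaching the write stage, and they pull the average throughput below that of the stall-free run.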
