Read Inside the Machine: An Illustrated Introduction to Microprocessors and Computer Architecture Online

Author: Jon Stokes

Tags: #Computers, #Systems Architecture, #General, #Microprocessors

Inside the Machine: An Illustrated Introduction to Microprocessors and Computer Architecture (page 3)

Stages 3 and 4: Trace Cache Fetch ........................................................... 155

Stage 5: Drive ........................................................................................ 155

Stages 6 Through 8: Allocate and Rename (ROB) ....................................... 155

Stage 9: Queue ..................................................................................... 156

Stages 10 Through 12: Schedule .............................................................. 156

Stages 13 and 14: Issue ......................................................................... 157

Contents in Detail

Stages 15 and 16: Register Files .............................................................. 158

Stage 17: Execute ................................................................................... 158

Stage 18: Flags ..................................................................................... 158

Stage 19: Branch Check ......................................................................... 158

Stage 20: Drive ..................................................................................... 158

Stages 21 and Onward: Complete and Commit ......................................... 158

The Pentium 4’s Instruction Window ....................................................................... 159

8

INTEL’S PENTIUM 4 VS. MOTOROLA’S G4E:

THE BACK END

161

Some Remarks About Operand Formats .................................................................. 161

The Integer Execution Units .................................................................................... 163

The G4e’s IUs: Making the Common Case Fast ........................................... 163

The Pentium 4’s IUs: Make the Common Case Twice as Fast ......................... 164

The Floating-Point Units (FPUs) ................................................................................ 165

The G4e’s FPU ........................................................................................ 166

The Pentium 4’s FPU ............................................................................... 167

Concluding Remarks on the G4e’s and Pentium 4’s FPUs ............................. 168

The Vector Execution Units .................................................................................... 168

A Brief Overview of Vector Computing ...................................................... 168

Vectors Revisited: The AltiVec Instruction Set ............................................... 169

AltiVec Vector Operations ........................................................................ 170

The G4e’s VU: SIMD Done Right .............................................................. 173

Intel’s MMX ............................................................................................ 174

SSE and SSE2 ........................................................................................ 175

The Pentium 4’s Vector Unit: Alphabet Soup Done Quickly .......................... 176

Increasing Floating-Point Performance with SSE2 ........................................ 177

Conclusions ......................................................................................................... 177

9 64-BIT COMPUTING AND X86-64 ................................................................ 179

Intel’s IA-64 and AMD’s x86-64 ............................................................................. 180

Why 64 Bits? ...................................................................................................... 181

What Is 64-Bit Computing? .................................................................................... 181

Current 64-Bit Applications .................................................................................... 183

Dynamic Range ...................................................................................... 183

The Benefits of Increased Dynamic Range, or, How the Existing 64-Bit Computing Market Uses 64-Bit Integers .............. 184

Virtual Address Space vs. Physical Address Space ...................................... 185

The Benefits of a 64-Bit Address ................................................................ 186

The 64-Bit Alternative: x86-64 ............................................................................... 187

Extended Registers .................................................................................. 187

More Registers ........................................................................................ 188

Switching Modes .................................................................................... 189

Out with the Old ..................................................................................... 192

Conclusion .......................................................................................................... 192

10 THE G5: IBM’S POWERPC 970 ................................................................. 193

Overview: Design Philosophy ................................................................................ 194

Caches and Front End .......................................................................................... 194

Branch Prediction ................................................................................................. 195

The Trade-Off: Decode, Cracking, and Group Formation .......................................... 196

The 970’s Dispatch Rules ......................................................................... 198

Predecoding and Group Dispatch ............................................................. 199

Some Preliminary Conclusions on the 970’s Group Dispatch Scheme ............ 199

The PowerPC 970’s Back End ................................................................................ 200

Integer Unit, Condition Register Unit, and Branch Unit ................................. 201

The Integer Units Are Not Fully Symmetric ................................................. 201

Integer Unit Latencies and Throughput ....................................................... 202

The CRU ................................................................................................ 202

Preliminary Conclusions About the 970’s Integer Performance ...................... 203

Load-Store Units .................................................................................................... 203

Front-Side Bus ..................................................................................................... 204

The Floating-Point Units ......................................................................................... 205

Vector Computing on the PowerPC 970 .................................................................. 206

Floating-Point Issue Queues ................................................................................... 209

Integer and Load-Store Issue Queues ......................................................... 210

BU and CRU Issue Queues ....................................................................... 210

Vector Issue Queues ................................................................................ 211

The Performance Implications of the 970’s Group Dispatch Scheme ........................... 211

Conclusions ......................................................................................................... 213

11 UNDERSTANDING CACHING AND PERFORMANCE ..................................................... 215

Caching Basics .................................................................................................... 215

The Level 1 Cache ................................................................................... 217

The Level 2 Cache ................................................................................... 218

Example: A Byte’s Brief Journey Through the Memory Hierarchy ................... 218

Cache Misses ......................................................................................... 219

Locality of Reference ............................................................................................. 220

Spatial Locality of Data ............................................................................ 220

Spatial Locality of Code ........................................................................... 221

Temporal Locality of Code and Data ......................................................... 222

Locality: Conclusions ............................................................................... 222

Cache Organization: Blocks and Block Frames ........................................................ 223

Tag RAM ............................................................................................................ 224

Fully Associative Mapping ..................................................................................... 224

Direct Mapping .................................................................................................... 225

N-Way Set Associative Mapping ........................................................................... 226

Four-Way Set Associative Mapping ........................................................... 226

Two-Way Set Associative Mapping ........................................................... 228

Two-Way vs. Direct-Mapped .................................................................... 229

Two-Way vs. Four-Way ........................................................................... 229

Associativity: Conclusions ........................................................................ 229

Temporal and Spatial Locality Revisited: Replacement/Eviction Policies and Block Sizes ................................................... 230

Types of Replacement/Eviction Policies ...................................................... 230

Block Sizes ............................................................................................. 231

Write Policies: Write-Through vs. Write-Back ........................................................... 232

Conclusions ......................................................................................................... 233

12

INTEL’S PENTIUM M, CORE DUO, AND CORE 2 DUO

235

Code Names and Brand Names ............................................................................ 236

The Rise of Power-Efficient Computing ..................................................................... 237

Power Density ...................................................................................................... 237

Dynamic Power Density ............................................................................ 237

Static Power Density ................................................................................. 238

The Pentium M ..................................................................................................... 239

The Fetch Phase ...................................................................................... 239

The Decode Phase: Micro-ops Fusion ......................................................... 240

Branch Prediction .................................................................................... 244

The Stack Execution Unit .......................................................................... 246

Pipeline and Back End ............................................................................. 246

Summary: The Pentium M in Historical Context ........................................... 246

Core Duo/Solo .................................................................................................... 247

Intel’s Line Goes Multi-Core ...................................................................... 247

Core Duo’s Improvements ......................................................................... 251

Summary: Core Duo in Historical Context ................................................... 254

Core 2 Duo ......................................................................................................... 254

The Fetch Phase ...................................................................................... 256

The Decode Phase ................................................................................... 257

Core’s Pipeline ....................................................................................... 258

Core’s Back End .................................................................................................. 258
