Bryant and O'Hallaron Chapter 3 Section 1

History

Intel 8008 microprocessor (1972, 10 microns, 500KHz, 3,500 transistors, 18 pin DIP)

The 8008 Registers

code  name  contents           special function
            +---------------+
 000   A    |b b b b b b b b|  the accumulator
            +---------------+
 001   B    |b b b b b b b b|
            +---------------+
 010   C    |b b b b b b b b|
            +---------------+
 011   D    |b b b b b b b b|
            +---------------+
 100   E    |b b b b b b b b|
            +---------------+
 101   L    |b b b b b b b b|  contains 8 low order bits of a memory address
            +---------------+
 110   H    |b b b b b b b b|  contains 6 high order bits of a memory address
            +---------------+

Machine language instructions for the 8008 processor included 1, 2, and 3 byte instructions. They used the following instruction formats:

length        first byte        second byte       third byte
           +---------------+
1 byte     |    op code    |
           +---------------+
           +---------------+ +---------------+
2 bytes    |    op code    | |     data      |                    Immediate
           +---------------+ +---------------+
           +---------------+ +---------------+ +---------------+
3 bytes    |    op code    | |low 8 addr bits| |x|x|high 6 bits|  Jump or Call
           +---------------+ +---------------+ +---------------+

To implement the program counter and a stack, the processor contained eight 14-bit registers. A 3-bit stack pointer specified which PC register held the “active” program counter. The 8008 architecture actually specified 16-bit registers. However, no one anticipated ever needing a computer with more than 2^14 (16 kilobytes) of memory, so only 14 of the 16 bits were implemented, saving perhaps 100 transistors.
Stack of eight 14-bit registers (binary representation)

        +---------------------------+
PC 000  |b b b b b b b b b b b b b b|
        +---------------------------+
PC 001  |b b b b b b b b b b b b b b|
        +---------------------------+
PC 010  |b b b b b b b b b b b b b b|
        +---------------------------+
PC 011  |b b b b b b b b b b b b b b|
        +---------------------------+
PC 100  |b b b b b b b b b b b b b b|
        +---------------------------+
PC 101  |b b b b b b b b b b b b b b|          stack
        +---------------------------+          pointer
PC 110  |b b b b b b b b b b b b b b|
        +---------------------------+          +-----+
PC 111  |b b b b b b b b b b b b b b| <--------|1 1 1|
        +---------------------------+          +-----+

Intel 8080 (1974, 6 microns, 2MHz, 6,000 transistors, 40 pin DIP)

The seven 8-bit 8080 registers (A, B, C, D, E, L, and H) were renumbered as follows:

code  name  contents           special function
            +---------------+
 000   B    |b b b b b b b b|
            +---------------+
 001   C    |b b b b b b b b|
            +---------------+
 010   D    |b b b b b b b b|
            +---------------+
 011   E    |b b b b b b b b|
            +---------------+
 100   H    |b b b b b b b b|  contains 8 high order bits of a memory address
            +---------------+
 101   L    |b b b b b b b b|  contains 8 low order bits of a memory address
            +---------------+
 111   A    |b b b b b b b b|  the accumulator
            +---------------+

(Code 110 does not name a register. It refers to the memory byte addressed by HL.)

This numbering made it easier to add instructions that treat pairs of 8-bit registers as 16-bit registers (BC, DE, HL). The two registers in each pair share the same leading two code bits. The processor also contained a 16-bit instruction pointer (IP) and stack pointer (SP). Instructions were added to load, increment, and decrement the BC, DE, HL, and SP registers. In total, the 8080 added more than 40 new machine language instructions: 111 in the 8080 versus 67 in the 8008. The exact count varies depending on how you define an “instruction”.
pair code
         +---------------+---------------+
 00  BC  |b b b b b b b b|b b b b b b b b|
         +---------------+---------------+
 01  DE  |b b b b b b b b|b b b b b b b b|
         +---------------+---------------+
 10  HL  |b b b b b b b b|b b b b b b b b|
         +---------------+---------------+
 11  SP  |b b b b b b b b b b b b b b b b|
         +-------------------------------+

         +-------------------------------+
     IP  |b b b b b b b b b b b b b b b b|
         +-------------------------------+

In addition, the 8080 improved interrupt handling. It also implemented a traditional stack, with a 16-bit stack pointer in the processor pointing to a stack located in memory.

Intel 8086 and 8088 Processors (P1, 1978, 3 microns, 5 MHz, 29,000 transistors, 40-pin DIP)

The P1 (Processor design 1) is the first member of Intel's x86 family of processors. The 8086 was Intel's first 16-bit processor. As the successor to the 8-bit 8080, the 8086 was designed to be compatible with the 8080 at the assembly language level. This was accomplished through the use of eight 16-bit registers, four of which could be treated as a pair of 8-bit registers. In the diagram below, the 8086 names are on the right and the analogous 8080 names are on the left. To reference the accumulator on the 8086, the programmer could use AH (A register, high 8 bits), AL (A register, low 8 bits) or AX (A register, all 16 bits).
     +---------------+---------------+
 A   |      AH       |      AL       |  %ax (Accumulator)
     +---------------+---------------+
 HL  |      BH       |      BL       |  %bx (Base)
     +---------------+---------------+
 BC  |      CH       |      CL       |  %cx (Count)
     +---------------+---------------+
 DE  |      DH       |      DL       |  %dx (Data)
     +---------------+---------------+

     +-------------------------------+
 SP  |b b b b b b b b b b b b b b b b|  %sp (stack pointer)
     +-------------------------------+
     |b b b b b b b b b b b b b b b b|  %bp (base pointer)
     +-------------------------------+
     |b b b b b b b b b b b b b b b b|  %si (source index)
     +-------------------------------+
     |b b b b b b b b b b b b b b b b|  %di (destination index)
     +-------------------------------+

     +-------------------------------+
 IP  |b b b b b b b b b b b b b b b b|  ip (instruction pointer)
     +-------------------------------+

The 8086 and 8088 used four segment registers to increase the size of addresses from 16 to 20 bits. This raised the maximum memory size from 64K bytes to 1 megabyte. Each segment register held a segment's starting address divided by 16. The processor shifted that value left 4 bits and added a 16-bit offset to form a 20-bit address. The four segment registers are:

  CS (Code Segment) register. Starting address of the current 64k code segment.
  SS (Stack Segment) register. Starting address of the current 64k stack segment.
  DS (Data Segment) register. Starting address of the current 64k data segment.
  ES (Extra Segment) register. Starting address of a second 64k data segment.

Suppose a program used less than 64k bytes of code, 64k bytes of stack space, and 128k bytes of data in memory. Then the four segment registers could simply be initialized when the program was loaded into memory and never modified. Larger programs had to modify the segment registers as they executed. For example, an array occupying more than 64k bytes of memory was divided into 64k byte blocks. Each reference to an array element then required computing the correct block from the subscript(s) and loading the appropriate value into the DS or ES register.
Intel 286 Processor (P2, 1982, 1.5 microns, 6 MHz, 134,000 transistors, 68-pin PLCC)

The P2 (Processor 2) is the second member of Intel's x86 family of processors. To permit 24-bit addresses (16 megabytes of memory), the operation of the segment registers was changed. A segment register no longer contained the starting address of a 64K-byte segment. Instead, it was used as a subscript into a table in memory. Each 8-byte (64-bit) entry in the table contained a 3-byte (24-bit) segment starting address and a 2-byte (16-bit) segment length. Additional bits in the 64-bit table entries, together with an MMU (Memory Management Unit) and new instructions, let the operating system give programs a much larger virtual memory space. Four privilege levels allowed an operating system to protect itself (as well as application programs) from misbehaving application software. The 286 was the first processor in the x86 family to provide the kind of features needed by any general purpose multitasking system.

Intel 386 Processor (P3, 1985, 1.5 micron, 16 MHz, 275,000 transistors, 132-pin PGA)

The 386 is the third member (P3) of Intel's x86 family of processors. It was the first processor to implement what is now called IA-32 (Intel Architecture 32-bits). Recall that Intel created the 16-bit 8086 processor by “stretching” the registers of the 8080 from 8 to 16 bits. In a similar manner, the 386 processor stretched the 8086 registers from 16 to 32 bits. It also added two more segment registers, FS and GS.
The major processor registers included (32-bit register names on the left, 16-bit names on the right):

      <-------------------- 32-bit %e?x registers -------------------->
                                      <---- 16-bit %?x registers ----->
                                       <- 8-bit reg -> <- 8-bit reg ->
      +-------------------------------+---------------+---------------+
%eax  |                               |      %ah      |      %al      |  %ax
      +-------------------------------+---------------+---------------+
%ebx  |                               |      %bh      |      %bl      |  %bx
      +-------------------------------+---------------+---------------+
%ecx  |                               |      %ch      |      %cl      |  %cx
      +-------------------------------+---------------+---------------+
%edx  |                               |      %dh      |      %dl      |  %dx
      +-------------------------------+---------------+---------------+

      +-------------------------------+-------------------------------+
%esp  |                               |                               |  %sp
      +-------------------------------+-------------------------------+
%ebp  |                               |                               |  %bp
      +-------------------------------+-------------------------------+
%esi  |                               |                               |  %si
      +-------------------------------+-------------------------------+
%edi  |                               |                               |  %di
      +-------------------------------+-------------------------------+

      +-------------------------------+-------------------------------+
%eip  |                               |                               |  ip
      +-------------------------------+-------------------------------+

The following instructions show how an assembly language program can access the %al, %ah, %ax, and %eax (extended ax) registers:

    movb $0, %al     # clear bits 0 to 7, remaining 24 bits unchanged
    movb $0, %ah     # clear bits 8 to 15, remaining 24 bits unchanged
    movw $0, %ax     # clear bits 0 to 15, remaining 16 bits unchanged
    movl $0, %eax    # clear bits 0 to 31

Because the 386 was a 32-bit machine using 32-bit addresses, up to 4 gigabytes of memory could be directly addressed. For backward compatibility, the architecture still included the segment registers. However, all of the segments can be set to start at address 0x00000000 and span the full 4 gigabytes. The segmentation hardware then seems to "disappear", creating what is called the "flat" memory model.
Later processors have added many features (in addition to much higher speeds), but the 386 instruction set (IA-32) is the basis for all of the later processors.

Intel 486 Processor (P4, 1989, .8 microns, 25 MHz, 1.2M transistors, 168-238 pin PGA)

The fourth design (P4) of the x86 processor family. Integrated the floating point unit on the processor chip (as did later models of the Intel 386). Basically just a faster Intel 386.

Pentium Processor (P5, 1993, .8 microns, 60 MHz, 3.1M transistors, 273 pin socket 4)

The fifth design (P5) of the x86 processor family. Basically a faster 486. As the Pentium evolved, later models included 8-16 KB of on-chip level 1 cache and 1-4 MB of level 2 cache.

The Pentium MMX processor added MMX (MultiMedia eXtensions) instructions. These could manipulate (add, subtract, multiply, etc.) 64-bit quantities that represented eight 8-bit integers, four 16-bit integers, or two 32-bit integers. Instructions that operate on multiple items of data at the same time are referred to as SIMD (single instruction, multiple data) instructions. An example is adding four pairs of 16-bit numbers at once.

Pentium Pro (P6, 1995, .35 microns, 150 MHz, 6.5M transistors, 387 pin socket 8)

The sixth design (P6) of the x86 processor family. Added conditional moves to the instruction set. The internal design of the modern "Core 2" (P8?) processors is similar to the P6. Included 256-512 KB of level 2 cache on a separate chip inside the package containing the processor chip.

Pentium II (P6, 1997, .35 microns, 233 MHz, 6.5M transistors, 242-contact slot 1)

Added the MMX instructions to the Pentium Pro P6 architecture. The processor and its level 2 cache chips were packaged on a separate "daughter card" that plugged into a slot on the motherboard.
Pentium III (P6, 1999, .25 microns, 450 MHz, 9.5M transistors, Socket 370)

Added the SSE (Streaming SIMD Extensions) instruction set to augment the MMX instructions. The MMX instructions used the floating point registers but could only process integer operands. SSE included 70 new instructions and eight new 128-bit registers (%xmm0 through %xmm7), each of which could hold four 32-bit floating point numbers. SSE2 (introduced with the Pentium 4) added integer support. It provided SIMD instructions for data types from 8-bit integers to 64-bit floating point numbers, making the MMX instructions largely redundant. Later Core 2 processors introduced SSE4. MMX is still supported but has been largely superseded by SSE.

Pentium 4 (P7, 2000, .18 micron, 1.4 GHz, 42M transistors, socket 423, 478)

The seventh design (P7) of the x86 processor family. It was designed to outperform the AMD Athlon processors by using a deep pipeline (20 to 31 stages) with very high clock frequencies. (In contrast, the various P6 designs used 10 to 15 stage pipelines.) The Pentium 4 never lived up to expectations (the Athlons were faster). Still, by 2006 it reached 3.8 gigahertz with 169 million transistors using a 90nm process.

The Pentium 4 had a long lifetime and various features were added over the years. In 2002, hyper-threading was added. To perform a task switch (temporarily stop running my program and start running yours), the processor must save the hundreds of bytes held in its registers. Hyper-threading duplicated the processor's register state so that a task switch could be made in just a few clock cycles. Suppose one task "stalls" because of a cache miss or a mispredicted branch instruction. A processor with hyper-threading can then quickly switch to another task. To let the operating system take advantage of this fast switch, a single-core processor with hyper-threading appears to be a dual-core processor. However, only the registers, not the entire processor, have been duplicated.
The Pentium 4's 32-bit addresses allow up to 4 gigabytes of memory to be directly accessed. It became clear that servers were going to require more memory than this. In 2001, Intel introduced the 64-bit Itanium processors, which used a completely new instruction set (IA-64). AMD took a different approach to 64-bit architecture. In 2003, AMD introduced the Opteron processor with the AMD64 architecture. AMD64 "stretched" the Pentium's registers from 32 to 64 bits and increased the number of general registers from 8 to 16, while maintaining compatibility with the IA-32 (Pentium) architecture. In 2005, Intel followed AMD by adding the AMD64 features to the Pentium 4, calling the resulting architecture Intel 64.

64-bit     bits 63-32       bits 31-16    bits 15-8  bits 7-0     16-bit
regs                                                              regs
       +----------------+----------------+----------+----------+
%rax   |                |      %eax      |   %ah    |   %al    |  %ax
       +----------------+----------------+----------+----------+
%rbx   |                |      %ebx      |   %bh    |   %bl    |  %bx
       +----------------+----------------+----------+----------+
%rcx   |                |      %ecx      |   %ch    |   %cl    |  %cx
       +----------------+----------------+----------+----------+
%rdx   |                |      %edx      |   %dh    |   %dl    |  %dx
       +----------------+----------------+----------+----------+
%rsp   |                |      %esp      |          |   %spl   |  %sp
       +----------------+----------------+----------+----------+
%rbp   |                |      %ebp      |          |   %bpl   |  %bp
       +----------------+----------------+----------+----------+
%rsi   |                |      %esi      |          |   %sil   |  %si
       +----------------+----------------+----------+----------+
%rdi   |                |      %edi      |          |   %dil   |  %di
       +----------------+----------------+----------+----------+
%r8    |                |      %r8d      |          |   %r8b   |  %r8w
       +----------------+----------------+----------+----------+
%r9    |                |      %r9d      |          |   %r9b   |  %r9w
       +----------------+----------------+----------+----------+
%r10   |                |     %r10d      |          |  %r10b   |  %r10w
       +----------------+----------------+----------+----------+
%r11   |                |     %r11d      |          |  %r11b   |  %r11w
       +----------------+----------------+----------+----------+
%r12   |                |     %r12d      |          |  %r12b   |  %r12w
       +----------------+----------------+----------+----------+
%r13   |                |     %r13d      |          |  %r13b   |  %r13w
       +----------------+----------------+----------+----------+
%r14   |                |     %r14d      |          |  %r14b   |  %r14w
       +----------------+----------------+----------+----------+
%r15   |                |     %r15d      |          |  %r15b   |  %r15w
       +----------------+----------------+----------+----------+

       +----------------+----------------+---------------------+
%rip   |                |      %eip      |                     |  ip
       +----------------+----------------+---------------------+

The 32-bit names (%eax, %r8d, ...) refer to bits 31-0 of a register and the 16-bit names on the right to bits 15-0. Only %rax through %rdx have names for bits 15-8.

Newer Intel Microarchitecture Designs

Beginning with NetBurst, the microarchitecture of the Pentium 4 described above, Intel used names rather than numbers to identify its successive microarchitectures. The designs that followed NetBurst are closer to the P6 (Pentium Pro) than to the NetBurst-based Pentium 4. They use shallower pipelines (e.g., 14 stages) and lower clock speeds, yet outperform the Pentium 4 with its longer pipeline and faster clock. The first Pentium 4s were based on chips containing 42 million transistors, and later Pentium 4s had over 100 million transistors. As transistor counts approached a billion per chip, Intel developed a succession of more powerful microarchitectures. The newer chips feature:

  * Improved processor speeds.
  * Multiple cores (processors) on a single chip.
  * Larger cache sizes.
  * Better branch prediction logic.
  * Superscalar execution (more than one instruction per clock cycle).

In the future, we will see more of the logic now on the motherboard (e.g., the memory controller) integrated into the processor chip. See http://en.wikipedia.org/wiki/File:IntelProcessorRoadmap-3.svg.

The newer microarchitectures include:

“http://en.wikipedia.org/wiki/Penryn_(microarchitecture)#Penryn .
The Intel Core microarchitecture (previously known as the Next-Generation Micro-Architecture, or NGMA) is a multi-core processor microarchitecture unveiled by Intel in Q1 2006. The Core microarchitecture returned to lower clock rates and improved the usage of both available clock cycles and power when compared with the preceding NetBurst microarchitecture of the Pentium 4/D-branded CPUs.”

“http://en.wikipedia.org/wiki/Nehalem_(microarchitecture) . Nehalem (pronounced /nəˈheɪləm/) is the codename for an Intel processor microarchitecture, successor to the Core microarchitecture. Nehalem processors use the 45 nm process. Hyper-Threading is reintroduced along with an L3 Cache missing from most Core-based microprocessors.”

“http://en.wikipedia.org/wiki/Sandy_Bridge . Sandy Bridge, Nehalem's successor, is the codename for a processor microarchitecture developed by Intel's Israel Development Center beginning in 2005 targeting the 32 nm process. The codename was previously "Gesher" (meaning "bridge" in Hebrew). Sandy Bridge processors were first released on January 9, 2011. The yet-to-be released 22 nm die shrink of Sandy Bridge has the codename Ivy Bridge.”

Evolution of Intel x86 Architecture

Processor     Design  Year  Process        Speed      Transistors  Pins
8008                  1972  10 microns     500 KHz          3,500  18
8080                  1974  6 microns      2 MHz            6,000  40
8086          P1      1978  3 microns      5 MHz           29,000  40
286           P2      1982  1.5 microns    6 MHz          134,000  68
386           P3      1985  1.5 microns    16 MHz         275,000  132
486           P4      1989  .8 microns     25 MHz       1,200,000  168-238
Pentium       P5      1993  .8 microns     60 MHz       3,100,000  273
Pentium Pro   P6      1995  .35 microns    150 MHz      6,500,000  387
Pentium II    P6      1997  .35 microns    233 MHz      6,500,000  242
Pentium III   P6      1999  .25 microns    450 MHz      9,500,000  370
Pentium 4     P7      2000  .18 microns    1.4 GHz     42,000,000  423-478
Core 2 Duo            2006  .065 microns   2.66 GHz   291,000,000  775
Core i3-5-7           2008  .045 microns   2.8 GHz    731,000,000  1366
Sandy Bridge          2011  .032 microns   3.8 GHz    915,000,000  1155 or 2011
Ivy Bridge            20??  .022 microns                           1155 or 2011