x86

x86 is a family of instruction set architectures[a] based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was introduced in 1978 as a fully 16-bit extension of Intel's 8-bit 8080 microprocessor, with memory segmentation as a solution for addressing more memory than can be covered by a plain 16-bit address. The term "x86" came into being because the names of several successors to Intel's 8086 processor end in "86", including the 80186, 80286, 80386 and 80486 processors.

NOTE: The Core i5 processors in use today are based on this architecture.

Many additions and extensions have been made to the x86 instruction set over the years, almost always with full backward compatibility.[b] The architecture has been implemented in processors from Intel, Cyrix, AMD, VIA and many other companies; there are also open implementations, such as the Zet SoC platform.[2] Of those, however, only Intel, AMD, and VIA hold x86 architectural licenses and are producing modern 64-bit designs.[3]

The term is not synonymous with IBM PC compatibility, as this implies a multitude of other computer hardware; embedded systems, as well as general-purpose computers, used x86 chips before the PC-compatible market started,[c] some of them before the IBM PC (1981) itself.

As of 2018, the majority of personal computers and laptops sold are based on the x86 architecture, while other categories, especially high-volume mobile categories such as smartphones and tablets, are dominated by ARM; at the high end, x86 continues to dominate compute-intensive workstation and cloud computing segments.[4]

Overview

Although the 8086 was primarily developed for embedded systems and small multi-user or single-user computers, largely as a response to the successful 8080-compatible Zilog Z80,[8] the x86 line soon grew in features and processing power. Today, x86 is ubiquitous in both stationary and portable personal computers, and is also used in midrange computers, workstations, servers and most new supercomputer clusters of the TOP500 list. A large amount of software, including a large list of x86 operating systems, runs on x86-based hardware.

Modern x86 is relatively uncommon in embedded systems, however, and small low power applications (using tiny batteries) as well as low-cost microprocessor markets, such as home appliances and toys, lack any significant x86 presence.[h] Simple 8-bit and 16-bit based architectures are common here, although the x86-compatible VIA C7, VIA Nano, AMD's Geode, Athlon Neo and Intel Atom are examples of 32- and 64-bit designs used in some relatively low power and low cost segments.

Chronology

History

Other manufacturers

Extensions of word size

The instruction set architecture has twice been extended to a larger word size. In 1985, Intel released the 32-bit 80386 (later known as i386) which gradually replaced the earlier 16-bit chips in computers (although typically not in embedded systems) during the following years; this extended programming model was originally referred to as the i386 architecture (like its first implementation) but Intel later dubbed it IA-32 when introducing its (unrelated) IA-64 architecture.

Between 1999 and 2003, AMD extended this 32-bit architecture to 64 bits, referring to it as x86-64 in early documents and later as AMD64. Intel soon adopted AMD's architectural extensions under the name IA-32e, later using the name EM64T and finally Intel 64. Microsoft and Sun Microsystems/Oracle also use the term "x64", while many Linux distributions and the BSDs use the term "amd64". Microsoft Windows, for example, designates its 32-bit versions as "x86" and 64-bit versions as "x64", while installation files of 64-bit Windows versions are required to be placed into a directory called "AMD64".[12]

Overview

Basic properties of the architecture

The x86 architecture is a variable instruction length, primarily "CISC" design with emphasis on backward compatibility. The instruction set is not typical CISC, however, but basically an extended version of the simple eight-bit 8008 and 8080 architectures. Byte-addressing is enabled and words are stored in memory with little-endian byte order. Memory access to unaligned addresses is allowed for all valid word sizes. The largest native size for integer arithmetic and memory addresses (or offsets) is 16, 32 or 64 bits depending on architecture generation (newer processors include direct support for smaller integers as well). Multiple scalar values can be handled simultaneously via the SIMD unit present in later generations, as described below.[l] Immediate addressing offsets and immediate data may be expressed as 8-bit quantities for the frequently occurring cases or contexts where a -128..127 range is enough. Typical instructions are therefore 2 or 3 bytes in length (although some are much longer, and some are single-byte).
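The little-endian byte order and the signed 8-bit immediate range described above can be illustrated with a short Python sketch. It uses only the standard `struct` module; the sample value and the helper name `fits_in_imm8` are chosen purely for illustration.

```python
import struct

# Store the 32-bit value 0x12345678 the way x86 lays it out in memory:
# the least significant byte goes to the lowest address (little-endian).
value = 0x12345678
little = struct.pack("<I", value)
big = struct.pack(">I", value)  # big-endian, shown only for contrast


def fits_in_imm8(n):
    """Return True if n can be encoded as a signed 8-bit immediate."""
    return -128 <= n <= 127


print(little.hex())  # 78563412
print(big.hex())     # 12345678
print(fits_in_imm8(-128), fits_in_imm8(127), fits_in_imm8(128))
```

Values that pass `fits_in_imm8` are the ones an assembler could encode in a single byte; anything larger needs a wider immediate field and therefore a longer instruction.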

To further conserve encoding space, most registers are expressed in opcodes using three or four bits, the latter via an opcode prefix in 64-bit mode, while at most one operand to an instruction can be a memory location.[m] However, this memory operand may also be the destination (or a combined source and destination), while the other operand, the source, can be either register or immediate. Among other factors, this contributes to a code size that rivals eight-bit machines and enables efficient use of instruction cache memory. The relatively small number of general registers (also inherited from its 8-bit ancestors) has made register-relative addressing (using small immediate offsets) an important method of accessing operands, especially on the stack. Much work has therefore been invested in making such accesses as fast as register accesses, i.e. a one cycle instruction throughput, in most circumstances where the accessed data is available in the top-level cache.
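As a rough illustration of the three-bit and four-bit register fields, the following Python sketch decodes the two register numbers from a ModRM byte, optionally extended by a REX prefix in 64-bit mode. It handles only the register-to-register form, and the function and table names are hypothetical helpers for this example, not part of any assembler or disassembler library.

```python
def decode_registers(rex, modrm):
    """Return (reg, rm) register numbers for a register-to-register form.

    rex   -- REX prefix byte (0x40..0x4F), or None when no prefix is present
    modrm -- the ModRM byte that follows the opcode
    """
    mod = (modrm >> 6) & 0b11   # 0b11 means both operands are registers
    reg = (modrm >> 3) & 0b111  # three-bit register field
    rm = modrm & 0b111          # three-bit register/memory field
    if mod != 0b11:
        raise ValueError("this sketch handles only register-direct operands")
    if rex is not None:
        # REX.R (bit 2) and REX.B (bit 0) each supply a fourth register bit.
        reg |= ((rex >> 2) & 1) << 3
        rm |= (rex & 1) << 3
    return reg, rm


NAMES64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
           "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]

# Opcode 0x89 is MOV r/m, r: the rm field is the destination, reg the source.
# Bytes 48 89 D8 encode "mov rax, rbx"; bytes 4D 89 C8 encode "mov r8, r9".
for rex, modrm in [(0x48, 0xD8), (0x4D, 0xC8)]:
    reg, rm = decode_registers(rex, modrm)
    print(f"mov {NAMES64[rm]}, {NAMES64[reg]}")
```

Without the prefix only registers 0 to 7 are reachable, which is why the eight additional registers of 64-bit mode cost one extra byte per instruction that uses them.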

Floating point and SIMD

A dedicated floating point processor with 80-bit internal registers, the 8087, was developed for the original 8086. This microprocessor subsequently developed into the extended 80387, and later processors incorporated a backward compatible version of this functionality on the same microprocessor as the main processor. In addition to this, modern x86 designs also contain a SIMD-unit (see SSE below) where instructions can work in parallel on (one or two) 128-bit words, each containing two or four floating point numbers (each 64 or 32 bits wide respectively), or alternatively, 2, 4, 8 or 16 integers (each 64, 32, 16 or 8 bits wide respectively).

The presence of wide SIMD registers means that existing x86 processors can load or store up to 128 bits of memory data in a single instruction and also perform bitwise operations (although not integer arithmetic[n]) on full 128-bit quantities in parallel. Intel's Sandy Bridge processors added the AVX (Advanced Vector Extensions) instructions, widening the SIMD registers to 256 bits. Knights Corner, the architecture used by Intel on their Xeon Phi co-processors, uses 512-bit wide SIMD registers.
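The relationship between register width and the number of packed elements can be sketched in a few lines of Python. This is an emulation for illustration only: `lanes` and `packed_add_f32` are hypothetical helpers, and the second one mimics what a packed single-precision add does on one 128-bit register rather than invoking any real SIMD hardware.

```python
import struct


def lanes(element_bits, register_bits=128):
    """Number of elements of the given width that fit in one SIMD register."""
    return register_bits // element_bits


def packed_add_f32(a_bytes, b_bytes):
    """Add two 128-bit values lane by lane as four 32-bit floats."""
    a = struct.unpack("<4f", a_bytes)
    b = struct.unpack("<4f", b_bytes)
    return struct.pack("<4f", *(x + y for x, y in zip(a, b)))


# Lane counts for 128-bit registers: floats, doubles, and bytes.
print(lanes(32), lanes(64), lanes(8))
# The same 32-bit elements in 256-bit and 512-bit registers.
print(lanes(32, 256), lanes(32, 512))

a = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
b = struct.pack("<4f", 10.0, 20.0, 30.0, 40.0)
print(struct.unpack("<4f", packed_add_f32(a, b)))
```

On real hardware the four additions happen in a single instruction; the loop in the sketch only stands in for that parallelism.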

Current implementations

During execution, current x86 processors employ a few extra decoding steps to split most instructions into smaller pieces called micro-operations. These are then handed to a control unit that buffers and schedules them in compliance with x86 semantics so that they can be executed, partly in parallel, by one of several (more or less specialized) execution units. These modern x86 designs are thus pipelined, superscalar, and also capable of out-of-order and speculative execution (via branch prediction, register renaming, and memory dependence prediction), which means they may execute multiple (partial or complete) x86 instructions simultaneously, and not necessarily in the same order as given in the instruction stream.[13] Intel's and AMD's CPUs (the latter starting from AMD Zen) are also capable of simultaneous multithreading with two threads per core (Xeon Phi has four threads per core), and Intel's CPUs additionally support transactional memory (TSX).

When introduced, in the mid-1990s, this method was sometimes referred to as a "RISC core" or as "RISC translation", partly for marketing reasons, but also because these micro-operations share some properties with certain types of RISC instructions. However, traditional microcode (used since the 1950s) also inherently shares many of the same properties; the new method differs mainly in that the translation to micro-operations now occurs asynchronously. Not having to synchronize the execution units with the decode steps opens up possibilities for more analysis of the (buffered) code stream, and therefore permits detection of operations that can be performed in parallel, simultaneously feeding more than one execution unit.

The latest processors also do the opposite when appropriate; they combine certain x86 sequences (such as a compare followed by a conditional jump) into a more complex micro-op which fits the execution model better and thus can be executed faster or with less machine resources involved.

Another way to try to improve performance is to cache the decoded micro-operations, so the processor can directly access the decoded micro-operations from a special cache, instead of decoding them again. Intel followed this approach with the Execution Trace Cache feature in their NetBurst Microarchitecture (for Pentium 4 processors) and later in the Decoded Stream Buffer (for Core-branded processors since Sandy Bridge).[14]

Transmeta used a completely different method in their x86 compatible CPUs. They used just-in-time translation to convert x86 instructions to the CPU's native VLIW instruction set. Transmeta argued that their approach allows for more power efficient designs since the CPU can forgo the complicated decode step of more traditional x86 implementations.

Segmentation

Further information: x86 memory segmentation

Minicomputers during the late 1970s were running up against the 16-bit 64-KB address limit, as memory had become cheaper. Some minicomputers like the PDP-11 used complex bank-switching schemes, or, in the case of Digital's VAX, redesigned much more expensive processors which could directly handle 32-bit addressing and data. The original 8086, developed from the simple 8080 microprocessor and primarily aiming at very small and inexpensive computers and other specialized devices, instead adopted simple segment registers which increased the memory address width by only 4 bits. By multiplying a 16-bit segment value by 16 and adding a 16-bit offset, the resulting 20-bit address could reach a total of one megabyte (1,048,576 bytes), which was quite a large amount for a small computer at the time. The concept of segment registers was not new: many mainframes used them to switch quickly between tasks. In practice, the x86 implementation was (and still is) much criticized, as it greatly complicated many common programming tasks and compilers. However, the architecture soon allowed linear 32-bit addressing (starting with the 80386 in late 1985) but major actors (such as Microsoft) took several years to convert their 16-bit based systems. The 80386 (and 80486) was therefore largely used as a fast (but still 16-bit based) 8086 for many years.

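NOTE: Regarding the phrase above, "instead adopted simple segment registers which increased the memory address width by only 4 bits": do not underestimate these 4 bits, since they enlarge the addressable space 16-fold.

NOTE: In the sentence above, "However, the architecture soon allowed linear 32-bit addressing (starting with the 80386 in late 1985)", the word "linear" is significant: linear addressing is a different scheme from segmented addressing.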
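The segment arithmetic of the original 8086 can be sketched as follows. The function name `physical_address` is a hypothetical helper; the computation itself (segment value times 16, plus offset, kept to 20 bits) follows the description in this section.

```python
def physical_address(segment, offset):
    """Form a 20-bit 8086 address from a 16-bit segment and a 16-bit offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Shifting left by 4 bits is the same as multiplying by 16.
    # The mask models the 20 address lines of the 8086.
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(physical_address(0x1234, 0x0010)))  # 0x12350
print(hex(physical_address(0x1235, 0x0000)))  # 0x12350
print(2 ** 16, 2 ** 20)                       # 65536 1048576
```

The first two calls show that different segment:offset pairs can name the same byte, which is one reason segmented pointers are awkward to compare. The last line shows the 16-fold growth from 64 KB to 1 MB that the four extra address bits provide.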

Operating modes

Real mode

Main article: Real mode

Real Address mode,[24] commonly called Real mode, is an operating mode of 8086 and later x86-compatible CPUs. Real mode is characterized by a 20-bit segmented memory address space (meaning that only 1 MiB of memory can be addressed—actually, slightly more[p]), direct software access to peripheral hardware, and no concept of memory protection or multitasking at the hardware level. All x86 CPUs in the 80286 series and later start up in real mode at power-on; 80186 CPUs and earlier had only one operational mode, which is equivalent to real mode in later chips. (On the IBM PC platform, direct software access to the IBM BIOS routines is available only in real mode, since BIOS is written for real mode. However, this is not a characteristic of the x86 CPU but of the IBM BIOS design.)
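The remark that real mode can address "slightly more" than 1 MiB follows from the segment arithmetic: the largest segment combined with the largest offset overshoots the 20-bit range. A minimal sketch, with `real_mode_linear` as a hypothetical helper name:

```python
ONE_MIB = 1 << 20


def real_mode_linear(segment, offset):
    """Segment times 16 plus offset, without truncating to 20 bits."""
    return (segment << 4) + offset


highest = real_mode_linear(0xFFFF, 0xFFFF)
print(hex(highest))                   # 0x10ffef
print(highest + 1 - ONE_MIB)          # 65520 bytes lie above the 1 MiB mark
print(hex(highest & (ONE_MIB - 1)))   # 0xffef
```

The last line models a CPU with only 20 address lines, such as the 8086, where the sum wraps around to low memory; a CPU with more address lines can instead reach the extra bytes above 1 MiB.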

In order to use more than 64 KB of memory, the segment registers must be used. This created great complications for compiler implementors who introduced odd pointer modes such as "near", "far" and "huge" to leverage the implicit nature of segmented architecture to different degrees, with some pointers containing 16-bit offsets within implied segments and other pointers containing segment addresses and offsets within segments. It is technically possible to use up to 256 KB of memory for code and data, with up to 64 KB for code, by setting all four segment registers once and then only using 16-bit offsets (optionally with default-segment override prefixes) to address memory, but this puts substantial restrictions on the way data can be addressed and memory operands can be combined, and it violates the architectural intent of the Intel designers, which is for separate data items (e.g. arrays, structures, code units) to be contained in separate segments and addressed by their own segment addresses, in new programs that are not ported from earlier 8-bit processors with 16-bit address spaces.

Protected mode

Main article: Protected mode

In addition to real mode, the Intel 80286 supports protected mode, expanding addressable physical memory to 16 MB and addressable virtual memory to 1 GB, and providing protected memory, which prevents programs from corrupting one another. This is done by using the segment registers only for storing an index into a descriptor table that is stored in memory. There are two such tables, the Global Descriptor Table (GDT) and the Local Descriptor Table (LDT), each holding up to 8192 segment descriptors, each segment giving access to 64 KB of memory. In the 80286, a segment descriptor provides a 24-bit base address, and this base address is added to a 16-bit offset to create an absolute address. The base address from the table fulfills the same role that the literal value of the segment register fulfills in real mode; the segment registers have been converted from direct registers to indirect registers. Each segment can be assigned one of four ring levels used for hardware-based computer security. Each segment descriptor also contains a segment limit field which specifies the maximum offset that may be used with the segment. Because offsets are 16 bits, segments are still limited to 64 KB each in 80286 protected mode.[25]

Each time a segment register is loaded in protected mode, the 80286 must read a 6-byte segment descriptor from memory into a set of hidden internal registers. Therefore, loading segment registers is much slower in protected mode than in real mode, and changing segments very frequently is to be avoided. Actual memory operations using protected mode segments are not slowed much because the 80286 and later have hardware to check the offset against the segment limit in parallel with instruction execution.
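The descriptor-table lookup described above can be sketched in Python. This is a simplified model under stated assumptions: the tables are plain dictionaries rather than in-memory structures, privilege checks are omitted, and the names `Descriptor`, `split_selector`, `translate` and `SegmentLimitError` are hypothetical. The selector layout used here (13-bit index, one table-select bit, two privilege bits) is what allows 8192 descriptors per table.

```python
from dataclasses import dataclass


class SegmentLimitError(Exception):
    """Raised when an offset exceeds the limit of its segment."""


@dataclass
class Descriptor:
    base: int   # 24-bit base address of the segment
    limit: int  # largest valid 16-bit offset within the segment


def split_selector(selector):
    """Split a 16-bit selector into (index, table, requested privilege)."""
    index = selector >> 3                         # 13 bits: 0..8191
    table = "LDT" if selector & 0b100 else "GDT"  # table-select bit
    rpl = selector & 0b11                         # ring level 0..3
    return index, table, rpl


def translate(selector, offset, gdt, ldt):
    """Look up the descriptor, check the limit, and add base and offset."""
    index, table, _rpl = split_selector(selector)
    descriptor = (ldt if table == "LDT" else gdt)[index]
    if offset > descriptor.limit:
        raise SegmentLimitError(f"offset {offset:#x} exceeds segment limit")
    return descriptor.base + offset


gdt = {1: Descriptor(base=0x100000, limit=0x0FFF)}  # illustrative entry only
print(split_selector(0x0008))                       # (1, 'GDT', 0)
print(hex(translate(0x0008, 0x0010, gdt, {})))      # 0x100010
# Two tables of 8192 segments, 64 KB each, give the 1 GB virtual space.
print(2 * 8192 * 64 * 1024 == 1 << 30)              # True
```

In real mode the segment register value itself is the (scaled) base; here it is only an index, and the base comes from the table, which is the indirection the text describes.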

The Intel 80386 extended offsets and also the segment limit field in each segment descriptor to 32 bits, enabling a segment to span the entire memory space. It also introduced support in protected mode for paging, a mechanism making it possible to use paged virtual memory (with 4 KB page size). Paging allows the CPU to map any page of the virtual memory space to any page of the physical memory space. To do this, it uses additional mapping tables in memory called page tables. Protected mode on the 80386 can operate with paging either enabled or disabled; the segmentation mechanism is always active and generates virtual addresses that are then mapped by the paging mechanism if it is enabled. The segmentation mechanism can also be effectively disabled by setting all segments to have a base address of 0 and size limit equal to the whole address space; this also requires a minimally-sized segment descriptor table of only four descriptors (since the FS and GS segments need not be used).[q]
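The page mapping can be illustrated with a small Python model. It assumes the classic two-level 80386 layout with 4 KB pages, in which a 32-bit address is split into a 10-bit directory index, a 10-bit table index and a 12-bit page offset; that split is not spelled out in the text above. The nested dictionaries stand in for the in-memory page tables, and present and permission bits are ignored.

```python
PAGE_SIZE = 4096  # 4 KB pages, as stated above


def split_linear(addr):
    """Split a 32-bit address into (directory index, table index, offset)."""
    directory = (addr >> 22) & 0x3FF  # top 10 bits
    table = (addr >> 12) & 0x3FF      # middle 10 bits
    offset = addr & 0xFFF             # low 12 bits: byte within the page
    return directory, table, offset


def translate(addr, page_directory):
    """Map an address to a physical address via two lookups."""
    directory, table, offset = split_linear(addr)
    frame = page_directory[directory][table]  # physical page frame number
    return frame * PAGE_SIZE + offset


# Hypothetical mapping: the page at 0x00403000 lives in frame 0x1234.
page_directory = {1: {3: 0x1234}}
print(split_linear(0x00403025))                    # (1, 3, 37)
print(hex(translate(0x00403025, page_directory)))  # 0x1234025
```

Only the page frame changes in translation; the low 12 bits pass through untouched, which is why any virtual page can be placed in any physical page.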

Paging is used extensively by modern multitasking operating systems. Linux, 386BSD and Windows NT were developed for the 386 because it was the first Intel architecture CPU to support paging and 32-bit segment offsets. The 386 architecture became the basis of all further development in the x86 series.

x86 processors that support protected mode boot into real mode for backward compatibility with the older 8086 class of processors. Upon power-on (a.k.a. booting), the processor initializes in real mode, and then begins executing instructions. Operating system boot code, which might be stored in ROM, may place the processor into the protected mode to enable paging and other features. The instruction set in protected mode is similar to that used in real mode. However, certain constraints that apply to real mode (such as not being able to use AX, CX and DX in addressing) do not apply in protected mode. Conversely, segment arithmetic, a common practice in real mode code, is not allowed in protected mode.

x86 registers

For a description of the general notion of a CPU register, see Processor register.

16-bit

The original Intel 8086 and 8088 have fourteen 16-bit registers. Four of them (AX, BX, CX, DX) are general-purpose registers (GPRs), although each may have an additional purpose; for example, only CX can be used as a counter with the loop instruction. Each can be accessed as two separate bytes (thus BX's high byte can be accessed as BH and low byte as BL). Two pointer registers have special roles: SP (stack pointer) points to the "top" of the stack, and BP (base pointer) is often used to point at some other place in the stack, typically above the local variables (see frame pointer). The registers SI, DI, BX and BP are address registers, and may also be used for array indexing.

Four segment registers (CS, DS, SS and ES) are used to form a memory address. The FLAGS register contains flags such as carry flag, overflow flag and zero flag. Finally, the instruction pointer (IP) points to the next instruction that will be fetched from memory and then executed; this register cannot be directly accessed (read or written) by a program.[19]

The Intel 80186 and 80188 are essentially an upgraded 8086 or 8088 CPU, respectively, with on-chip peripherals added, and they have the same CPU registers as the 8086 and 8088 (in addition to interface registers for the peripherals).

The 8086, 8088, 80186, and 80188 can use an optional floating-point coprocessor, the 8087. The 8087 appears to the programmer as part of the CPU and adds eight 80-bit wide registers, st(0) to st(7), each of which can hold numeric data in one of seven formats: 32-, 64-, or 80-bit floating point, 16-, 32-, or 64-bit (binary) integer, and 80-bit packed decimal integer.[7]:S-6, S-13..S-15

In the Intel 80286, to support protected mode, three special registers hold descriptor table addresses (GDTR, LDTR, IDTR), and a fourth task register (TR) is used for task switching. The 80287 is the floating-point coprocessor for the 80286 and has the same registers as the 8087 with the same data formats.

32-bit

With the advent of the 32-bit 80386 processor, the 16-bit general-purpose registers, base registers, index registers, instruction pointer, and FLAGS register, but not the segment registers, were expanded to 32 bits. The nomenclature represented this by prefixing an "E" (for "extended") to the register names in x86 assembly language. Thus, the AX register corresponds to the lowest 16 bits of the new 32-bit EAX register, SI corresponds to the lowest 16 bits of ESI, and so on. The general-purpose registers, base registers, and index registers can all be used as the base in addressing modes, and all of those registers except for the stack pointer can be used as the index in addressing modes.
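The way the narrower register names alias parts of the extended register can be modeled with a small Python class. The class and attribute names (`Register32`, `full`, `word`, `high`, `low`) are hypothetical; for EAX they correspond to EAX, AX, AH and AL respectively.

```python
class Register32:
    """Model of a 32-bit register with 16-bit and 8-bit sub-register views."""

    def __init__(self, value=0):
        self.full = value & 0xFFFFFFFF  # e.g. EAX

    @property
    def word(self):  # e.g. AX: lowest 16 bits
        return self.full & 0xFFFF

    @word.setter
    def word(self, value):
        # A 16-bit write leaves the upper 16 bits unchanged.
        self.full = (self.full & 0xFFFF0000) | (value & 0xFFFF)

    @property
    def high(self):  # e.g. AH: bits 8..15
        return (self.full >> 8) & 0xFF

    @high.setter
    def high(self, value):
        self.full = (self.full & 0xFFFF00FF) | ((value & 0xFF) << 8)

    @property
    def low(self):  # e.g. AL: bits 0..7
        return self.full & 0xFF

    @low.setter
    def low(self, value):
        self.full = (self.full & 0xFFFFFF00) | (value & 0xFF)


eax = Register32(0x12345678)
print(hex(eax.word), hex(eax.high), hex(eax.low))  # 0x5678 0x56 0x78
eax.word = 0xABCD
print(hex(eax.full))                               # 0x1234abcd
eax.low = 0xFF
print(hex(eax.full))                               # 0x1234abff
```

Because the views share storage, 16-bit code written for the 8086 keeps working on the low halves of the wider registers, which is how the extension preserved backward compatibility.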

Two new segment registers (FS and GS) were added. With a greater number of registers, instructions and operands, the machine code format was expanded. To provide backward compatibility, segments with executable code can be marked as containing either 16-bit or 32-bit instructions. Special prefixes allow inclusion of 32-bit instructions in a 16-bit segment or vice versa.

The 80386 had an optional floating-point coprocessor, the 80387; it had eight 80-bit wide registers: st(0) to st(7),[20] like the 8087 and 80287. The 80386 could also use an 80287 coprocessor.[21] With the 80486 and all subsequent x86 models, the floating-point processing unit (FPU) is integrated on-chip.

The Pentium MMX added eight 64-bit MMX integer registers (MM0 to MM7, which share lower bits with the 80-bit-wide FPU stack).[22] With the Pentium III, Intel added a 32-bit Streaming SIMD Extensions (SSE) control/status register (MXCSR) and eight 128-bit SSE floating point registers (XMM0 to XMM7).[23]

64-bit

Starting with the AMD Opteron processor, the x86 architecture extended the 32-bit registers into 64-bit registers in a way similar to how the 16-bit to 32-bit extension took place. An R-prefix identifies the 64-bit registers (RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP, RFLAGS, RIP), and eight additional 64-bit general registers (R8-R15) were also introduced in the creation of x86-64. However, these extensions are only usable in 64-bit mode, which is one of the two modes available only in long mode. The addressing modes were not dramatically changed from 32-bit mode, except that addressing was extended to 64 bits, virtual addresses are now sign extended to 64 bits (in order to disallow mode bits in virtual addresses), and other selector details were dramatically reduced. In addition, an addressing mode was added to allow memory references relative to RIP (the instruction pointer), to ease the implementation of position-independent code, used in shared libraries in some operating systems.
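Two of the changes above, sign-extended virtual addresses and RIP-relative addressing, can be sketched in Python. The sketch assumes 48 implemented virtual address bits, which is common but is not stated in the text, and it assumes the 32-bit displacement is relative to the address of the following instruction. All function names are hypothetical.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a signed integer."""
    sign = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return (value ^ sign) - sign


def is_canonical(addr, implemented_bits=48):
    """True if the upper bits of addr are copies of the top implemented bit."""
    low = addr & ((1 << implemented_bits) - 1)
    return (sign_extend(low, implemented_bits) & MASK64) == addr


def rip_relative_target(next_rip, disp32):
    """Address referenced by a RIP-relative operand with a 32-bit displacement."""
    return (next_rip + sign_extend(disp32, 32)) & MASK64


print(is_canonical(0x00007FFFFFFFFFFF))  # True
print(is_canonical(0xFFFF800000000000))  # True
print(is_canonical(0x0000800000000000))  # False
print(hex(rip_relative_target(0x401000, 0xFFFFFFF0)))  # 0x400ff0
```

Since the target depends only on the distance from the instruction, the same code bytes work wherever the module is loaded, which is what makes this mode convenient for position-independent code.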