x86 is a family of instruction set architectures based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was introduced in 1978 as a 16-bit extension of Intel's 8-bit 8080 microprocessor, with memory segmentation as a solution for addressing more memory than can be covered by a plain 16-bit address. The term "x86" came into being because the names of several successors to the 8086 end in "86", including the 80186, 80286, 80386 and 80486 processors. Many additions and extensions have been added to the x86 instruction set over the years, almost consistently with full backward compatibility. The architecture has been implemented in processors from Intel, Cyrix, AMD, VIA and many other companies; of those, only Intel, AMD and VIA hold x86 architectural licenses and produce modern 64-bit designs. The term is not synonymous with IBM PC compatibility, as that implies a multitude of other computer hardware as well. As of 2018, the majority of personal computers and laptops sold are based on the x86 architecture, while other categories—especially high-volume mobile categories such as smartphones and tablets—are dominated by ARM.
In the 1980s and early 1990s, when the 8088 and 80286 were still in common use, the term x86 represented any 8086-compatible CPU. Today, however, x86 implies binary compatibility with the 32-bit instruction set of the 80386. This is because that instruction set has become something of a lowest common denominator for many modern operating systems, and also because the term became common after the introduction of the 80386 in 1985. A few years after the introduction of the 8086 and 8088, Intel added some complexity to its naming scheme and terminology, as the "iAPX" prefix of the ambitious but ill-fated Intel iAPX 432 processor was tried on the more successful 8086 family of chips, applied as a kind of system-level prefix. An 8086 system, including coprocessors such as the 8087 and 8089 as well as simpler Intel-specific system chips, was thereby described as an iAPX 86 system. There were also the terms iRMX, iSBC and iSBX, all together under the heading Microsystem 80. However, this naming scheme was quite temporary.
Although the 8086 was primarily developed for embedded systems and small multi-user or single-user computers, largely as a response to the successful 8080-compatible Zilog Z80, the x86 line soon grew in features and processing power. Today, x86 is ubiquitous in both stationary and portable personal computers, and is used in midrange computers, workstations and most new supercomputer clusters of the TOP500 list. A large amount of software, including a long list of x86 operating systems, runs on x86-based hardware. Modern x86 is relatively uncommon in embedded systems, however, and small low-power applications as well as low-cost microprocessor markets, such as home appliances and toys, lack any significant x86 presence. Simple 8-bit and 16-bit architectures are common there, although the x86-compatible VIA C7, VIA Nano, AMD's Geode, Athlon Neo and Intel Atom are examples of 32- and 64-bit designs used in some low-power and low-cost segments. There have been several attempts, including by Intel itself, to end the market dominance of the "inelegant" x86 architecture descended directly from the first simple 8-bit microprocessors.
Examples of this are the iAPX 432, the Intel i960, the Intel i860 and the Intel/Hewlett-Packard Itanium architecture. However, the continuous refinement of x86 microarchitectures and semiconductor manufacturing has made it hard to replace x86 in many segments. AMD's 64-bit extension of x86 and the scalability of x86 chips such as the eight-core Intel Xeon and the 12-core AMD Opteron underline x86 as an example of how continuous refinement of established industry standards can resist competition from entirely new architectures. The table below lists processor models and model series implementing variations of the x86 instruction set, in chronological order. Each line item is characterized by improved or commercially successful processor microarchitecture designs. At various times, companies such as IBM, NEC, AMD, TI, STM, Fujitsu, OKI, Cyrix, Intersil, C&T, NexGen, UMC and DM&P started to design or manufacture x86 processors intended for personal computers as well as embedded systems. Such x86 implementations are seldom simple copies but often employ different internal microarchitectures as well as different solutions at the electronic and physical levels.
The earliest compatible microprocessors were 16-bit, while 32-bit designs were developed much later. For the personal computer market, real quantities started to appear around 1990 with i386- and i486-compatible processors, often named similarly to Intel's original chips. Other companies that designed or manufactured x86 or x87 processors include ITT Corporation, National Semiconductor, ULSI System Technology and Weitek. Following the pipelined i486, in 1993 Intel introduced the Pentium brand name for its new set of superscalar x86 designs.
In computing, floating-point arithmetic is arithmetic using a formulaic representation of real numbers as an approximation, so as to support a trade-off between range and precision. For this reason, floating-point computation is often found in systems that include very small and very large real numbers and require fast processing times. A number is, in general, represented to a fixed number of significant digits and scaled using an exponent in some fixed base. A number that can be represented is of the following form: significand × base^exponent, where the significand is an integer, the base is an integer greater than or equal to two, and the exponent is an integer. For example, 1.2345 = 12345 × 10^−4, where 12345 is the significand, 10 is the base and −4 is the exponent. The term floating point refers to the fact that a number's radix point can "float", i.e. it can be placed anywhere relative to the significant digits of the number; this position is indicated by the exponent component, and thus the floating-point representation can be thought of as a kind of scientific notation. A floating-point system can be used to represent, with a fixed number of digits, numbers of widely different orders of magnitude: e.g. the distance between galaxies or the diameter of an atomic nucleus can be expressed with the same unit of length.
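To make the significand/exponent decomposition concrete, here is a minimal C sketch (C is used only for illustration; frexp and ldexp are standard C math-library functions) that splits a binary floating-point value into a base-2 significand and exponent and then reassembles it:

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        double x = 1.2345;
        int e;
        /* frexp() returns a significand m in [0.5, 1) such that x = m * 2^e */
        double m = frexp(x, &e);
        printf("%.10g = %.17g * 2^%d\n", x, m, e);
        /* ldexp() applies the inverse scaling, recovering the original value */
        printf("reassembled: %.10g\n", ldexp(m, e));
        return 0;
    }

Note that the hardware works in base 2, so the significand here is a binary fraction rather than the decimal integer of the base-10 example above.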
The result of this dynamic range is that the numbers that can be represented are not uniformly spaced; the difference between two consecutive representable numbers grows with the chosen scale. Over the years, a variety of floating-point representations have been used in computers. In 1985, the IEEE 754 Standard for Floating-Point Arithmetic was established, and since the 1990s the most commonly encountered representations are those defined by the IEEE. The speed of floating-point operations, commonly measured in terms of FLOPS, is an important characteristic of a computer system for applications that involve intensive mathematical calculations. A floating-point unit (FPU) is a part of a computer system specially designed to carry out operations on floating-point numbers. A number representation specifies some way of encoding a number as a string of digits. There are several mechanisms by which strings of digits can represent numbers. In common mathematical notation, the digit string can be of any length, and the location of the radix point is indicated by placing an explicit "point" character there. If the radix point is not specified, the string implicitly represents an integer and the unstated radix point would be off the right-hand end of the string, next to the least significant digit.
In fixed-point systems, a position in the string is specified for the radix point. So a fixed-point scheme might be to use a string of 8 decimal digits with the decimal point in the middle, whereby "00012345" would represent 0001.2345. In scientific notation, the given number is scaled by a power of 10, so that it lies within a certain range—typically between 1 and 10, with the radix point appearing after the first digit; the scaling factor, as a power of ten, is indicated separately at the end of the number. For example, the orbital period of Jupiter's moon Io is 152,853.5047 seconds, a value that would be represented in standard-form scientific notation as 1.528535047×10^5 seconds. Floating-point representation is similar in concept to scientific notation. Logically, a floating-point number consists of: A signed digit string of a given length in a given base; this digit string is referred to as the significand, mantissa, or coefficient. The length of the significand determines the precision. The radix point position is assumed always to be somewhere within the significand—often just after or just before the most significant digit, or to the right of the rightmost digit.
This article follows the convention that the radix point is set just after the most significant digit. A signed integer exponent. To derive the value of the floating-point number, the significand is multiplied by the base raised to the power of the exponent, which is equivalent to shifting the radix point from its implied position by a number of places equal to the value of the exponent—to the right if the exponent is positive or to the left if the exponent is negative. Using base 10 as an example, the number 152,853.5047, which has ten decimal digits of precision, is represented as the significand 1,528,535,047 together with 5 as the exponent. To determine the actual value, a decimal point is placed after the first digit of the significand and the result is multiplied by 10^5 to give 1.528535047×10^5, or 152,853.5047. In storing such a number, the base need not be stored, since it is the same for the entire range of supported numbers and can thus be inferred. Symbolically, this final value is s ÷ b^(p−1) × b^e, where s is the significand (ignoring any implied decimal point), p is the precision (the number of digits in the significand), b is the base, and e is the exponent.
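As a short sketch of that final formula using the Io example (significand 1,528,535,047, base 10, precision 10, exponent 5; the variable names are purely illustrative):

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        double s = 1528535047.0; /* significand, ignoring the implied decimal point */
        double b = 10.0;         /* base */
        int    p = 10;           /* precision: number of digits in the significand */
        int    e = 5;            /* exponent */

        /* value = s / b^(p-1) * b^e: place the point after the first digit, then scale */
        double value = s / pow(b, p - 1) * pow(b, e);
        printf("%.4f\n", value); /* prints 152853.5047 */
        return 0;
    }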
The first Pentium microprocessor was introduced by Intel on March 22, 1993. Dubbed P5, its microarchitecture was the fifth generation for Intel and the first superscalar IA-32 microarchitecture. As a direct extension of the 80486 architecture, it included dual integer pipelines, a faster floating-point unit, a wider data bus, separate code and data caches, and features for further reduced address calculation latency. In 1996, the Pentium with MMX Technology was introduced, with the same basic microarchitecture complemented by the MMX instruction set, larger caches and some other enhancements. The P5 Pentium's competitors included the Motorola 68060 and the PowerPC 601, as well as the SPARC, MIPS and Alpha microprocessor families, most of which used a superscalar in-order dual-instruction-pipeline configuration at some point. Intel's Larrabee multicore architecture project uses a processor core derived from a P5 core, augmented by multithreading, 64-bit instructions and a 16-wide vector processing unit. Intel's low-powered Bonnell microarchitecture, employed in early Atom processor cores, also uses an in-order dual pipeline similar to P5's.
Intel discontinued the P5 Pentium processors in 1999 in favor of the Celeron processor, which had also replaced the 80486 brand. The P5 microarchitecture was designed by the same Santa Clara team that designed the 386 and 486. Design work started in 1989; the preliminary design was first simulated in 1990, followed by the laying-out of the design. By this time, the team had several dozen engineers. The design was taped out, or transferred to silicon, in April 1992, at which point beta testing began. By mid-1992, the P5 team had 200 engineers. Intel at first planned to demonstrate the P5 in June 1992 at the PC Expo trade show and to formally announce the processor in September 1992, but design problems forced the demo to be cancelled, and the official introduction of the chip was delayed until the spring of 1993. John H. Crawford, chief architect of the original 386, co-managed the design of the P5 along with Donald Alpert, who managed the architectural team. Dror Avnon managed the design of the FPU. Vinod K. Dham was general manager of the P5 group.
The P5 microarchitecture brings several important advancements over the preceding i486 architecture. Performance: Superscalar architecture — the Pentium has two datapaths (pipelines) that allow it to complete two instructions per clock cycle in many cases. The main pipe can handle any instruction, while the other can handle the most common simple instructions. Some RISC proponents had argued that the "complicated" x86 instruction set would never be implemented by a pipelined microarchitecture, much less by a dual-pipeline design; the 486 and then the Pentium demonstrated that this was indeed feasible. A 64-bit external data bus doubles the amount of information it is possible to read or write on each memory access and therefore allows the Pentium to load its code cache faster than the 80486. Separation of code and data caches lessens the fetch and operand read/write conflicts compared to the 486. To reduce access time and implementation cost, both of them are 2-way associative, instead of the single 4-way cache of the 486.
A related enhancement in the Pentium is the ability to read a contiguous block from the code cache even when it is split between two cache lines. Much faster floating-point unit: some instructions showed an enormous improvement, most notably FMUL, with up to 15 times higher throughput than in the 80486 FPU. The Pentium is also able to execute a FXCH ST instruction in parallel with an ordinary FPU instruction. Four-input address adders enable the Pentium to further reduce the address calculation latency compared to the 80486; the Pentium can calculate full addressing modes with segment base + base register + scaled index register + immediate offset in a single cycle. The microcode can employ both pipelines to let auto-repeating instructions such as REP MOVSW perform one iteration every clock cycle, whereas the 80486 needed three clocks per iteration. Optimization of the access to the first microcode words during the decode stages helps make several frequent instructions execute significantly faster, especially in their most common forms and in typical cases.
Some examples are CALL, RET, and shifts/rotates. A faster, hardware-based multiplier makes instructions such as MUL and IMUL several times faster than on the 80486. Virtualized interrupt flag to speed up virtual 8086 mode. Other features: Enhanced debug features with the introduction of the processor-based debug port. Enhanced self-test features such as the L1 cache parity check. New instructions: CPUID, CMPXCHG8B, RDTSC, RDMSR, WRMSR, RSM. Test registers TR0–TR7 and the MOV instructions for accessing them were eliminated. The Pentium MMX added the MMX instruction set, a basic integer SIMD instruction set extension marketed for use in multimedia applications. MMX instructions could not be used at the same time as x87 FPU instructions, because the MMX registers were aliased onto the x87 register stack.
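Two of those new instructions, CPUID and RDTSC, are still in everyday use. The following is a minimal sketch, assuming a GCC- or Clang-compatible toolchain on an x86 machine (the helpers __get_cpuid and __rdtsc are compiler conveniences, not part of the Pentium itself), of querying the vendor string and reading the time-stamp counter:

    #include <stdio.h>
    #include <string.h>
    #include <cpuid.h>       /* __get_cpuid(): wrapper around the CPUID instruction */
    #include <x86intrin.h>   /* __rdtsc(): wrapper around the RDTSC instruction */

    int main(void) {
        unsigned int eax, ebx, ecx, edx;
        char vendor[13] = {0};

        /* CPUID leaf 0 returns the 12-byte vendor string in EBX, EDX, ECX */
        if (__get_cpuid(0, &eax, &ebx, &ecx, &edx)) {
            memcpy(vendor + 0, &ebx, 4);
            memcpy(vendor + 4, &edx, 4);
            memcpy(vendor + 8, &ecx, 4);
            printf("CPU vendor: %s\n", vendor);
        }

        /* RDTSC reads the free-running cycle counter introduced with the P5 */
        unsigned long long start = __rdtsc();
        volatile int sink = 0;
        for (int i = 0; i < 1000; i++) sink += i;
        unsigned long long end = __rdtsc();
        printf("loop took about %llu time-stamp ticks\n", end - start);
        return 0;
    }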
The K5 is AMD's first x86 processor to be developed in-house. Introduced in March 1996, its primary competition was Intel's Pentium microprocessor. The K5 was an ambitious design, closer to a Pentium Pro than to a Pentium in terms of technical solutions and internal architecture. However, the final product was closer to the Pentium in terms of performance, although it was faster clock-for-clock than the Pentium. The K5 was based upon an internal, highly parallel 29k RISC processor architecture with an x86 decoding front-end. The K5 offered good x86 compatibility, and the in-house-developed test suite proved invaluable on later projects. All models had 4.3 million transistors, with five integer units that could process instructions out of order and one floating-point unit. The branch target buffer was four times the size of the Pentium's, and register renaming helped overcome register dependencies. The chip's speculative execution of instructions reduced pipeline stalls. It had an 8 KB data cache. The floating-point divide and square-root microcode were mechanically proven correct.
The floating-point transcendental instructions were implemented in hardware and were faithful to true mathematical results for all operands. The K5 project represented an early chance for AMD to take technical leadership from Intel. Although the chip addressed the right design concepts, the actual engineering implementation had its issues. The low clock rates were due in part to AMD's limitations as a "cutting edge" manufacturing company at the time, and in part to the design itself, which had many levels of logic for the process technology of the day, hampering clock scaling. Additionally, while the K5's floating-point performance was regarded as superior to that of the Cyrix 6x86, it was slower than the Pentium's, although it offered more reliable transcendental function results. Because it was late to market and did not meet performance expectations, the K5 never gained the acceptance among large computer manufacturers that the Am486 and the later AMD K6 enjoyed. There were two sets of K5 processors, internally called the SSA/5 and the 5k86, both released under the K5 label.
The "SSA/5" had its branch-prediction unit disabled and additional internal waitstates added. The "SSA/5" line ran from 75 to 100 MHz. However, AMD used what it called a PR rating, or performance rating, to label the chips according to their equivalence to a Pentium of that clock speed. Thus, a 116 MHz chip from the second line was marketed as the "K5 PR166". Manufacturing delays caused the PR200's arrival to nearly align with the release of K6. Since AMD did not want the two chips competing, the K5-PR200 only arrived in small numbers. Sold as 5K86 P75 to P100 as K5 PR75 to PR100 4.3 million transistors in 500 or 350 nm L1-Cache: 8 + 16 KB Socket 5 and Socket 7 VCore: 3.52 V Front side bus: 50, 60, 66 MHz First release: March 27, 1996 Clockrate: 75, 90, 100 MHz Sold as K5 PR120 to PR166 4.3 million transistors in 350 nm L1-Cache: 8 + 16 KB Socket 5 and Socket 7 VCore: 3.52 V Front side bus: 60, 66 MHz First release: October 7, 1996 Clockrate: 90, 100, 105, 116.6, 133 MHz List of AMD K5 microprocessors Gwennap, Linley.
"AMD Ships Pentium Competitor". Microprocessor Report. Slater, Michael. "AMD's K5 Designed to Outrun Pentium". Microprocessor Report. Slater, Michael. "AMD K5 Volume Slips into 1996". Microprocessor Report. AMD: AMD-K5 Processor Overview Technical overview of the K5 series Pictures of K5 chips at CPUShack.com The AMD K5, a much underrated chip AMD K5 technical specifications
In computer engineering, microarchitecture, also called computer organization and sometimes abbreviated as µarch or uarch, is the way a given instruction set architecture (ISA) is implemented in a particular processor. A given ISA may be implemented with different microarchitectures. Computer architecture is the combination of microarchitecture and instruction set architecture. The ISA is roughly the same as the programming model of a processor as seen by an assembly language programmer or compiler writer; it includes the execution model, processor registers and data formats, among other things. The microarchitecture includes the constituent parts of the processor and how these interconnect and interoperate to implement the ISA. The microarchitecture of a machine is represented as diagrams that describe the interconnections of the various microarchitectural elements of the machine, which may be anything from single gates and registers to complete arithmetic logic units and even larger elements; these diagrams generally separate the datapath and the control path.
The person designing a system draws the specific microarchitecture as a kind of data flow diagram. Like a block diagram, the microarchitecture diagram shows microarchitectural elements such as the arithmetic and logic unit and the register file as single schematic symbols. The diagram connects those elements with arrows, thick lines and thin lines to distinguish between three-state buses, unidirectional buses and individual control lines. Very simple computers have a single data bus organization, meaning they have a single three-state bus. The diagrams of more complex computers show multiple three-state buses, which help the machine do more operations simultaneously. Each microarchitectural element is in turn represented by a schematic describing the interconnections of logic gates used to implement it; each logic gate is in turn represented by a circuit diagram describing the connections of the transistors used to implement it in some particular logic family. Machines with different microarchitectures may have the same instruction set architecture, and thus be capable of executing the same programs.
New microarchitectures and/or circuitry solutions, along with advances in semiconductor manufacturing, are what allow newer generations of processors to achieve higher performance while using the same ISA. In principle, a single microarchitecture could execute several different ISAs with only minor changes to the microcode. The pipelined datapath is the most commonly used datapath design in microarchitecture today. This technique is used in most modern microprocessors and DSPs. The pipelined architecture allows multiple instructions to overlap in execution, much like an assembly line. The pipeline includes several different stages; some of these stages include instruction fetch, instruction decode, execute and write back. Some architectures include other stages such as memory access. The design of pipelines is one of the central microarchitectural tasks. Execution units are also essential to microarchitecture. Execution units include arithmetic logic units, floating-point units, load/store units, branch prediction units and SIMD units.
These units perform the calculations of the processor. The choice of the number of execution units and of their latency and throughput is a central microarchitectural design task. The size, latency and connectivity of memories within the system are also microarchitectural decisions. System-level design decisions, such as whether or not to include peripherals such as memory controllers, can be considered part of the microarchitectural design process; this includes decisions on the connectivity of these peripherals. Unlike architectural design, where achieving a specific performance level is the main goal, microarchitectural design pays closer attention to other constraints. Since microarchitecture design decisions directly affect what goes into a system, attention must be paid to issues such as chip area/cost, power consumption, logic complexity, ease of connectivity, manufacturability, ease of debugging and testability. In general, all CPUs, whether single-chip microprocessors or multi-chip implementations, run programs by performing the following steps:
1. Read an instruction and decode it.
2. Find any associated data that is needed to process the instruction.
3. Process the instruction.
4. Write the results out.
The instruction cycle is repeated continuously until the power is turned off.
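As a rough illustration of this cycle in software, the following C sketch simulates a tiny, entirely hypothetical machine (the three opcodes and the register layout are invented for the example, not any real ISA) and runs the fetch, decode, execute and write-back steps in a loop:

    #include <stdio.h>
    #include <stdint.h>

    enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2 };   /* invented toy opcodes */

    int main(void) {
        /* each instruction: opcode, destination register, operand */
        uint8_t program[][3] = {
            { OP_LOAD, 0, 5 },   /* r0 = 5        */
            { OP_LOAD, 1, 7 },   /* r1 = 7        */
            { OP_ADD,  0, 1 },   /* r0 = r0 + r1  */
            { OP_HALT, 0, 0 },
        };
        uint32_t reg[4] = {0};
        unsigned pc = 0;                 /* program counter */

        for (;;) {
            uint8_t *insn = program[pc++];                        /* 1. fetch           */
            uint8_t op = insn[0], dst = insn[1], src = insn[2];   /* 2. decode/operands */
            if (op == OP_HALT) break;
            uint32_t result = (op == OP_LOAD) ? src               /* 3. execute         */
                                              : reg[dst] + reg[src];
            reg[dst] = result;                                    /* 4. write back      */
        }
        printf("r0 = %u\n", reg[0]);     /* prints 12 */
        return 0;
    }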
Complicating this simple-looking series of steps is the fact that the memory hierarchy, which includes caching, main memory and non-volatile storage like hard disks, has always been slower than the processor itself. Step 2 often introduces a lengthy delay (in CPU terms) while the data arrives over the computer bus. A considerable amount of research has been put into designs that avoid these delays as much as possible. Over the years, a central goal was to execute more instructions in parallel, thus increasing the effective execution speed of a program. These efforts introduced complicated circuit structures. Initially, these techniques could only be implemented on expensive mainframes or supercomputers due to the amount of circuitry needed; as semiconductor manufacturing progressed, more and more of these techniques could be implemented on a single chip.
Socket 7 is a physical and electrical specification for an x86-style CPU socket on a personal computer motherboard. It was released in June 1995. The socket supersedes the earlier Socket 5 and accepts P5 Pentium microprocessors manufactured by Intel, as well as compatibles made by Cyrix/IBM, AMD, IDT and others. Socket 7 was the only socket that supported a wide range of CPUs from different manufacturers and a wide range of speeds. The differences between Socket 5 and Socket 7 are that Socket 7 has an extra pin and is designed to provide dual split-rail voltage, as opposed to Socket 5's single voltage; Socket 7 is backwards compatible, so a Socket 5 CPU can be used in a Socket 7 motherboard. Processors that used Socket 7 are the AMD K5 and K6, the Cyrix 6x86 and 6x86MX, the IDT WinChip, the Intel P5 Pentium and Pentium MMX, and the Rise Technology mP6. AMD Geode LX and Geode GX used Socket 7 up until 2015. Socket 7 uses a 321-pin SPGA ZIF socket or the rare 296-pin SPGA LIF socket; the size is 1.95" × 1.95". An extension of Socket 7, Super Socket 7, was developed by AMD for their K6-2 and K6-III processors to operate at a higher clock rate and use AGP.
Socket 7 and Socket 8 were replaced by Slot 1 and Slot 2 in 1999.
A multi-core processor is a single computing component with two or more independent processing units called cores, which read and execute program instructions. The instructions are ordinary CPU instructions, but the single processor can run multiple instructions on separate cores at the same time, increasing overall speed for programs amenable to parallel computing. Manufacturers integrate the cores onto a single integrated circuit die or onto multiple dies in a single chip package. The microprocessors currently used in almost all personal computers are multi-core. A multi-core processor implements multiprocessing in a single physical package. Designers may couple cores in a multi-core device tightly or loosely. For example, cores may or may not share caches, and they may implement message-passing or shared-memory inter-core communication methods. Common network topologies used to interconnect cores include bus, two-dimensional mesh and crossbar. Homogeneous multi-core systems include only identical cores. Just as with single-processor systems, cores in multi-core systems may implement architectures such as VLIW, vector, or multithreading.
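A minimal sketch of shared-memory cooperation between cores, assuming a POSIX system with pthreads (the array size and the two-thread split are illustrative choices, and the operating system, not the program, decides which core runs each thread):

    #include <stdio.h>
    #include <pthread.h>

    #define N 1000000
    static long data[N];
    static long partial[2];              /* one result slot per thread; no lock needed */

    static void *sum_half(void *arg) {
        long id = (long)arg;
        long begin = id * (N / 2), end = begin + (N / 2);
        long s = 0;
        for (long i = begin; i < end; i++) s += data[i];
        partial[id] = s;                 /* shared-memory communication of the result */
        return NULL;
    }

    int main(void) {
        for (long i = 0; i < N; i++) data[i] = 1;
        pthread_t t[2];
        for (long id = 0; id < 2; id++)
            pthread_create(&t[id], NULL, sum_half, (void *)id);
        for (int id = 0; id < 2; id++)
            pthread_join(t[id], NULL);
        printf("total = %ld\n", partial[0] + partial[1]);   /* prints 1000000 */
        return 0;
    }

Compile with the -pthread flag; on a multi-core machine the two halves can genuinely run at the same time on separate cores.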
Multi-core processors are used across many application domains, including general-purpose, network, digital signal processing and graphics. The improvement in performance gained by the use of a multi-core processor depends very much on the software algorithms used and their implementation. In particular, possible gains are limited by the fraction of the software that can run in parallel on multiple cores. In the best case, so-called embarrassingly parallel problems may realize speedup factors near the number of cores, or even more if the problem is split up enough to fit within each core's cache, avoiding use of much slower main-system memory. Most applications, however, are not accelerated as much unless programmers invest a prohibitive amount of effort in re-factoring the whole problem. The parallelization of software is a significant ongoing topic of research. The terms multi-core and dual-core most commonly refer to some sort of central processing unit, but are sometimes also applied to digital signal processors and systems on a chip.
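That limit is usually expressed by Amdahl's law: if a fraction p of a program can be parallelized across n cores, the best possible speedup is 1 / ((1 − p) + p/n). A short C sketch of the arithmetic (the fractions and core counts below are illustrative values, not measurements from the text):

    #include <stdio.h>

    /* Amdahl's law: ideal speedup on n cores when a fraction p of the work is parallel */
    static double amdahl_speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    int main(void) {
        double fractions[] = { 0.50, 0.90, 0.99 };
        int cores[] = { 2, 4, 8, 16 };
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 4; j++)
                printf("p = %.2f, n = %2d -> speedup %.2fx\n",
                       fractions[i], cores[j],
                       amdahl_speedup(fractions[i], cores[j]));
        return 0;
    }

Even with 99% of the work parallelizable, sixteen cores give a speedup of only about 13.9x, which is why the serial fraction dominates the achievable gain.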
The terms are sometimes used only to refer to multi-core microprocessors that are manufactured on the same integrated circuit die. This article uses the terms "multi-core" and "dual-core" for CPUs manufactured on the same integrated circuit, unless otherwise noted. In contrast to multi-core systems, the term multi-CPU refers to multiple physically separate processing units. The terms many-core and massively multi-core are sometimes used to describe multi-core architectures with an especially high number of cores. Some systems use many soft microprocessor cores placed on a single FPGA; each such "core" can be considered a "semiconductor intellectual property core" as well as a CPU core. As manufacturing technology improves, reducing the size of individual gates, physical limits of semiconductor-based microelectronics have become a major design concern; these physical limitations can cause significant heat dissipation and data synchronization problems. Various other methods are used to improve CPU performance. Some instruction-level parallelism (ILP) methods such as superscalar pipelining are suitable for many applications, but are inefficient for others that contain difficult-to-predict code.
Many applications are better suited to thread-level parallelism (TLP) methods, and multiple independent CPUs are commonly used to increase a system's overall TLP. A combination of increased available space and the demand for increased TLP led to the development of multi-core CPUs. Several business motives drive the development of multi-core architectures. For decades, it was possible to improve the performance of a CPU by shrinking the area of the integrated circuit, which reduced the cost per device on the IC. Alternatively, for the same circuit area, more transistors could be used in the design, which increased functionality for complex instruction set computing architectures. Clock rates also increased by orders of magnitude in the decades of the late 20th century, from several megahertz in the 1980s to several gigahertz in the early 2000s. As the rate of clock speed improvements slowed, increased use of parallel computing in the form of multi-core processors has been pursued to improve overall processing performance.
Multiple cores could also be used on the same CPU chip, which could lead to better sales of CPU chips with two or more cores. For example, Intel has produced a 48-core processor for research in cloud computing. Since computer manufacturers have long implemented symmetric multiprocessing (SMP) designs using discrete CPUs, the issues regarding implementing multi-core processor architecture and supporting it with software are well known. Additionally, using a proven processing-core design without architectural changes reduces design risk significantly. For general-purpose processors, much of the motivation for multi-core processors comes from diminished gains in processor performance from increasing the operating frequency; this is due to three primary factors.