A CPU cache is a hardware cache used by the central processing unit of a computer to reduce the average cost of accessing data from main memory. A cache is a smaller, faster memory, located closer to a processor core, which stores copies of data from frequently used main memory locations. Most CPUs have several independent caches, including instruction and data caches, where the data cache is usually organized as a hierarchy of cache levels. All modern CPUs have multiple levels of cache; the first CPUs that used a cache had only one level. All current CPUs with caches have a split L1 cache; they also have L2 caches and, in larger processors, L3 caches as well. The L2 cache is usually not split and acts as a common repository for the already split L1 cache; in many multi-core processors each core has a dedicated L2 cache that is not shared between the cores. The L3 cache, and any higher-level caches, are shared between the cores and are not split. An L4 cache is uncommon and, where present, is generally implemented in dynamic random-access memory rather than static random-access memory, on a separate die or chip.
Historically that was also the case with L1, while bigger chips have allowed integration of it and generally of all cache levels, with the possible exception of the last level. Each extra level of cache tends to be bigger and optimized differently. Other types of caches exist, such as the translation lookaside buffer (TLB), part of the memory management unit that most CPUs have. Caches are generally sized in powers of two: 4, 8, 16, etc. KiB or MiB. When trying to read from or write to a location in main memory, the processor checks whether the data from that location is already in the cache. If so, the processor reads from or writes to the cache instead of the much slower main memory. Most modern desktop and server CPUs have at least three independent caches: an instruction cache to speed up executable instruction fetch, a data cache to speed up data fetch and store, and a translation lookaside buffer used to speed up virtual-to-physical address translation for both executable instructions and data. A single TLB can be provided for access to both instructions and data, or a separate instruction TLB and data TLB can be provided.
The data cache is usually organized as a hierarchy of cache levels. The TLB, however, is part of the memory management unit and not directly related to the CPU caches. Data is transferred between memory and cache in blocks of fixed size, called cache lines or cache blocks; when a cache line is copied from memory into the cache, a cache entry is created. The cache entry includes the copied data as well as a tag identifying the requested memory location. When the processor needs to read or write a location in memory, it first checks for a corresponding entry in the cache: it checks for the contents of the requested memory location in any cache lines that might contain that address. If the processor finds that the memory location is in the cache, a cache hit has occurred; if not, a cache miss has occurred. In the case of a cache hit, the processor reads or writes the data in the cache line. For a cache miss, the cache allocates a new entry and copies the data from main memory, after which the request is fulfilled from the contents of the cache.
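The tag/index check described above can be sketched in a few lines of Python. This is a minimal illustrative model, not any real hardware: the line size and set count are arbitrary, and the stored data itself is elided.

```python
LINE_SIZE = 64   # bytes per cache line (arbitrary for illustration)
NUM_SETS = 256   # direct-mapped: one entry per set

class DirectMappedCache:
    def __init__(self):
        # Each entry holds (valid, tag); the copied data is elided here.
        self.entries = [(False, None) for _ in range(NUM_SETS)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        # Split the address: offset within the line, set index, and tag.
        offset = address % LINE_SIZE          # unused in this sketch
        index = (address // LINE_SIZE) % NUM_SETS
        tag = address // (LINE_SIZE * NUM_SETS)
        valid, stored_tag = self.entries[index]
        if valid and stored_tag == tag:
            self.hits += 1                    # cache hit
            return True
        self.misses += 1                      # cache miss: allocate entry
        self.entries[index] = (True, tag)
        return False

cache = DirectMappedCache()
cache.access(0x1000)          # first touch of the line: a miss
hit = cache.access(0x1000)    # same line again: a hit
```

A second access to any address within the same 64-byte line would also hit, since the whole line was brought in on the miss.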
To make room for the new entry on a cache miss, the cache may have to evict one of the existing entries. The heuristic it uses to choose the entry to evict is called the replacement policy. The fundamental problem with any replacement policy is that it must predict which existing cache entry is least likely to be used in the future. Predicting the future is difficult, so there is no perfect method of choosing among the variety of replacement policies available. One popular replacement policy, least recently used (LRU), replaces the least recently accessed entry. Marking some memory ranges as non-cacheable can improve performance by avoiding the caching of memory regions that are rarely re-accessed; this avoids the overhead of loading something into the cache without getting any reuse. Cache entries may also be disabled or locked depending on the context. If data is written to the cache, at some point it must also be written to main memory. In a write-through cache, every write to the cache causes a write to main memory. Alternatively, in a write-back or copy-back cache, writes are not immediately mirrored to main memory; instead, the cache tracks which locations have been written over, marking them as dirty.
The data in these locations is written back to main memory only when that data is evicted from the cache. For this reason, a read miss in a write-back cache may sometimes require two memory accesses to service: one to first write the dirty location to main memory, and another to read the new location from memory. A write to a main memory location that is not yet mapped in a write-back cache may evict an already dirty location, thereby freeing that cache space for the new memory location. There are intermediate policies as well: the cache may be write-through, but the writes may be held in a store data queue temporarily so that multiple stores can be processed together. Cached data from the main memory may be changed b
x87 is a floating-point-related subset of the x86 architecture instruction set. It originated as an extension of the 8086 instruction set in the form of optional floating-point coprocessors that worked in tandem with corresponding x86 CPUs; these microchips had names ending in "87". This was also known as the NPX (numeric processor extension). Like other extensions to the basic instruction set, x87 instructions are not strictly needed to construct working programs, but they provide hardware and microcode implementations of common numerical tasks, allowing these tasks to be performed much faster than corresponding machine code routines can. The x87 instruction set includes instructions for basic floating-point operations such as addition and comparison, as well as for more complex numerical operations, such as the computation of the tangent function and its inverse. Most x86 processors since the Intel 80486 have had these x87 instructions implemented in the main CPU, but the term is sometimes still used to refer to that part of the instruction set.
Before x87 instructions were standard in PCs, compilers or programmers had to use rather slow library calls to perform floating-point operations, a method that is still common in embedded systems. The x87 registers form an 8-level deep non-strict stack structure ranging from ST(0) to ST(7), with registers that can be directly accessed by either operand, using an offset relative to the top, as well as pushed and popped. There are instructions to push and pop values on top of this stack. The non-strict stack model also allows binary operations to use ST(0) together with a direct memory operand or with an explicitly specified stack register, ST(x), in a role similar to a traditional accumulator. This can be reversed on an instruction-by-instruction basis, with ST(0) as the unmodified operand and ST(x) as the destination. Furthermore, the contents of ST(0) can be exchanged with another stack register using an instruction called FXCH ST(x). These properties make the x87 stack usable as seven freely addressable registers plus a dedicated accumulator.
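The stack discipline described above can be modeled in a few lines of Python. The mnemonics and the ST(i) indexing follow the text, but the class itself is purely illustrative, with list index 0 playing the role of ST(0).

```python
class X87Stack:
    """Toy model of the x87 register stack; index 0 is ST(0)."""

    def __init__(self):
        self.regs = []

    def fld(self, value):
        self.regs.insert(0, value)    # FLD: push onto the stack

    def fstp(self):
        return self.regs.pop(0)       # FSTP: pop the top of stack

    def fxch(self, i):
        # FXCH ST(i): exchange ST(0) with another stack register.
        self.regs[0], self.regs[i] = self.regs[i], self.regs[0]

    def fadd(self, i):
        # FADD ST(0), ST(i): ST(0) acts as the accumulator.
        self.regs[0] += self.regs[i]

st = X87Stack()
st.fld(2.0)    # ST(0) = 2.0
st.fld(3.0)    # ST(0) = 3.0, ST(1) = 2.0
st.fadd(1)     # ST(0) = 5.0
st.fxch(1)     # swap: ST(0) = 2.0, ST(1) = 5.0
```

The FXCH step is the trick the text describes: by swapping values to the top, any of the eight registers can be brought into the accumulator role.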
This is still applicable on superscalar x86 processors, where these exchange instructions are optimized down to a zero clock penalty by using one of the integer paths for FXCH ST(x) in parallel with the FPU instruction. Despite being natural and convenient for human assembly language programmers, some compiler writers have found it complicated to construct automatic code generators that schedule x87 code effectively. On the other hand, such a stack-based interface can minimize the need to save scratch variables in function calls compared with a register-based interface. The x87 provides single-precision, double-precision and 80-bit double-extended-precision binary floating-point arithmetic as per the IEEE 754-1985 standard. By default, the x87 processors all use 80-bit double-extended precision internally. A given sequence of arithmetic operations may thus behave differently compared to a strict single-precision or double-precision IEEE 754 FPU. As this may sometimes be problematic for semi-numerical calculations written to assume double precision for correct operation, the x87 can be configured, using a special configuration/status register, to automatically round to single or double precision after each operation.
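The effect of rounding results to a narrower precision can be illustrated in Python by round-tripping a double through a 32-bit float encoding. This is only an analogy for the x87 precision-control field, not a model of its 80-bit internals.

```python
import struct

def round_to_single(x):
    """Round a Python double to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

a = 1.0 + 2**-40   # representable as a double, but not as a single
print(a == 1.0)                    # False: double precision keeps the low bits
print(round_to_single(a) == 1.0)   # True: single precision rounds them away
```

A sum that is exact in double (or extended) precision can thus change once results are rounded to single precision after each operation, which is exactly why code assuming one precision may misbehave under another.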
Since the introduction of SSE2, the x87 instructions are not as essential as they once were, but they remain important as a high-precision scalar unit for numerical calculations sensitive to round-off error and requiring the 64-bit mantissa precision and extended range available in the 80-bit format. In tables of clock cycle counts for typical x87 FPU instructions, the A...B notation covers timing variations dependent on transient pipeline status and the arithmetic precision chosen, and the L → H notation depicts values corresponding to the lowest and the highest maximal clock frequencies that were available. * An effective zero clock delay is possible via superscalar execution. § The 5 MHz 8087 was the original x87 processor; compared to typical software-implemented floating-point routines on an 8086, the speed-up factors would be larger by another factor of 10. Companies that have designed or manufactured floating-point units compatible with the Intel 8087 or later models include AMD, Chips and Technologies, Fujitsu, Harris Semiconductor, IBM, IDT, IIT, LC Technology, National Semiconductor, NexGen, Rise Technology, ST Microelectronics, Texas Instruments, Transmeta, ULSI (the Math·Co copr
The first Pentium microprocessor was introduced by Intel on March 22, 1993. Dubbed P5, its microarchitecture was the fifth generation for Intel and the first superscalar IA-32 microarchitecture. As a direct extension of the 80486 architecture, it included dual integer pipelines, a faster floating-point unit, a wider data bus, separate code and data caches, and features for further reduced address calculation latency. In 1996, the Pentium with MMX Technology was introduced with the same basic microarchitecture complemented with the MMX instruction set, larger caches, and some other enhancements. The P5 Pentium's competitors included the Motorola 68060 and the PowerPC 601, as well as the SPARC, MIPS, and Alpha microprocessor families, most of which also used a superscalar in-order dual instruction pipeline configuration at some time. Intel's Larrabee multicore architecture project used a processor core derived from a P5 core, augmented by multithreading, 64-bit instructions, and a 16-wide vector processing unit. Intel's low-powered Bonnell microarchitecture, employed in early Atom processor cores, also uses an in-order dual pipeline similar to P5.
Intel discontinued the P5 Pentium processors in 1999 in favor of the Celeron processor, which had also replaced the 80486 brand. The P5 microarchitecture was designed by the same Santa Clara team which had designed the 386 and 486. Design work started in 1989; the preliminary design was first simulated in 1990, followed by the laying-out of the design. By this time, the team had several dozen engineers. The design was taped out, or transferred to silicon, in April 1992, at which point beta-testing began. By mid-1992, the P5 team had 200 engineers. Intel at first planned to demonstrate the P5 in June 1992 at the PC Expo trade show and to formally announce the processor in September 1992, but design problems forced the demo to be cancelled, and the official introduction of the chip was delayed until the spring of 1993. John H. Crawford, chief architect of the original 386, co-managed the design of the P5 along with Donald Alpert, who managed the architectural team. Dror Avnon managed the design of the FPU, and Vinod K. Dham was general manager of the P5 group.
The P5 microarchitecture brought several important advancements over the preceding i486 architecture. Performance: Superscalar architecture: the Pentium has two datapaths (pipelines) that allow it to complete two instructions per clock cycle in many cases. The main pipe can handle any instruction, while the other can handle the most common simple instructions. Some RISC proponents had argued that the "complicated" x86 instruction set would never be implemented by a tightly pipelined microarchitecture, much less by a dual-pipeline design; the 486 and the Pentium demonstrated that this was indeed feasible. 64-bit external databus: this doubles the amount of information read or written on each memory access and therefore allows the Pentium to load its code cache faster than the 80486. Separation of code and data caches: this lessens the fetch and operand read/write conflicts compared to the 486. To reduce access time and implementation cost, both of them are 2-way associative, instead of the single 4-way cache of the 486.
A related enhancement in the Pentium is the ability to read a contiguous block from the code cache even when it is split between two cache lines. Much faster floating-point unit: some instructions showed an enormous improvement, most notably FMUL, with up to 15 times higher throughput than in the 80486 FPU. The Pentium is also able to execute an FXCH ST(x) instruction in parallel with an ordinary FPU instruction. Four-input address adders: these enable the Pentium to further reduce the address calculation latency compared to the 80486; the Pentium can calculate full addressing modes with segment-base + base-register + scaled-register + immediate offset in a single cycle. The microcode can employ both pipelines to enable auto-repeating instructions such as REP MOVSW to perform one iteration every clock cycle, while the 80486 needed three clocks per iteration. Optimization of the access to the first microcode words during the decode stages helps make several frequent instructions execute faster, especially in their most common forms and in typical cases.
Some examples are CALL, RET, and shifts/rotates. A faster hardware-based multiplier makes instructions such as MUL and IMUL several times faster than in the 80486. Virtualized interrupt handling speeds up virtual 8086 mode. Other features: enhanced debug features with the introduction of the processor-based debug port, and enhanced self-test features like the L1 cache parity check. New instructions: CPUID, CMPXCHG8B, RDTSC, RDMSR, WRMSR, RSM. The test registers TR0–TR7 and the MOV instructions for access to them were eliminated. The Pentium MMX added the MMX instruction set, a basic integer SIMD instruction set extension marketed for use in multimedia applications. MMX could not be used at the same time as the x87 FPU instructions because the registers were
In computing, floating-point arithmetic is arithmetic using a formulaic representation of real numbers as an approximation, so as to support a trade-off between range and precision. For this reason, floating-point computation is often found in systems which include very small and very large real numbers and which require fast processing times. A number is, in general, represented to a fixed number of significant digits and scaled using an exponent in some fixed base. A number that can be represented exactly is of the following form: significand × base^exponent, where the significand is an integer, the base is an integer greater than or equal to two, and the exponent is also an integer. For example: 1.2345 = 12345 × 10^−4, with significand 12345, base 10 and exponent −4. The term floating point refers to the fact that a number's radix point can "float"; its position is indicated by the exponent component, and thus the floating-point representation can be thought of as a kind of scientific notation. A floating-point system can be used to represent, with a fixed number of digits, numbers of very different orders of magnitude: e.g. the distance between galaxies or the diameter of an atomic nucleus can be expressed with the same unit of length.
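The decomposition above (1.2345 as significand 12345, base 10, exponent −4) can be checked exactly with rational arithmetic; Python's `Fraction` is used here only to avoid binary rounding in the illustration.

```python
from fractions import Fraction

significand, base, exponent = 12345, 10, -4

# significand * base**exponent, computed exactly
value = Fraction(significand) * Fraction(base) ** exponent

print(value == Fraction(12345, 10000))   # True: exactly 1.2345
```

Doing the same computation in binary floating point would itself introduce a tiny rounding error, since 1.2345 has no exact base-2 representation, which is precisely the approximation the opening sentence refers to.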
The result of this dynamic range is that the numbers that can be represented are not uniformly spaced. Over the years, a variety of floating-point representations have been used in computers. In 1985, the IEEE 754 Standard for Floating-Point Arithmetic was established, and since the 1990s the most commonly encountered representations are those defined by the IEEE. The speed of floating-point operations, commonly measured in terms of FLOPS, is an important characteristic of a computer system for applications that involve intensive mathematical calculations. A floating-point unit is a part of a computer system specially designed to carry out operations on floating-point numbers. A number representation specifies some way of encoding a number as a string of digits. There are several mechanisms. In common mathematical notation, the digit string can be of any length, and the location of the radix point is indicated by placing an explicit "point" character there. If the radix point is not specified, then the string implicitly represents an integer and the unstated radix point would be off the right-hand end of the string, next to the least significant digit.
In fixed-point systems, a position in the string is specified for the radix point. So a fixed-point scheme might use a string of 8 decimal digits with the decimal point in the middle, whereby "00012345" would represent 0001.2345. In scientific notation, the given number is scaled by a power of 10, so that it lies within a certain range, typically between 1 and 10, with the radix point appearing immediately after the first digit. The scaling factor, as a power of ten, is then indicated separately at the end of the number. For example, the orbital period of Jupiter's moon Io is 152,853.5047 seconds, a value that would be represented in standard-form scientific notation as 1.528535047×10^5 seconds. Floating-point representation is similar in concept to scientific notation. Logically, a floating-point number consists of: a signed digit string of a given length in a given base; this digit string is referred to as the significand, mantissa, or coefficient. The length of the significand determines the precision. The radix point position is assumed always to be somewhere within the significand, often just after or just before the most significant digit, or to the right of the rightmost digit.
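The 8-digit fixed-point scheme above can be sketched as follows; the helper names are invented for illustration, and negative values and rounding policy are ignored.

```python
# Fixed-point with 8 decimal digits and the point in the middle:
# values are stored as plain integers scaled by 10**4.
SCALE = 10 ** 4

def to_fixed(x):
    return round(x * SCALE)      # e.g. stores 1.2345 as the integer 12345

def fixed_to_str(n):
    return f"{n // SCALE:04d}.{n % SCALE:04d}"

print(fixed_to_str(to_fixed(1.2345)))   # "0001.2345"
```

The limitation the text implies is visible here: the radix point never moves, so this scheme cannot represent anything smaller than 0.0001 or larger than 9999.9999, which is exactly what floating point fixes.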
This article follows the convention that the radix point is set just after the most significant digit. A signed integer exponent. To derive the value of the floating-point number, the significand is multiplied by the base raised to the power of the exponent; this is equivalent to shifting the radix point from its implied position by a number of places equal to the value of the exponent, to the right if the exponent is positive or to the left if the exponent is negative. Using base-10 as an example, the number 152,853.5047, which has ten decimal digits of precision, is represented as the significand 1,528,535,047 together with 5 as the exponent. To determine the actual value, a decimal point is placed after the first digit of the significand and the result is multiplied by 10^5 to give 1.528535047×10^5, or 152,853.5047. In storing such a number, the base need not be stored, since it is the same for the entire range of supported numbers and can thus be inferred. Symbolically, this final value is: s / b^(p−1) × b^e, where s is the
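Assuming the usual reading of that formula, with s the significand, b the base, p the precision (number of digits) and e the exponent, the worked example above can be verified exactly:

```python
from fractions import Fraction

# value = s / b**(p-1) * b**e
s, b, p, e = 1528535047, 10, 10, 5

value = Fraction(s, b ** (p - 1)) * b ** e

print(float(value))   # 152853.5047
```

Dividing by b^(p−1) is what places the decimal point just after the first of the ten digits, matching the convention stated at the start of this passage.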
A floating-point unit (FPU) is a part of a computer system specially designed to carry out operations on floating-point numbers. Typical operations are addition, multiplication, square root, and bitshifting; some systems can also perform various transcendental functions such as exponential or trigonometric calculations, though in most modern processors these are done with software library routines. In general-purpose computer architectures, one or more FPUs may be integrated as execution units within the central processing unit. When a CPU is executing a program that calls for a floating-point operation, there are three ways to carry it out: a floating-point unit emulator, an add-on FPU, or an integrated FPU. Historically, some systems implemented floating point via a coprocessor rather than as an integrated unit; this could be an entire circuit board or a cabinet. Where floating-point calculation hardware has not been provided, floating-point calculations are done in software, which takes more processor time but avoids the cost of the extra hardware.
For a particular computer architecture, the floating-point unit instructions may be emulated by a library of software functions; emulation can be implemented on any of several levels: in the CPU as microcode, as an operating system function, or in user-space code. When only integer functionality is available, the CORDIC floating-point emulation methods are most commonly used. In most modern computer architectures, there is some division of floating-point operations from integer operations; this division varies by architecture. CORDIC routines have been implemented in the Intel 8087, 80287, and 80387 up to the 80486 coprocessor series, as well as in the Motorola 68881 and 68882, for some kinds of floating-point instructions, as a way to reduce the gate counts of the FPU sub-system. Floating-point operations are often pipelined. In earlier superscalar architectures without general out-of-order execution, floating-point operations were sometimes pipelined separately from integer operations. Since the early 1990s, many microprocessors for desktops and servers have had more than one FPU.
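A minimal rotation-mode CORDIC sketch gives the flavor of such routines: the angle is driven to zero by a fixed sequence of rotations whose tangents are powers of two, using only additions, subtractions, scalings by powers of two and a small table of arctangent constants. This is an illustrative Python version, not the actual microcode of any coprocessor.

```python
import math

def cordic_cos_sin(theta, iterations=32):
    """Compute (cos(theta), sin(theta)) by CORDIC rotation mode.

    Valid for |theta| up to about 1.74 rad, the CORDIC convergence range.
    """
    # Table of elementary rotation angles: atan(2**-i).
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Cumulative gain of the rotations, compensated at the end.
    k = 1.0
    for i in range(iterations):
        k *= 1.0 / math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = 1.0, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0      # rotate toward zero residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return x * k, y * k

c, s = cordic_cos_sin(math.pi / 3)   # c ~ 0.5, s ~ sqrt(3)/2
```

In hardware, the multiplications by 2^-i become plain barrel shifts, which is why CORDIC was attractive for keeping FPU gate counts down.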
The modular Bulldozer microarchitecture uses a special FPU named FlexFPU, which uses simultaneous multithreading. Each physical integer core, two per module, is single-threaded, in contrast with Intel's Hyper-Threading, where two virtual simultaneous threads share the resources of a single physical core. Some floating-point hardware only supports the simplest operations: addition and multiplication. But even the most complex floating-point hardware has a finite number of operations it can support; for example, no FPUs directly support arbitrary-precision arithmetic. When a CPU is executing a program that calls for a floating-point operation that is not directly supported by the hardware, the CPU uses a series of simpler floating-point operations. In systems without any floating-point hardware, the CPU emulates it using a series of simpler fixed-point arithmetic operations that run on the integer arithmetic logic unit; the software that lists the necessary series of operations to emulate floating-point operations is packaged in a floating-point library.
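The idea of building floating-point operations out of integer operations can be sketched by multiplying two numbers held as (sign, significand, exponent) triples; normalization and rounding, which real soft-float libraries must handle, are elided here.

```python
def soft_mul(a, b):
    """Multiply two soft floats (sign bit, integer significand, exponent)
    using only integer operations, as a soft-float library might."""
    sa, ma, ea = a
    sb, mb, eb = b
    # Sign by XOR, significands by integer multiply, exponents by add.
    return (sa ^ sb, ma * mb, ea + eb)

def to_float(x):
    """Decode a triple back to a native float, for checking only."""
    s, m, e = x
    return (-1) ** s * m * 2.0 ** e

# 1.5 = 3 * 2**-1 and 2.5 = 5 * 2**-1
product = soft_mul((0, 3, -1), (0, 5, -1))
print(to_float(product))   # 3.75
```

A real library would additionally renormalize the significand, round it back to the target width, and handle special values like zero, infinity and NaN, which is where most of the emulation cost lies.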
In some cases, FPUs may be specialized, divided between simpler floating-point operations and more complicated operations, like division. In some cases, only the simple operations may be implemented in hardware or microcode, while the more complex operations are implemented as software. In some current architectures, the FPU functionality is combined with SIMD units to perform SIMD computation. In the 1980s, it was common in IBM PC/compatible microcomputers for the FPU to be separate from the CPU and sold as an optional add-on; it would only be purchased if needed to speed up math-intensive programs. The IBM PC, XT, and most compatibles based on the 8088 or 8086 had a socket for the optional 8087 coprocessor; the AT and 80286-based systems were socketed for the 80287, and 80386/80386SX-based machines for the 80387 and 80387SX, although early ones were socketed for the 80287, since the 80387 did not exist yet. Other companies manufactured co-processors for the Intel x86 series; these included Weitek. Coprocessors were also available for the Motorola 68000 family: the 68881 and 68882.
These were common in Motorola 68020/68030-based workstations like the Sun 3 series. They were also commonly added to higher-end models of the Apple Macintosh and Commodore Amiga series, but unlike IBM PC-compatible systems, sockets for adding the coprocessor were not as common in lower-end systems. There are also add-on FPU coprocessor units for microcontroller units / single-board computers, which serve to provide floating-point arithmetic capability. These add-on FPUs are host-processor-independent, possess their own programming requirements and are pro
The Intel 8085 is an 8-bit microprocessor produced by Intel and introduced in 1976. It is software binary compatible with the more famous Intel 8080, with only two minor instructions added to support its added interrupt and serial input/output features. However, it requires less support circuitry, allowing simpler and less expensive microcomputer systems to be built. The "5" in the part number highlighted the fact that the 8085 uses a single +5-volt power supply, by using depletion-mode transistors, rather than requiring the +5 V, −5 V and +12 V supplies needed by the 8080. This capability matched that of the competing Z80, a popular 8080-derived CPU introduced the year before; these processors could be used in computers running the CP/M operating system. The 8085 is supplied in a 40-pin DIP package. To maximise the functions on the available pins, the 8085 uses a multiplexed address/data bus. However, an 8085 circuit requires an 8-bit address latch, so Intel manufactured several support chips with an address latch built in.
These include the 8755, with an address latch, 2 KB of EPROM and 16 I/O pins, and the 8155 with 256 bytes of RAM, 22 I/O pins and a 14-bit programmable timer/counter. The multiplexed address/data bus reduced the number of PCB tracks between the 8085 and such memory and I/O chips. Both the 8080 and the 8085 were eclipsed by the Zilog Z80 for desktop computers, which took over most of the CP/M computer market as well as a share of the booming home-computer market in the early-to-mid-1980s. The 8085 had a long life as a controller, no doubt thanks to its built-in serial I/O and five prioritized interrupts, arguably microcontroller-like features that the Z80 CPU did not have. Once designed into such products as the DECtape II controller and the VT102 video terminal in the late 1970s, the 8085 served for new production throughout the lifetime of those products; this was longer than the product life of desktop computers. The 8085 is a conventional von Neumann design based on the Intel 8080. Unlike the 8080, it does not multiplex state signals onto the data bus; instead, the 8-bit data bus is multiplexed with the lower 8 bits of the 16-bit address bus to limit the number of pins to 40.
State signals are provided by dedicated bus control signal pins and two dedicated bus state ID pins named S0 and S1. Pin 40 is used for the +5 V power supply and pin 20 for ground; pin 39 is used as the Hold pin. The processor was designed using nMOS circuitry, and the later "H" versions were implemented in Intel's enhanced nMOS process called HMOS, developed for fast static RAM products. Only a single 5-volt power supply is needed, like competing processors and unlike the 8080; the 8085 uses 6,500 transistors. The 8085 incorporates the functions of the 8224 (clock generator) and the 8228 (system controller) on chip, increasing the level of integration. A downside compared to similar contemporary designs is the fact that the buses require demultiplexing. The 8085 has extensions to support new interrupts, with three maskable vectored interrupts (RST 7.5, RST 6.5 and RST 5.5), one non-maskable interrupt (TRAP), and one externally serviced interrupt (INTR). Each of these five interrupts has a separate pin on the processor, a feature which permits simple systems to avoid the cost of a separate interrupt controller. The RST 7.5 interrupt is edge-triggered, while RST 5.5 and RST 6.5 are level-sensitive.
All interrupts are disabled by the DI instruction. In addition, the SIM and RIM instructions, the only instructions of the 8085 that are not from the 8080 design, allow each of the three maskable RST interrupts to be individually masked. All three are masked after a normal CPU reset. SIM and RIM also allow the global interrupt mask state and the three independent RST interrupt mask states to be read, the pending-interrupt states of those same three interrupts to be read, the RST 7.5 trigger-latch flip-flop to be reset, and serial data to be sent and received via the SOD and SID pins, all under program control and independently of each other. SIM and RIM each execute in 4 clock cycles, making it possible to sample SID and/or toggle SOD faster than it is possible to toggle or sample a signal via any I/O or memory-mapped port, e.g. one of the ports of an 8155. Like the 8080, the 8085 can accommodate slower memories through externally generated wait states, and has provisions for Direct Memory Access using HOLD and HLDA signals.
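As a sketch, the accumulator bit layout consumed by SIM, as commonly documented (bits 0–2 the RST 5.5/6.5/7.5 masks, bit 3 the mask-set-enable flag, bit 4 the RST 7.5 latch reset, bits 6–7 the serial output control), can be decoded as follows; treat the field names here as informal labels rather than official mnemonics.

```python
def decode_sim(acc):
    """Decode the accumulator value an 8085 SIM instruction would consume,
    per the commonly documented bit layout (illustrative only)."""
    return {
        "mask_rst_5_5": bool(acc & 0x01),   # bit 0: mask RST 5.5
        "mask_rst_6_5": bool(acc & 0x02),   # bit 1: mask RST 6.5
        "mask_rst_7_5": bool(acc & 0x04),   # bit 2: mask RST 7.5
        "mask_set_enable": bool(acc & 0x08),# bit 3: apply bits 0-2
        "reset_rst_7_5": bool(acc & 0x10),  # bit 4: reset RST 7.5 latch
        "serial_enable": bool(acc & 0x40),  # bit 6: enable SOD update
        "serial_out": bool(acc & 0x80),     # bit 7: value driven on SOD
    }

flags = decode_sim(0x0D)   # mask-set enabled, RST 5.5 and RST 7.5 masked
```

Packing the interrupt masks and serial output into one accumulator byte is what lets SIM run in 4 clock cycles, the property the text highlights.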
An improvement over the 8080 is that the 8085 can itself drive a piezoelectric crystal directly connected to it; a built-in clock generator generates the internal high-amplitude two-phase clock signals at half the crystal frequency. The internal clock is available on an output pin, to drive peripheral devices or other CPUs in lock-step synchrony with the CPU from which the signal is output. The 8085 can also be clocked by an external oscillator. The 8085 is a binary-compatible follow-up to the 8080; it supports the complete instruction set of the 8080, with the same
The Intel 8008 is an early byte-oriented microprocessor designed and manufactured by Intel and introduced in April 1972. It is an 8-bit CPU with an external 14-bit address bus. Originally known as the 1201, the chip was commissioned by Computer Terminal Corporation (CTC) to implement an instruction set of their design for their Datapoint 2200 programmable terminal; as the chip was delayed and did not meet CTC's performance goals, the 2200 ended up using CTC's own TTL-based CPU instead. An agreement permitted Intel to market the chip to other customers after Seiko expressed an interest in using it for a calculator. CTC was formed in San Antonio in 1968 under the direction of Austin O. "Gus" Roche and Phil Ray, both NASA engineers. Roche, in particular, was interested in producing a desktop computer. However, given the immaturity of the market, the company's business plan mentioned only a Teletype Model 33 ASR replacement, which shipped as the Datapoint 3300. The case was deliberately designed to fit in the same space as an IBM Selectric typewriter, and it used a video screen shaped to have the same aspect ratio as an IBM punched card.
Although commercially successful, the 3300 had ongoing heat problems due to the amount of circuitry packed into such a small space. To address the heating and other issues, a re-design was started that featured the CPU part of the internal circuitry re-implemented on a single chip. Looking for a company able to produce their chip design, Roche turned to Intel, then primarily a vendor of memory chips. Roche met with Bob Noyce, who reportedly remarked that "if you have a computer chip, you can only sell one chip per computer, while with memory, you can sell hundreds of chips per computer." Another major concern was that Intel's existing customer base purchased their memory chips for use with their own processor designs. Noyce nevertheless agreed to a $50,000 development contract in early 1970, and Texas Instruments was brought in as a second supplier. TI was able to make samples of the 1201 based on Intel drawings, but these proved to be buggy and were rejected. Intel's own versions were also delayed. CTC decided to re-implement the new version of the terminal using discrete TTL instead of waiting for a single-chip CPU.
The new system was released as the Datapoint 2200 in the spring of 1970, with the first sale to General Mills on May 25, 1970. CTC paused development of the 1201. Six months later, Seiko approached Intel, expressing an interest in using the 1201 in a scientific calculator after seeing the success of the simpler Intel 4004 used by Busicom in their business calculators. A small re-design followed under the leadership of Federico Faggin, the designer of the 4004 and now project leader of the 1201, expanding the chip from a 16-pin to an 18-pin design, and the new 1201 was delivered to CTC in late 1971. By that point, CTC had once again moved on, this time to the faster Datapoint 2200 II; the 1201 was no longer powerful enough for the new model. CTC voted to end their involvement with the 1201, leaving the design's intellectual property to Intel instead of paying the $50,000 contract fee. Intel renamed the chip the 8008 and put it in their catalog in April 1972, priced at $120. Intel's initial worries about their existing customer base leaving them proved unfounded, and the 8008 went on to be a commercially successful design.
This was followed by the Intel 8080 and then the hugely successful Intel x86 family. One of the first teams to build a complete system around the 8008 was Bill Pentz's team at California State University, Sacramento. The Sac State 8008 was the first true microcomputer, with a disk operating system built with IBM Basic assembly language in PROM, all driving a color display, hard drive, modem, audio/paper tape reader and printer. The project started in the spring of 1972, and with key help from Tektronix the system was functional a year later. Bill assisted Intel with the MCS-8 kit and provided key input to the Intel 8080 instruction set, which helped make it useful for both industry and hobbyists. In the UK, a team at S. E. Laboratories Engineering led by Tom Spink in 1972 built a microcomputer based on a pre-release sample of the 8008. Joe Hardman extended the chip with an external stack; this, among other things, gave it power-fail recovery. Joe also developed a direct screen printer. The operating system was written using a meta-assembler developed by L. Crawford and J. Parnell for a Digital Equipment Corporation PDP-11.
The operating system was burnt into a PROM. It was interrupt-driven and based on a fixed page size for programs and data. An operational prototype was prepared for management. The 8008 was the CPU for the first commercial non-calculator personal computers: the US SCELBI kit and the pre-built French Micral N and Canadian MCM/70. The 8008 was implemented in 10 μm silicon-gate enhancement-mode PMOS logic. Initial versions could work at clock frequencies up to 0.5 MHz; this was increased in the 8008-1 to a specified maximum of 0.8 MHz. Instructions took between 5 and 11 T-states, where each T-state was 2 clock cycles. Register–register loads and ALU operations took 5 T-states, register–memory operations 8 T-states, while calls and jumps took 11 T-states. The 8008 was a little slower in terms of instructions per second (36,000 to 80,000 at 0.8 MH
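The instructions-per-second range quoted above follows directly from these timing figures: at 0.8 MHz, with 2 clock cycles per T-state, and instructions taking from 5 to 11 T-states.

```python
clock_hz = 800_000                    # 8008-1 maximum clock: 0.8 MHz
t_states_per_sec = clock_hz / 2       # each T-state is 2 clock cycles

slowest = t_states_per_sec / 11       # calls and jumps: 11 T-states
fastest = t_states_per_sec / 5        # register-register ops: 5 T-states

print(round(slowest), round(fastest)) # roughly 36,000 to 80,000 per second
```

The result matches the 36,000 to 80,000 instructions-per-second figure in the text, confirming that the quoted range is simply the clock budget divided by the slowest and fastest instruction timings.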