The NEC V20 was a microprocessor made by NEC, a reverse-engineered, pin-compatible version of the Intel 8088 with an instruction set compatible with the Intel 80186. The V20 was introduced in 1982, and the V30 debuted in 1983. The chip featured considerably more transistors than the 29,000 of the simpler 8088, ran at 5 to 10 MHz, and was around 30% faster than the 8088 at the same clock speed, owing to faster effective-address calculation along with faster loop counters, shift registers, and multiplier. The V20 was used in "turbo" versions of some PC clones, such as Commodore PC-compatible systems and the Tandy 1000 laptop series, as well as in the Casio PV-S450 PDA, Hewlett-Packard's HP 95LX, the GRiD Model 1810 laptop computer, and the Bandai WonderSwan, a handheld gaming system released in Japan in 1999. Sony produced this microprocessor under license from NEC as the V20H. Because it was pin-compatible with the 8088 and inexpensive, the V20 was a popular end-user upgrade for systems with a socketed processor, including the original IBM PC and XT.
An unusual feature of the NEC V20 was that it added an Intel 8080 emulation mode, in which it could execute programs written for the Intel 8080 processor. The BRKEM instruction, executed in native (8086) mode, switched the processor into emulation mode, while RETEM and CALLN, executed in 8080 mode, were used to return from the emulation mode or to call native-mode routines from it. Several programs used this mode to run 8080-based CP/M-80 software on MS-DOS machines, notably V2080 CPMulator by Michael Day and 22nice from SYDEX. Another unusual feature was the existence of several families of unique instructions. The ADD4S, SUB4S, and CMP4S instructions could add, subtract, and compare large packed binary-coded decimal numbers stored in memory, and the ROL4 and ROR4 instructions rotated four-bit nibbles. Another family consisted of the TEST1, SET1, CLR1, and NOT1 instructions, which test, set, clear, and invert single bits of their operands, though they are far less efficient than the 80386 equivalents BT, BTS, BTR, and BTC. There were also two instructions to insert and extract bit fields of arbitrary length, and there were two additional repeat prefixes, REPC and REPNC, which complemented the original REPE and REPNE prefixes and allowed a string of bytes or words to be scanned while a less-than or not-less-than condition remained true.
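The packed-BCD string instructions described above are easiest to understand as a digit-by-digit loop with a decimal carry. The following C sketch models the effect of an ADD4S-style operation on two little-endian packed-BCD strings; the function name, the digit-count parameter, and the returned carry are illustrative assumptions, not NEC's definitions.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical C model of ADD4S-style packed-BCD addition: two decimal
 * digits per byte, little-endian digit order, decimal carry propagated
 * from digit to digit. Names and parameters are illustrative. */
static int add4s(uint8_t *dst, const uint8_t *src, int digits)
{
    int carry = 0;
    for (int i = 0; i < (digits + 1) / 2; i++) {
        int lo = (dst[i] & 0x0F) + (src[i] & 0x0F) + carry;
        carry = lo > 9;
        if (carry) lo -= 10;
        int hi = (dst[i] >> 4) + (src[i] >> 4) + carry;
        carry = hi > 9;
        if (carry) hi -= 10;
        dst[i] = (uint8_t)((hi << 4) | lo);
    }
    return carry;                 /* final decimal carry out of the string */
}

int main(void)
{
    uint8_t a[2] = { 0x99, 0x09 };   /* 0999 in little-endian packed BCD */
    uint8_t b[2] = { 0x01, 0x00 };   /* 0001                             */
    int cf = add4s(a, b, 4);
    printf("result %02X%02X carry %d\n", a[1], a[0], cf);  /* 1000, carry 0 */
    return 0;
}
```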
The NEC V30 was a version of the V20 that was instead pin-compatible with the Intel 8086 and its 16-bit data bus; it also supported the 8080 emulation mode. The V30 was used in the GTD-5 EAX Class 5 central office switch as a drop-in performance upgrade for the processor complex in the late 1980s, as well as in the Psion Series 3, the NEC PC-9801, the Olivetti PCS86, the Applied Engineering "PC Transporter" emulator for the Apple II series of computers, and various arcade machines of the late 1980s. The NEC V20HL and NEC V30HL were low-power versions. The NEC V25 is the microcontroller version of the V20; the V25HS is a V25 with a built-in RX116 RTOS, and the V25+ is a high-speed version of the V25. The NEC V33 is an enhanced version of the V30 that separates the address and data buses and executes all instructions in hardwired logic instead of microcode, making it twice as fast as a V30 at the same clock frequency, with performance equivalent to an Intel 80286. The V33 also offers a method of expanding the memory address space to 16 MB.
It has two additional instructions, BRKXA and RETXA, to support the extended addressing mode, and the 8080 emulation mode is not supported. The NEC V33A differs from the V33 in that it has interrupt vector numbers compatible with Intel 80x86 processors. The NEC V35 is the microcontroller version of the V30, with a 16-bit external data bus; the V35HS is a V35 with a built-in RX116 RTOS, and the V35+ is a high-speed version of the V35. The NEC V40 is an embedded version of the V20 that integrates an Intel-compatible 8251 USART, an 8253 programmable interval timer, and an 8255 parallel port interface; it was used in the Olivetti PC1 and the Digisystems Jetta XD, and the V40HL is a high-speed, low-voltage version of the V40. The NEC V50 is an embedded version of the V30 with a 16-bit data bus; it is the main CPU of the Korg M1, and the V50HL is a high-speed, low-voltage version of the V50. The NEC V41 and NEC V51 integrated V-series cores (the V51 using a V30HL) with PC/XT peripherals: an 8255 parallel port interface, an 8254 programmable interval timer, an 8259 interrupt controller, an 8237 DMA controller, and an 8042 keyboard controller, together with a full DRAM controller.
The V51 was used in the Olivetti Quaderno XT-20. The NEC V53 integrates a V33 core with four-channel DMA, a UART, three timer/counters, and an interrupt controller; the NEC V53A integrates similar peripherals with a V33A core, and NEC also produced a V55PI variant. The Vadem VG230 was a single-chip PC platform containing a 16 MHz NEC V30HL processor and IBM PC/XT-compatible core logic, an LCD controller with touch-plane support, a keyboard matrix scanner, a dual PCMCIA 2.1 card controller, EMS 4.0 hardware support for up to 64 MB, and built-in timer, PIC, DMA, UART, and RTC controllers; it was used in the IBM Simon. The enhanced Vadem VG330 contained a 32 MHz NEC V30MX processor and IBM PC/AT-compatible core logic with dual PICs, an LCD controller, a keyboard matrix scanner, a PC Card ExCA 2.1 controller, and an SIR port. Starting with the NEC V60, NEC departed from the x86 design.
x86-64 is the 64-bit version of the x86 instruction set. It introduces two new modes of operation, 64-bit mode and compatibility mode, along with a new four-level paging mode. With 64-bit mode and the new paging mode, it supports vastly larger amounts of virtual memory and physical memory than is possible on its 32-bit predecessors, allowing programs to store larger amounts of data in memory. x86-64 expands the general-purpose registers to 64 bits, extends their number from 8 to 16, and provides numerous other enhancements. Floating-point operations are supported via mandatory SSE2-like instructions; x87/MMX-style registers are generally not used. In 64-bit mode, instructions are modified to support 64-bit addressing, while compatibility mode allows 16- and 32-bit user applications to run unmodified, coexisting with 64-bit applications, if the 64-bit operating system supports them. As the full x86 16-bit and 32-bit instruction sets remain implemented in hardware without any intervening emulation, these older executables can run with little or no performance penalty, while newer or modified applications can take advantage of new features of the processor design to achieve performance improvements.
A processor supporting x86-64 still powers on in real mode for full backward compatibility. The original specification, created by AMD and released in 2000, has been implemented by AMD, Intel, and VIA. The AMD K8 processor was the first to implement it; this was the first significant addition to the x86 architecture designed by a company other than Intel. Intel was forced to follow suit and introduced a modified NetBurst family that was software-compatible with AMD's specification, and VIA Technologies introduced x86-64 with the VIA Nano. The x86-64 architecture is distinct from the Intel Itanium architecture, which is not compatible on the native instruction set level with the x86 architecture: operating systems and applications written for one cannot be run on the other. AMD64 was created as an alternative to the radically different IA-64 architecture designed by Intel and Hewlett-Packard. Announced in 1999, with a full specification made available in August 2000, the AMD64 architecture was positioned by AMD from the beginning as an evolutionary way to add 64-bit computing capabilities to the existing x86 architecture, as opposed to Intel's approach of creating an entirely new 64-bit architecture with IA-64.
The first AMD64-based processor, the Opteron, was released in April 2003. AMD's processors implementing the AMD64 architecture include the Opteron, Athlon 64, Athlon 64 X2, Athlon 64 FX, Athlon II, Turion 64, Turion 64 X2, Phenom, Phenom II, FX, Fusion/APU, and Ryzen/Epyc. The primary defining characteristics of AMD64 are the availability of 64-bit general-purpose processor registers, 64-bit integer arithmetic and logical operations, and 64-bit virtual addresses. The designers took the opportunity to make other improvements as well; some of the most significant changes are described below. 64-bit integer capability: all general-purpose registers are expanded from 32 bits to 64 bits, and all arithmetic and logical operations, memory-to-register and register-to-memory operations, and so on can now operate directly on 64-bit integers. Pushes and pops on the stack default to 8-byte strides, and pointers are 8 bytes wide. Additional registers: in addition to increasing the size of the general-purpose registers, the number of named general-purpose registers is increased from eight in x86 to 16.
It is therefore possible to keep more local variables in registers rather than on the stack, and to let registers hold frequently accessed constants. AMD64 still has fewer registers than many RISC instruction sets or VLIW-like machines such as the IA-64; however, an AMD64 implementation may have far more internal registers than the number of architectural registers exposed by the instruction set. Additional XMM registers: similarly, the number of 128-bit XMM registers is increased from 8 to 16. The traditional x87 FPU register stack is not included in the register file size extension in 64-bit mode, unlike the XMM registers used by SSE2, which did get extended; the x87 register stack is not a simple register file, although it does allow direct access to individual registers by low-cost exchange operations. Larger virtual address space: the AMD64 architecture defines a 64-bit virtual address format, of which the low-order 48 bits are used in current implementations; this allows up to 256 TB of virtual address space.
The architecture definition allows this limit to be raised in future implementations to the full 64 bits, extending the virtual address space to 16 EB.
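One consequence of the 48-bit format is that the upper 16 bits of every valid virtual address must be copies of bit 47, a so-called canonical address. A minimal C sketch of that rule and of the 256 TB arithmetic follows; the helper name is illustrative, not part of any specification.

```c
#include <stdint.h>
#include <stdio.h>

/* Check whether a 64-bit virtual address is canonical under the 48-bit
 * scheme: bits 63..48 must equal bit 47 (sign extension). Relies on
 * arithmetic right shift of signed values, as all mainstream compilers do. */
static int is_canonical48(uint64_t va)
{
    int64_t sext = (int64_t)(va << 16) >> 16;   /* sign-extend from bit 47 */
    return (uint64_t)sext == va;
}

int main(void)
{
    /* 2^48 bytes = 256 * 2^40 bytes, i.e. 256 TB of virtual address space. */
    printf("48-bit space: %llu TB\n",
           (unsigned long long)((1ULL << 48) >> 40));
    printf("0x00007FFFFFFFFFFF canonical: %d\n",
           is_canonical48(0x00007FFFFFFFFFFFULL));   /* prints 1 */
    printf("0x0000800000000000 canonical: %d\n",
           is_canonical48(0x0000800000000000ULL));   /* prints 0 */
    return 0;
}
```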
Backward compatibility is a property of a system, product, or technology that allows for interoperability with an older legacy system, or with input designed for such a system; the term is used especially in telecommunications and computing. Backward compatibility is sometimes called downward compatibility, and modifying a system in a way that does not allow backward compatibility is sometimes called "breaking" backward compatibility. A complementary concept is forward compatibility: a forward-compatible design has a roadmap for compatibility with future standards and products. The benefits of backward compatibility include appeal to an existing user base through an inexpensive upgrade path, as well as the network effect, which is important because it increases the value of goods and services in proportion to the size of the user base. One example of this is the Sony PlayStation 2, which was backward compatible with games for its predecessor, the PlayStation. While the selection of PS2 games available at launch was small, sales of the console were nonetheless strong in 2000–2001 thanks to the large library of games for the preceding PS1.
This bought time for the PS2 to grow a large installed base and for developers to release more quality PS2 games for the crucial 2001 holiday season. The associated costs of backward compatibility include a higher bill of materials if hardware is required to support the legacy system. A notable example is the Sony PlayStation 3: the first PS3 iteration was expensive to manufacture in part because it included the Emotion Engine from the preceding PS2 in order to run PS2 games, since the PS3 architecture was different from the PS2's. Subsequent PS3 hardware revisions eliminated the Emotion Engine, which saved production costs but removed the ability to run PS2 titles, as Sony found that backward compatibility was not a major selling point for the PS3, in contrast to the PS2. The PS3's chief competitor, the Microsoft Xbox 360, took a different approach to backward compatibility, using software emulation to run games from the original Xbox rather than including legacy hardware; however, Microsoft stopped releasing emulation profiles after 2007.
A simple example of both backward and forward compatibility is the introduction of FM radio in stereo. Early FM radio was monophonic, with only one audio channel represented by one signal, so when two-channel stereo FM radio was introduced, a large number of listeners had only mono FM receivers. Forward compatibility for mono receivers with stereo signals was achieved by sending the sum of the left and right audio channels in one signal and the difference in another. This allows mono FM receivers to receive and decode the sum signal while ignoring the difference signal, which is necessary only for separating the audio channels. Stereo FM receivers can receive a mono signal and decode it without the need for a second signal, and they can separate the sum signal into left and right channels if both sum and difference signals are received. Without the requirement for backward compatibility, a simpler method could have been chosen.
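The arithmetic behind this scheme is simple: the broadcaster transmits the sum S = L + R and the difference D = L - R; a mono receiver plays S directly, while a stereo receiver recovers L = (S + D)/2 and R = (S - D)/2. A small C sketch of the encoding and decoding follows; the function names are illustrative.

```c
#include <stdio.h>

/* Model of FM stereo multiplexing: encode left/right samples into sum
 * and difference signals, then decode them back. */
static void encode(double l, double r, double *sum, double *diff)
{
    *sum  = l + r;   /* mono receivers play this signal directly      */
    *diff = l - r;   /* stereo receivers use this to separate channels */
}

static void decode(double sum, double diff, double *l, double *r)
{
    *l = (sum + diff) / 2.0;
    *r = (sum - diff) / 2.0;
}

int main(void)
{
    double sum, diff, l, r;
    encode(0.8, 0.2, &sum, &diff);
    decode(sum, diff, &l, &r);
    printf("sum=%.1f diff=%.1f -> L=%.1f R=%.1f\n", sum, diff, l, r);
    return 0;
}
```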
Full backward compatibility is particularly important in computer instruction set architectures, one of the most successful being the x86 family of microprocessors. Their full backward compatibility spans back to the 16-bit Intel 8086/8088 processors introduced in 1978. Backward-compatible processors can execute the same binary software as their predecessors, allowing the use of a newer processor without having to acquire new applications or operating systems. Similarly, the success of the Wi-Fi digital communication standard is attributed to its broad forward and backward compatibility. Compiler backward compatibility may refer to the ability of a compiler for a newer version of a language to accept programs or data that worked under the previous version. A data format is said to be backward compatible with its predecessor if every message or file that is valid under the old format is still valid, and retains its meaning, under the new format.
A microprocessor is a computer processor that incorporates the functions of a central processing unit on a single integrated circuit, or at most a few integrated circuits. The microprocessor is a multipurpose, clock-driven, register-based digital integrated circuit that accepts binary data as input, processes it according to instructions stored in its memory, and provides results as output. Microprocessors contain both combinational and sequential digital logic, and operate on symbols represented in the binary number system. The integration of a whole CPU onto a single or a few integrated circuits greatly reduced the cost of processing power: integrated circuit processors are produced in large numbers by automated processes, resulting in a low unit price, and single-chip processors increase reliability because there are many fewer electrical connections that could fail. As microprocessor designs improve, the cost of manufacturing a chip generally stays the same, according to Rock's law. Before microprocessors, small computers had been built using racks of circuit boards with many medium- and small-scale integrated circuits.
Microprocessors combined this into a few large-scale ICs. Continued increases in microprocessor capacity have since rendered other forms of computers almost completely obsolete, with one or more microprocessors used in everything from the smallest embedded systems and handheld devices to the largest mainframes and supercomputers. The complexity of an integrated circuit is bounded by physical limitations: the number of transistors that can be put onto one chip, the number of package terminations that can connect the processor to other parts of the system, the number of interconnections it is possible to make on the chip, and the heat that the chip can dissipate. Advancing technology makes more powerful chips feasible to manufacture. A minimal hypothetical microprocessor might include only an arithmetic logic unit (ALU) and a control logic section. The ALU performs addition and operations such as AND or OR, and each operation of the ALU sets one or more flags in a status register, which indicate the results of the last operation.
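As a rough illustration of such a minimal design, the following C sketch models an 8-bit ALU with a two-flag status register; the opcode names and flag set are illustrative, not taken from any real processor.

```c
#include <stdint.h>
#include <stdio.h>

/* Toy model of a minimal ALU: one 8-bit operation at a time, with a
 * status register holding zero and carry flags set after each operation. */
enum op { OP_ADD, OP_AND, OP_OR };

struct status { int zero; int carry; };

static uint8_t alu(enum op op, uint8_t a, uint8_t b, struct status *st)
{
    uint16_t wide = 0;                  /* wide enough to catch a carry */
    switch (op) {
    case OP_ADD: wide = (uint16_t)a + b; break;
    case OP_AND: wide = a & b;           break;
    case OP_OR:  wide = a | b;           break;
    }
    st->carry = wide > 0xFF;            /* result overflowed 8 bits */
    st->zero  = (uint8_t)wide == 0;     /* result is zero           */
    return (uint8_t)wide;
}

int main(void)
{
    struct status st;
    uint8_t r = alu(OP_ADD, 0xF0, 0x20, &st);
    printf("result=%02X zero=%d carry=%d\n", r, st.zero, st.carry); /* 10 0 1 */
    return 0;
}
```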
The control logic retrieves instruction codes from memory and initiates the sequence of operations required for the ALU to carry out the instruction; a single operation code might affect many individual data paths and other elements of the processor. As integrated circuit technology advanced, it became feasible to manufacture more and more complex processors on a single chip. The size of data objects became larger, and additional features were added to the processor architecture. Floating-point arithmetic, for example, was not available on 8-bit microprocessors but had to be carried out in software; integration of the floating-point unit, first as a separate integrated circuit and later as part of the same microprocessor chip, sped up floating-point calculations. Physical limitations of integrated circuits sometimes made such practices as a bit-slice approach necessary: instead of processing all of a long word on one integrated circuit, multiple circuits in parallel processed subsets of each data word. While this required extra logic to handle, for example, carry and overflow within each slice, the result was a system that could handle, for example, 32-bit words using integrated circuits with a capacity for only four bits each.
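The bit-slice idea can be sketched in a few lines of C: a 32-bit addition performed by eight 4-bit "slices", each adding its own nibble and rippling a carry to the next, much as a cascade of narrow ALU chips would do in hardware. The decomposition shown is illustrative.

```c
#include <stdint.h>
#include <stdio.h>

/* Sketch of a bit-slice adder: a 32-bit addition carried out by eight
 * 4-bit slices, each handling one nibble and passing its carry onward. */
static uint32_t add32_by_slices(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    unsigned carry = 0;
    for (int slice = 0; slice < 8; slice++) {
        unsigned na  = (a >> (4 * slice)) & 0xF;
        unsigned nb  = (b >> (4 * slice)) & 0xF;
        unsigned sum = na + nb + carry;   /* one 4-bit slice's work     */
        carry = sum >> 4;                 /* ripple carry to next slice */
        result |= (uint32_t)(sum & 0xF) << (4 * slice);
    }
    return result;
}

int main(void)
{
    printf("%08X\n", add32_by_slices(0x0FFFFFFF, 0x00000001)); /* 10000000 */
    return 0;
}
```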
The ability to put large numbers of transistors on one chip makes it feasible to integrate memory on the same die as the processor. This CPU cache has the advantage of faster access than off-chip memory and increases the processing speed of the system for many applications. Processor clock frequency has increased more rapidly than external memory speed, so cache memory is necessary if the processor is not to be delayed by slower external memory. A microprocessor is a general-purpose device, but several specialized processing devices have followed from it: a digital signal processor is specialized for signal processing; graphics processing units are processors designed for real-time rendering of images; and other specialized units exist for video processing and machine vision. Microcontrollers integrate a microprocessor with peripheral devices in embedded systems, and systems on chip integrate one or more microprocessor or microcontroller cores. Microprocessors can be selected for differing applications based on their word size, which is a measure of their complexity.
Longer word sizes allow each clock cycle of a processor to carry out more computation, but correspond to physically larger integrated circuit dies with higher standby and operating power consumption. 4-, 8- or 12-bit processors are integrated into microcontrollers operating embedded systems. Where a system is expected to handle larger volumes of data or to require a more flexible user interface, 16-, 32- or 64-bit processors are used. An 8- or 16-bit processor may be selected over a 32-bit processor for system-on-a-chip or microcontroller applications that require low-power electronics, or are part of a mixed-signal integrated circuit with noise-sensitive on-chip analog electronics such as high-resolution analog-to-digital converters, or both. Running 32-bit arithmetic on an 8-bit chip could end up using more power, as the chip must execute software with multiple instructions for each operation. Thousands of items that were traditionally not computer-related now include microprocessors.
The Intel 8088 microprocessor is a variant of the Intel 8086. Introduced on July 1, 1979, the 8088 had an eight-bit external data bus instead of the 16-bit bus of the 8086; the 16-bit registers and the one-megabyte address range were unchanged, however. In fact, according to the Intel documentation, the 8086 and 8088 have the same execution unit; only the bus interface unit is different. The original IBM PC was based on the 8088, which was designed at Intel's laboratory in Haifa, Israel, as were a large number of Intel's processors. The 8088 was targeted at economical systems by allowing the use of an eight-bit data path and eight-bit support and peripheral chips. The prefetch queue of the 8088 was shortened to four bytes from the 8086's six bytes, and the prefetch algorithm was modified to adapt to the narrower bus; these modifications of the basic 8086 design were among the first jobs assigned to Intel's then-new design office and laboratory in Haifa. Variants of the 8088 with more than 5 MHz maximal clock frequency include the 8088-2, fabricated using Intel's new enhanced nMOS process called HMOS and specified for a maximal frequency of 8 MHz.
It was later followed by the 80C88, a fully static CHMOS design, which could operate at clock speeds from 0 to 8 MHz. There were also several other, more or less similar, variants from other manufacturers. For instance, the NEC V20 was a pin-compatible and faster variant of the 8088, designed and manufactured by NEC; successive NEC 8088-compatible processors would run at up to 16 MHz. In 1984, Commodore International signed a deal to manufacture the 8088 for use in a licensed Dynalogic Hyperion clone, in a move regarded as signaling a major new direction for the company. When announced, the list price of the 8088 was US$124.80. The 8088 is architecturally similar to the 8086; the main difference is that it has only eight data lines instead of the 8086's sixteen. All of the other pins of the device perform the same function as they do on the 8086, with two exceptions. First, pin 34 is no longer BHE; instead it outputs a status signal, SSO, which, combined with the IO/M and DT/R signals, allows the bus cycles to be decoded. The second change is that the pin signaling whether a memory access or an input/output access is being made has had its sense reversed.
On the 8088 the pin is IO/M, active high for an I/O access, while on the 8086 it is M/IO, active high for a memory access; the reason for the reversal is that it makes the 8088 compatible with the 8085. Depending on the clock frequency, the number of memory wait states, and the characteristics of the particular application program, the average performance of the Intel 8088 ranged from 0.33 to 1 million instructions per second. Meanwhile, the mov reg,reg and ALU reg,reg instructions, taking two and three cycles respectively, yielded an absolute peak performance of between 1⁄3 and 1⁄2 MIPS per MHz, that is, somewhere in the range of 3–5 MIPS at 10 MHz. The speed of the execution unit and the bus of the 8086 CPU were well balanced, so cutting the bus down to eight bits made it a serious bottleneck in the 8088. With the speed of instruction fetch reduced by 50% in the 8088 as compared to the 8086, a sequence of fast instructions can quickly drain the four-byte prefetch queue, and when the queue is empty, instructions take as long to complete as they take to fetch. Both the 8086 and 8088 take four clock cycles to complete a bus cycle.
Therefore, for example, a two-byte shift or rotate instruction, which takes the EU only two clock cycles to execute, takes eight clock cycles to complete if it is not in the prefetch queue. A sequence of such fast instructions prevents the queue from being filled as fast as it is drained. In general, because so many basic instructions execute in fewer than four clocks per instruction byte (including all the ALU and data-movement instructions on register operands, and some of these on memory operands), it is impossible to avoid idling the EU in the 8088 at least a quarter of the time while executing useful real-world programs, and it is not hard to idle it half the time. In short, an 8088 runs about half as fast as an 8086 clocked at the same rate, because of the bus bottleneck. A side effect of the 8088 design, with its slow bus and small prefetch queue, is that the speed of code execution can depend on instruction order: when programming the 8088 for CPU efficiency, it is vital to interleave long-running instructions with short ones whenever possible.
For example, a repeated string operation or a shift by three or more bits will take long enough to allow time for the four-byte prefetch queue to fill. If short instructions are placed between slower instructions like these, the short ones can execute at full speed out of the queue. If, on the other hand, the slow instructions are executed sequentially, back to back, then after the first of them the bus unit will be forced to idle, because the queue will already be full, with the consequence that more of the faster instructions that follow will suffer fetch delays that might have been avoidable. As some instructions, such as single-bit-position shifts and rotates, take four times as long to fetch as to execute, the overall effect of poor instruction ordering can be a significant loss of speed.
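The interplay described above can be seen in a toy model in which the bus interface unit delivers one instruction byte into a four-byte queue every four clocks, while the execution unit consumes bytes at each instruction's own rate. The following C sketch is not cycle-exact; the single-byte instruction length and the chosen execution times are simplifying assumptions, used only to show how a run of short instructions starves the queue.

```c
#include <stdio.h>

#define QUEUE_CAP 4   /* prefetch queue depth, as on the 8088 */
#define BUS_CYCLE 4   /* clocks per one-byte bus cycle        */

/* One clock of bus interface unit work: fetch a byte into the queue
 * every BUS_CYCLE clocks, idling when the queue is full. */
static void biu_tick(int *queue, int *fill)
{
    if (*queue < QUEUE_CAP) {
        (*fill)++;
        if (*fill == BUS_CYCLE) { *fill = 0; (*queue)++; }
    }
}

/* Run n single-byte instructions with the given execution times and
 * return the total clocks, counting EU stalls on an empty queue. */
static int run(const int *exec_clocks, int n)
{
    int clock = 0, queue = 0, fill = 0;
    for (int i = 0; i < n; i++) {
        while (queue == 0) { clock++; biu_tick(&queue, &fill); } /* EU stall */
        queue--;                                 /* consume instruction byte */
        for (int c = 0; c < exec_clocks[i]; c++) {
            clock++;
            biu_tick(&queue, &fill);             /* BIU fetches in parallel  */
        }
    }
    return clock;
}

int main(void)
{
    int fast[8]  = {2, 2, 2, 2, 2, 2, 2, 2};     /* short register ops        */
    int mixed[8] = {2, 16, 2, 16, 2, 16, 2, 16}; /* interleaved with slow ops */
    printf("fast-only:   %d clocks (16 clocks of EU work)\n", run(fast, 8));
    printf("interleaved: %d clocks (72 clocks of EU work)\n", run(mixed, 8));
    return 0;
}
```

In the fast-only run, the execution unit stalls between nearly every pair of instructions, spending more clocks waiting on fetches than executing, which is the bottleneck the text describes.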
Pentium 4 is a brand by Intel for an entire series of single-core CPUs for desktops and entry-level servers. The processors were shipped from November 20, 2000, until August 8, 2008. All Pentium 4 CPUs are based on the NetBurst microarchitecture. The Pentium 4 Willamette introduced SSE2, the Prescott introduced SSE3, and later versions introduced Hyper-Threading Technology. The first Pentium 4-branded processor to implement 64-bit operation was the Prescott, but this feature was initially not enabled. Intel subsequently began selling 64-bit Pentium 4s using the "E0" revision of the Prescott, sold on the OEM market as the Pentium 4, model F; the E0 revision also adds eXecute Disable to Intel 64. Intel's official launch of Intel 64 in mainstream desktop processors came with the N0 stepping Prescott-2M. Intel also marketed a version of its low-end Celeron processors based on the NetBurst microarchitecture, as well as a high-end derivative, Xeon, intended for multi-socket servers and workstations. In 2005, the Pentium 4 was complemented by the dual-core brands Pentium D and Pentium Extreme Edition.
In benchmark evaluations, the advantages of the NetBurst microarchitecture were unclear. With optimized application code, the first Pentium 4s outperformed Intel's fastest Pentium III, as expected, but in legacy applications with many branching or x87 floating-point instructions, the Pentium 4 would merely match or even run slower than its predecessor. Its main downfall was a shared unidirectional bus, and the NetBurst microarchitecture consumed more power and emitted more heat than any previous Intel or AMD microarchitecture. As a result, the Pentium 4's introduction was met with mixed reviews: developers disliked the Pentium 4, as it posed a new set of code-optimization rules. For example, in mathematical applications, AMD's lower-clocked Athlon outperformed the Pentium 4, which would only catch up if software was recompiled with SSE2 support. Tom Yager of InfoWorld magazine called it "the fastest CPU - for programs that fit in cache". Computer-savvy buyers avoided Pentium 4 PCs due to their price premium, questionable benefit, and initial restriction to Rambus RAM.
In terms of product marketing, the Pentium 4's singular emphasis on clock frequency made it a marketer's dream. The result was that the NetBurst microarchitecture was referred to as a "marchitecture" by various computing websites and publications during the life of the Pentium 4; it was also called "NetBust", a term popular with reviewers who reflected negatively on the processor's performance. The two classical metrics of CPU performance are instructions per cycle (IPC) and clock speed. While IPC is difficult to quantify, owing to its dependence on the benchmark application's instruction mix, clock speed is a simple measurement yielding a single absolute number. Unsophisticated buyers would simply consider the processor with the highest clock speed to be the best product, and the Pentium 4 had the fastest clock speed. Because AMD's processors had slower clock speeds, AMD countered Intel's marketing advantage with the "megahertz myth" campaign, and its product marketing used a "PR-rating" system, which assigned a merit value based on relative performance to a baseline machine.
At the launch of the Pentium 4, Intel stated that NetBurst-based processors were expected to scale to 10 GHz after several fabrication process generations. However, the clock speed of processors using the NetBurst microarchitecture reached a maximum of only 3.8 GHz. Intel had not anticipated the rapid upward scaling of transistor power leakage that began to occur as dies reached 90 nm lithography and smaller; this new power-leakage phenomenon, along with the standard thermal output, created cooling and clock-scaling problems as clock speeds increased. Reacting to these unexpected obstacles, Intel attempted several core redesigns and explored new manufacturing technologies, such as using multiple cores, increasing FSB speeds, increasing the cache size, and using a longer instruction pipeline along with higher clock speeds. These solutions failed, and from 2003 to 2005 Intel shifted development away from NetBurst to focus on the cooler-running Pentium M microarchitecture. On January 5, 2006, Intel launched the Core processors, which put greater emphasis on energy efficiency and performance per clock cycle.
The final NetBurst-derived products were released in 2007, with all subsequent product families switching to the Core microarchitecture. Pentium 4 processors have an integrated heat spreader (IHS) that prevents the die from being accidentally damaged when cooling solutions are mounted and unmounted. Prior to the IHS, a CPU shim was sometimes used by people worried about damaging the core, and overclockers sometimes removed the IHS from Socket 423 and Socket 478 chips to allow for more direct heat transfer. On processors using the Socket LGA 775 interface, the IHS is directly soldered to the die or dies, making it difficult to remove. Willamette, the project codename for the first NetBurst microarchitecture implementation, experienced long delays in the completion of its design process. The project was started in 1998, when the Willamette core was expected to operate at frequencies up to about 1 GHz; however, by the time it was ready, the Pentium III had already reached that clock speed. Due to the radical differences between the P6 and NetBurst microarchitectures, Intel could not market Willamette as a Pentium III, so it was marketed as the Pentium 4.
On November 20, 2000, Intel released the Willamette-based Pentium 4.
System Management Mode
System Management Mode (SMM) is an operating mode of x86 central processor units in which all normal execution, including the operating system, is suspended, and an alternate software system, which resides in the computer's firmware or a hardware-assisted debugger, is executed with high privileges. SMM was first released with the Intel 386SL. While special SL versions were initially required for SMM, Intel incorporated SMM into its mainline 486 and Pentium processors in 1993; AMD implemented Intel's SMM with the Am386 processors in 1991, and it is available in all later microprocessors in the x86 architecture. SMM is a special-purpose operating mode provided for handling system-wide functions like power management, system hardware control, or proprietary OEM-designed code. It is intended for use only by system firmware, not by application software or general-purpose system software. The main benefit of SMM is that it offers a distinct and isolated processor environment that operates transparently to the operating system or executive and software applications.
In order to achieve transparency, SMM imposes certain rules: it can only be entered through an SMI (System Management Interrupt), and the processor executes the SMM code in a separate address space that has to be made inaccessible to the other operating modes of the CPU by the firmware. System Management Mode was originally used for implementing Advanced Power Management features; however, over time, some BIOS manufacturers have relied on SMM for other functionality, like making a USB keyboard work in real mode. Some uses of System Management Mode are:

- handling system events like memory or chipset errors;
- managing system safety functions, such as shutdown on high CPU temperature and turning the fans on and off;
- security functions, such as flash-device lockdown, which require SMM support on some chipsets;
- deeper-sleep power management support on Intel systems;
- controlling power management operations, such as managing the voltage regulator modules;
- emulating motherboard hardware that is unimplemented or buggy;
- emulating a PS/2 mouse and/or keyboard by converting the messages from USB versions of those peripherals into the messages that would have been generated had PS/2 versions of such hardware been connected;
- centralizing system configuration, such as on Toshiba and IBM notebook computers;
- emulating or forwarding calls to a Trusted Platform Module.

System Management Mode can also be abused to run high-privileged rootkits, as demonstrated at Black Hat 2008 and 2015.
SMM is entered via the SMI, which can be invoked by:

- motherboard hardware or chipset signaling via a designated pin, SMI#, of the processor chip (this signal can be an independent event);
- a software SMI triggered by system software via an I/O access to a location considered special by the motherboard logic;
- an I/O write to a location that the firmware has requested the chipset act on.

On entering SMM, the processor looks for the first instruction at the address SMBASE + 8000H, using registers CS = 3000H and EIP = 8000H. The CS register value is due to the processor's use of real-mode memory addressing while in SMM: the CS value is internally appended with 0H on its rightmost end and added to EIP. By design, the operating system cannot override or disable the SMI. Because of this, SMM is a target for malicious rootkits to reside in, including the NSA's "implants", which have individual code names for specific hardware, like SOUFFLETROUGH for Juniper Networks firewalls, SCHOOLMONTANA for J-series routers of the same company, DEITYBOUNCE for Dell, or IRONCHEF for HP ProLiant servers.
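The real-mode address formation behind the SMM entry point described above can be shown in one line of C: the physical address is the segment shifted left by four bits plus the offset, so CS = 3000H and EIP = 8000H yield 38000H, which equals an SMBASE of 30000H plus 8000H. A minimal sketch follows; the SMBASE value shown is the commonly documented default, assumed here for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Real-mode-style address formation, as used for the SMM entry point:
 * physical = (segment << 4) + offset. Shifting CS left by four bits is
 * the same as appending 0H to its rightmost end. */
static uint32_t real_mode_addr(uint16_t seg, uint32_t off)
{
    return ((uint32_t)seg << 4) + off;
}

int main(void)
{
    uint32_t smbase = 0x30000;                        /* assumed default SMBASE */
    uint32_t entry  = real_mode_addr(0x3000, 0x8000); /* CS:EIP = 3000H:8000H   */
    printf("entry = %05X, SMBASE + 8000H = %05X\n", entry, smbase + 0x8000);
    return 0;   /* both print 38000 */
}
```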
Improperly designed and insufficiently tested SMM BIOS code can make wrong assumptions and fail to work properly when interrupting certain other CPU modes, such as PAE or 64-bit long mode. According to the documentation of the Linux kernel, around 2004 such buggy implementations of the USB legacy-support feature were a common cause of crashes, for example on motherboards based on the Intel E7505 chipset. Since the SMM code is installed by the system firmware, the OS and the SMM code may have expectations about hardware settings that are incompatible, such as different ideas of how the Advanced Programmable Interrupt Controller should be set up. Operations in SMM take CPU time away from the applications, operating-system kernel, and hypervisor, with the effects magnified for multicore processors, since each SMI causes all cores to switch modes. There is also some overhead involved with switching in and out of SMM, since the CPU state must be stored to memory and any write-back caches must be flushed; this can cause clock ticks to get lost.
The Windows and Linux kernels define an "SMI timeout": a period within which SMM handlers must return control to the operating system, or else the system will "hang" or "crash". SMM may also disrupt the behavior of real-time applications with constrained timing requirements. A logic analyzer may be required to determine whether the CPU has entered SMM, and recovering the SMI handler code to analyze it for bugs and secrets requires a logic analyzer or disassembly of the system firmware.