Faith Ellen is a professor of computer science at the University of Toronto who studies distributed data structures and the theory of distributed computing. She earned her bachelor's and master's degrees from the University of Waterloo in 1977 and 1978, respectively, and her doctorate in 1982 from the University of California, Berkeley, under the supervision of Richard Karp. She joined the faculty of the University of Washington in 1983 and moved to Toronto in 1986. She became a Fellow of the Association for Computing Machinery in 2014.
The Power ISA is an instruction set architecture developed by the OpenPOWER Foundation, led by IBM. It was originally developed by the now-defunct Power.org industry group. Power ISA is an evolution of the PowerPC ISA, created by the merger of the core PowerPC ISA and the optional Book E for embedded applications; the merger of these two components in 2006 was led by Power.org founders IBM and Freescale Semiconductor. The ISA is divided into several categories, and every component is defined as part of a category. Processors implement a set of these categories, and different classes of processors are required to implement certain categories; for example, a server-class processor includes the categories Base, Floating-Point, 64-Bit and others. All processors implement the Base category. The Power ISA is a RISC load/store architecture. It has multiple sets of registers: thirty-two 32-bit or 64-bit general-purpose registers for integer operations; sixty-four 128-bit vector-scalar registers (VSRs); thirty-two 64-bit floating-point registers, part of the VSRs, for floating-point operations;
thirty-two 128-bit vector registers, also part of the VSRs, for vector operations; eight 4-bit condition register fields for control flow; and special registers: a counter register, a link register, the time base, an alternate time base and status registers. Instructions have a length of 32 bits, with the exception of the VLE (variable-length encoding) subset, which provides higher code density for low-end embedded applications. Most instructions are triadic, i.e. they have two source operands and one destination. Single- and double-precision IEEE 754-compliant floating-point operations are supported, including additional fused multiply–add and decimal floating-point instructions. There are provisions for SIMD operations on integer and floating-point data on up to 16 elements in a single instruction. There is support for Harvard caches, i.e. split data and instruction caches, as well as for unified caches. Memory operations are load/store, but allow for out-of-order execution. There is support for both big- and little-endian addressing, with separate categories for moded and per-page endianness.
There is support for both 32-bit and 64-bit addressing, and the modes of operation include user, supervisor and hypervisor. The categories include: Base (most of Book I and Book II); Server (Book III-S); Embedded (Book III-E); and miscellaneous categories for floating point, signal processing, cache locking, decimal floating point and so on. The Power ISA specification is divided into five parts, called "books". Book I – User Instruction Set Architecture covers the base instruction set available to the application programmer: memory reference, flow control, floating point, numeric acceleration and application-level programming. It includes chapters regarding auxiliary processing units like the AltiVec extension. Book II – Virtual Environment Architecture defines the storage model available to the application programmer, including timing, cache management, storage features and byte ordering. Book III – Operating Environment Architecture includes exceptions, memory management, debug facilities and special control functions; it is divided into two parts. Book III-S defines the supervisor instructions used for general-purpose/server implementations.
It comprises the contents of Book III of the former PowerPC ISA. Book III-E defines the supervisor instructions used for embedded applications; it is derived from the former PowerPC Book E. Book VLE – Variable Length Encoded Instruction Architecture defines alternative instructions and definitions from Books I–III, intended for higher instruction density in very-low-end applications; they use big-endian byte ordering. The specification for Power ISA v.2.03 is based on the former PowerPC ISA v.2.02 in POWER5+ and the Book E extension of the PowerPC specification. Book I included five new chapters regarding auxiliary processing units like DSPs and the AltiVec extension. Compliant cores include the Freescale PowerPC e200 and e500, the IBM PowerPC 405, 440, 460 and 970, POWER5 and POWER6, and the IBM Cell PPE. The specification for Power ISA v.2.04 was finalized in June 2007. It is based on Power ISA v.2.03 and includes changes to the Book III-S part regarding virtualization, hypervisor functionality, logical partitioning and virtual page handling.
Compliant cores include all cores that comply with previous versions of the Power ISA, the PA6T core from P.A. Semi and the Titan core from AMCC. The specification for Power ISA v.2.05 was released in December 2007. It is based on Power ISA v.2.04 and includes changes to Book I and Book III-S, with significant enhancements such as decimal arithmetic and server hypervisor improvements. Compliant cores include all cores that comply with previous versions of the Power ISA, POWER6 and the PowerPC 476. The specification for Power ISA v.2.06 was released in February 2009 and revised in July 2010. It includes extensions for the POWER7 processor and the e500-mc core. One significant new feature is vector-scalar floating-point instructions. Book III-E includes significant enhancements to the embedded specification regarding hypervisor support and virtualisation on single- and multi-core implementations. The spec was revised in November 2010 as the Power ISA v.2.06 revision B spec, enhancing virtualization features.
Compliant cores include all cores that comply with previous versions of the Power ISA, as well as POWER7, A2, e500-mc and e5500. The specification for Power ISA v.2.07 was released in May 2013.
Synchronization (computer science)
In computer science, synchronization refers to one of two distinct but related concepts: synchronization of processes and synchronization of data. Process synchronization refers to the idea that multiple processes are to join up or handshake at a certain point, in order to reach an agreement or commit to a certain sequence of actions. Data synchronization refers to the idea of keeping multiple copies of a dataset in coherence with one another, or maintaining data integrity. Process synchronization primitives are commonly used to implement data synchronization. The need for synchronization arises not only in multiprocessor systems but with any kind of concurrent processes, even on a single processor. Mentioned below are some of the main needs for synchronization. Forks and joins: when a job arrives at a fork point, it is split into N sub-jobs which are serviced by N tasks; after being serviced, each sub-job waits until all other sub-jobs are done, and only then are they joined again and leave the system. Thus parallel programming requires synchronization, since parallel processes must wait for the completion of several other processes.
Producer–consumer: in a producer–consumer relationship, the consumer process depends on the producer process until the necessary data has been produced (see the sketch after this paragraph). Exclusive-use resources: when multiple processes depend on a resource and need to access it at the same time, the operating system must ensure that only one process accesses it at a given point in time; this reduces concurrency. Thread synchronization is defined as a mechanism which ensures that two or more concurrent processes or threads do not simultaneously execute a particular program segment known as a critical section. Access to the critical section is controlled by synchronization techniques: when one thread starts executing the critical section, the other threads should wait until the first thread finishes. If proper synchronization techniques are not applied, a race condition may result, in which the values of variables are unpredictable and vary depending on the timing of context switches between the processes or threads.
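Here is a minimal producer–consumer sketch in C using POSIX threads; the one-slot buffer, the five-item loop and the function names are illustrative assumptions rather than a canonical formulation. The mutex guards the critical section, while the condition variables express the producer–consumer dependency:

```c
#include <pthread.h>
#include <stdio.h>

/* One shared slot: the mutex protects the critical section, and the
 * condition variables make the consumer wait until data is produced
 * and the producer wait until the slot is free. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;
static int slot, full = 0;

static void *producer(void *arg)
{
    for (int i = 1; i <= 5; i++) {
        pthread_mutex_lock(&lock);
        while (full)                      /* wait for a free slot */
            pthread_cond_wait(&not_full, &lock);
        slot = i;
        full = 1;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&lock);
    }
    return arg;
}

static void *consumer(void *arg)
{
    for (int i = 1; i <= 5; i++) {
        pthread_mutex_lock(&lock);
        while (!full)                     /* wait until data exists */
            pthread_cond_wait(&not_empty, &lock);
        printf("consumed %d\n", slot);
        full = 0;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&lock);
    }
    return arg;
}

int main(void)
{
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```

Without the mutex and condition variables, the two threads would race on the shared slot and the output would depend on the timing of context switches.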
For example, suppose that there are three processes, namely 1, 2 and 3, all executing concurrently, and that they need to share a common resource. Synchronization should be used here to avoid any conflicts in accessing this shared resource. Hence, when processes 1 and 2 both try to access that resource, it should be assigned to only one process at a time; if it is assigned to process 1, the other process needs to wait until process 1 releases it. Another synchronization requirement to consider is the order in which particular processes or threads should execute. For example, we cannot board a plane before buying a ticket, and we cannot check e-mail without validating our credentials. In the same way, an ATM will not provide any service until it receives a correct PIN. Other than mutual exclusion, synchronization also deals with hazards such as deadlock, which occurs when many processes are waiting for a shared resource that is being held by some other process; in this case, the processes just keep waiting and execute no further, as in the sketch below.
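The following is a minimal sketch of how such a deadlock can arise, assuming POSIX threads; two threads acquire the same two locks in opposite orders, and acquiring the locks in a single consistent order would avoid the problem:

```c
#include <pthread.h>

/* Two locks acquired in opposite orders: the classic deadlock recipe.
 * If each thread grabs its first lock before the other grabs its
 * second, both wait forever. */
static pthread_mutex_t a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t b = PTHREAD_MUTEX_INITIALIZER;

static void *worker1(void *arg)
{
    pthread_mutex_lock(&a);
    pthread_mutex_lock(&b);   /* blocks if worker2 already holds b */
    pthread_mutex_unlock(&b);
    pthread_mutex_unlock(&a);
    return arg;
}

static void *worker2(void *arg)
{
    pthread_mutex_lock(&b);
    pthread_mutex_lock(&a);   /* blocks if worker1 already holds a */
    pthread_mutex_unlock(&a);
    pthread_mutex_unlock(&b);
    return arg;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker1, NULL);
    pthread_create(&t2, NULL, worker2, NULL);
    pthread_join(t1, NULL);   /* may never return */
    pthread_join(t2, NULL);
    return 0;
}
```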
Priority inversion, in which a high-priority process is forced to wait because a lower-priority process holds a shared resource, is another hazard; this violation of priority rules can happen under certain circumstances and may lead to serious consequences in real-time systems. Busy waiting, in which a process repeatedly polls to determine whether it may enter a critical section, is a further concern, because this frequent polling robs processing time from other processes. One of the challenges for exascale algorithm design is to reduce synchronization, which can take more time than computation in distributed computing. Reducing synchronization has drawn the attention of computer scientists for decades, and it becomes a more significant problem as the gap between improvements in computation speed and in communication latency widens. Experiments have shown that communication due to synchronization on distributed computers takes a dominant share of the time in sparse iterative solvers; this problem is receiving increasing attention after the emergence of a new benchmark metric, the High Performance Conjugate Gradient (HPCG), for ranking the top 500 supercomputers. The following are some classic problems of synchronization: the producer–consumer problem, the readers–writers problem and the dining philosophers problem; these problems are used to test nearly every newly proposed synchronization scheme.
Many systems provide hardware support for critical section code. A uniprocessor system can disable interrupts so that currently running code executes without preemption, but this approach is inefficient on multiprocessor systems. "The key ability we require to implement synchronization in a multiprocessor is a set of hardware primitives with the ability to atomically read and modify a memory location. Without such a capability, the cost of building basic synchronization primitives will be too high and will increase as the processor count increases. There are a number of alternative formulations of the basic hardware primitives, all of which provide the ability to atomically read and modify a location, together with some way to tell if the read and write were performed atomically. These hardware primitives are the basic building blocks that are used to build a wide variety of user-level synchronization operations, including things such as locks and barriers."
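As one concrete illustration, here is a sketch of a spinlock built from such an atomic read-modify-write primitive, using the standard C11 atomic_flag (a test-and-set operation); this is only one of the alternative formulations mentioned in the quotation:

```c
#include <stdatomic.h>

/* A minimal spinlock built from an atomic test-and-set primitive. */
static atomic_flag lock = ATOMIC_FLAG_INIT;

void acquire(void)
{
    /* Atomically set the flag and observe its previous value;
     * spin until the previous value was "clear". */
    while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
        ;  /* busy-wait */
}

void release(void)
{
    atomic_flag_clear_explicit(&lock, memory_order_release);
}
```

Note that this lock busy-waits, which is exactly the polling cost discussed above; practical locks typically fall back to blocking after a bounded amount of spinning.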
In computer science and engineering, transactional memory attempts to simplify concurrent programming by allowing a group of load and store instructions to execute in an atomic way. It is a concurrency control mechanism analogous to database transactions for controlling access to shared memory in concurrent computing. Transactional memory systems provide a high-level abstraction as an alternative to low-level thread synchronization; this abstraction allows for coordination between concurrent reads and writes of shared data in parallel systems. In concurrent programming, synchronization is required when parallel threads attempt to access a shared resource. Low-level thread-synchronization constructs such as locks are pessimistic and prohibit threads outside a critical section from making any changes; the process of applying and releasing locks functions as additional overhead in workloads with little conflict among threads. Transactional memory provides optimistic concurrency control by allowing threads to run in parallel with minimal interference.
The goal of transactional memory systems is to transparently support regions of code marked as transactions by enforcing atomicity and isolation. A transaction is a collection of operations that can execute and commit changes as long as a conflict is not present; when a conflict is detected, a transaction reverts to its initial state and reruns until all conflicts are removed. Before a successful commit, the outcome of any operation inside a transaction is purely speculative. In contrast to lock-based synchronization, where operations are serialized to prevent data corruption, transactions allow for additional parallelism as long as few operations attempt to modify a shared resource. Since the programmer is not responsible for explicitly identifying locks or the order in which they are acquired, programs that utilize transactional memory cannot produce a deadlock. With these constructs in place, transactional memory provides a high-level programming abstraction by allowing programmers to enclose their methods within transactional blocks.
Correct implementations ensure that data cannot be shared between threads without going through a transaction, and they produce a serializable outcome. For example, code can be written as in the sketch that follows this paragraph. In such code, the block defined by "transaction" is guaranteed atomicity and isolation by the underlying transactional memory implementation and is transparent to the programmer; the variables within the transaction are protected from external conflicts, ensuring that either the correct amount is transferred or no action is taken at all. Note that concurrency-related bugs are still possible in programs that use a large number of transactions, particularly in software implementations where the library provided by the language is unable to enforce correct use. Bugs introduced through transactions can be difficult to debug, since breakpoints cannot be placed within a transaction.
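The sketch below uses GCC's experimental transactional-memory extension (__transaction_atomic, compiled with -fgnu-tm) as one possible notation for such a "transaction" block; the account balances and the transfer function are hypothetical illustrations, not a canonical API:

```c
/* A sketch of a transactional transfer. Compile with: gcc -fgnu-tm */
static int from_balance = 100;
static int to_balance   = 0;

void transfer(int amount)
{
    __transaction_atomic {              /* executes atomically, in isolation */
        if (from_balance >= amount) {
            from_balance -= amount;
            to_balance   += amount;
        }
    }                                   /* on conflict: roll back and re-run */
}
```

Either both balances are updated or neither is; a concurrent reader never observes the money "in flight".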
Transactional memory is not without limitations: although transactional-memory programs cannot produce a deadlock, they may still suffer from livelock or resource starvation. For example, longer transactions may repeatedly revert in response to multiple smaller transactions, wasting both time and energy. The abstraction of atomicity in transactional memory requires a hardware mechanism to detect conflicts and undo any changes made to shared data. Hardware transactional memory systems may comprise modifications in processors, caches and bus protocol to support transactions. Speculative values in a transaction must be buffered and remain unseen by other threads until commit time. Large buffers are used to store speculative values while avoiding write propagation through the underlying cache coherence protocol. Traditionally, buffers have been implemented using different structures within the memory hierarchy, such as store queues or caches. Buffers further away from the processor, such as the L2 cache, can hold more speculative values; the optimal size of a buffer is still under debate due to the limited use of transactions in commercial programs. In a cache implementation, the cache lines are augmented with read and write bits.
When the hardware controller receives a request, it uses these bits to detect a conflict. If a serializability conflict is detected with a parallel transaction, the speculative values are discarded. When caches are used, the system may introduce the risk of false conflicts due to the use of cache-line granularity. Load-link/store-conditional (LL/SC), offered by many RISC processors, can be viewed as the most basic transactional memory support. Although hardware transactional memory provides maximal performance compared to software alternatives, it has seen only limited use at this time. Software transactional memory provides transactional memory semantics in a software runtime library or the programming language and requires minimal hardware support; as a downside, software implementations come with a performance penalty compared with hardware solutions. Hardware acceleration can reduce some of the overheads associated with software transactional memory. Owing to the more limited nature of hardware transactional memory, software using it may require extensive tuning to benefit from it.
For example, the dynamic memory allocator may have a significant influence on performance, and structure padding may affect performance as well.
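As a sketch of the load-link/store-conditional style mentioned above, the following C11 compare-and-swap retry loop treats a single-word update as a tiny transaction; on many RISC machines a compiler lowers such a loop to LL/SC instructions. The counter variable is an illustrative assumption:

```c
#include <stdatomic.h>

/* A single-word "transaction": read a value, compute a new one, and
 * commit only if no other thread modified the word in between. */
static _Atomic int counter;

void atomic_increment(void)
{
    int old = atomic_load(&counter);
    /* On failure, 'old' is refreshed with the current value and the
     * read-modify-write is retried, mirroring an LL/SC retry loop. */
    while (!atomic_compare_exchange_weak(&counter, &old, old + 1))
        ;  /* retry until the conditional store succeeds */
}
```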
Computer science is the study of processes that interact with data and that can be represented as data in the form of programs. It enables the use of algorithms to manipulate, store and communicate digital information. A computer scientist studies the theory of computation and the practice of designing software systems. Its fields can be divided into theoretical and practical disciplines: computational complexity theory is abstract, while computer graphics emphasizes real-world applications. Programming language theory considers approaches to the description of computational processes, while computer programming itself involves the use of programming languages and complex systems. Human–computer interaction considers the challenges in making computers useful and accessible. The earliest foundations of what would become computer science predate the invention of the modern digital computer. Machines for calculating fixed numerical tasks, such as the abacus, have existed since antiquity, aiding in computations such as multiplication and division.
Algorithms for performing computations have existed since antiquity, even before the development of sophisticated computing equipment. Wilhelm Schickard designed and constructed the first working mechanical calculator in 1623. In 1673, Gottfried Leibniz demonstrated a digital mechanical calculator, called the Stepped Reckoner. He may be considered the first computer scientist and information theorist because, among other reasons, he documented the binary number system. In 1820, Thomas de Colmar launched the mechanical calculator industry when he released his simplified arithmometer, the first calculating machine strong enough and reliable enough to be used daily in an office environment. Charles Babbage started the design of the first automatic mechanical calculator, his Difference Engine, in 1822, which eventually gave him the idea of the first programmable mechanical calculator, his Analytical Engine. He started developing this machine in 1834, and "in less than two years, he had sketched out many of the salient features of the modern computer".
"A crucial step was the adoption of a punched card system derived from the Jacquard loom" making it infinitely programmable. In 1843, during the translation of a French article on the Analytical Engine, Ada Lovelace wrote, in one of the many notes she included, an algorithm to compute the Bernoulli numbers, considered to be the first computer program. Around 1885, Herman Hollerith invented the tabulator, which used punched cards to process statistical information. In 1937, one hundred years after Babbage's impossible dream, Howard Aiken convinced IBM, making all kinds of punched card equipment and was in the calculator business to develop his giant programmable calculator, the ASCC/Harvard Mark I, based on Babbage's Analytical Engine, which itself used cards and a central computing unit; when the machine was finished, some hailed it as "Babbage's dream come true". During the 1940s, as new and more powerful computing machines were developed, the term computer came to refer to the machines rather than their human predecessors.
As it became clear that computers could be used for more than just mathematical calculations, the field of computer science broadened to study computation in general. In 1945, IBM founded the Watson Scientific Computing Laboratory at Columbia University in New York City; the renovated fraternity house on Manhattan's West Side was IBM's first laboratory devoted to pure science. The lab is the forerunner of IBM's Research Division, which today operates research facilities around the world; the close relationship between IBM and the university was instrumental in the emergence of a new scientific discipline, with Columbia offering one of the first academic-credit courses in computer science in 1946. Computer science began to be established as a distinct academic discipline in the 1950s and early 1960s; the world's first computer science degree program, the Cambridge Diploma in Computer Science, began at the University of Cambridge Computer Laboratory in 1953. The first computer science degree program in the United States was formed at Purdue University in 1962.
Since practical computers became available, many applications of computing have become distinct areas of study in their own right. Although many believed it was impossible that computers themselves could be a scientific field of study, in the late fifties this view became accepted among the greater academic population. It is the now well-known IBM brand that formed part of the computer science revolution during this time. IBM released the IBM 704 and the IBM 709 computers, which were used during the exploration period of such devices. "Still, working with the IBM was frustrating … if you had misplaced as much as one letter in one instruction, the program would crash, and you would have to start the whole process over again." During the late 1950s, the computer science discipline was very much in its developmental stages, and such issues were commonplace. Time has seen significant improvements in the effectiveness of computing technology. Modern society has seen a significant shift in the users of computer technology, from usage only by experts and professionals to a near-ubiquitous user base.
Initially, computers were quite costly, and some degree of human aid was needed for efficient use—in part from professional computer operators. As computer adoption became more widespread and affordable, less human assistance was needed for common usage. Despite its short history as a formal academic discipline, computer science has made a number of fundamental contributions to science and society—in fact, along with electronics, it is a founding science of the current epoch of human history, the Information Age.
x86 is a family of instruction set architectures based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was introduced in 1978 as a 16-bit extension of Intel's 8-bit 8080 microprocessor, with memory segmentation as a solution for addressing more memory than can be covered by a plain 16-bit address. The term "x86" came into being because the names of several successors to Intel's 8086 processor end in "86", including the 80186, 80286, 80386 and 80486 processors. Many additions and extensions have been added to the x86 instruction set over the years, consistently with full backward compatibility. The architecture has been implemented in processors from Intel, Cyrix, AMD, VIA and many other companies; of those, only Intel, AMD and VIA hold x86 architectural licenses and are producing modern 64-bit designs. The term is not synonymous with IBM PC compatibility, as this implies a multitude of other computer hardware. As of 2018, the majority of personal computers and laptops sold are based on the x86 architecture, while other categories—especially high-volume mobile categories such as smartphones or tablets—are dominated by ARM.
In the 1980s and early 1990s, when the 8088 and 80286 were still in common use, the term x86 represented any 8086-compatible CPU. Today, however, x86 implies binary compatibility with the 32-bit instruction set of the 80386; this is because that instruction set has become something of a lowest common denominator for many modern operating systems, and also because the term became common after the introduction of the 80386 in 1985. A few years after the introduction of the 8086 and 8088, Intel added some complexity to its naming scheme and terminology as the "iAPX" prefix of the ambitious but ill-fated Intel iAPX 432 processor was tried on the more successful 8086 family of chips, applied as a kind of system-level prefix. An 8086 system, including coprocessors such as the 8087 and 8089 as well as simpler Intel-specific system chips, was thereby described as an iAPX 86 system. There were also the terms iRMX, iSBC and iSBX – all together under the heading Microsystem 80. However, this naming scheme was quite temporary.
Although the 8086 was developed for embedded systems and small multi-user or single-user computers, as a response to the successful 8080-compatible Zilog Z80, the x86 line soon grew in features and processing power. Today, x86 is ubiquitous in both stationary and portable personal computers and is used in midrange computers, workstations and most new supercomputer clusters of the TOP500 list. A large amount of software, including a large list of x86 operating systems, uses x86-based hardware. Modern x86 is relatively uncommon in embedded systems, however; small low-power applications and low-cost microprocessor markets, such as home appliances and toys, lack any significant x86 presence. Simple 8-bit and 16-bit architectures are common here, although the x86-compatible VIA C7, VIA Nano, AMD's Geode, Athlon Neo and Intel Atom are examples of 32- and 64-bit designs used in some low-power and low-cost segments. There have been several attempts, including by Intel itself, to end the market dominance of the "inelegant" x86 architecture designed directly from the first simple 8-bit microprocessors.
Examples of this are the iAPX 432, the Intel 960, the Intel 860 and the Intel/Hewlett-Packard Itanium architecture. However, the continuous refinement of x86 microarchitectures and semiconductor manufacturing has made it hard to replace x86 in many segments. AMD's 64-bit extension of x86 and the scalability of x86 chips, such as the eight-core Intel Xeon and the 12-core AMD Opteron, underline x86 as an example of how continuous refinement of established industry standards can resist competition from new architectures. The table below lists processor models and model series implementing variations of the x86 instruction set, in chronological order; each line item is characterized by improved or commercially successful processor microarchitecture designs. At various times, companies such as IBM, NEC, AMD, TI, STM, Fujitsu, OKI, Cyrix, Intersil, C&T, NexGen, UMC and DM&P started to design or manufacture x86 processors intended for personal computers as well as embedded systems. Such x86 implementations are seldom simple copies but often employ different internal microarchitectures as well as different solutions at the electronic and physical levels.
Quite early, compatible microprocessors were 16-bit, while 32-bit designs were developed much later. For the personal computer market, real quantities started to appear around 1990 with i386- and i486-compatible processors, named similarly to Intel's original chips. Other companies that designed or manufactured x86 or x87 processors include ITT Corporation, National Semiconductor, ULSI System Technology and Weitek. Following the pipelined i486, Intel introduced the Pentium brand name for its new set of superscalar x86 designs.
RISC-V is an open-source hardware instruction set architecture based on established reduced instruction set computer (RISC) principles. The project began in 2010 at the University of California, Berkeley, but many contributors are volunteers not affiliated with the university. As of March 2019, version 2.2 of the user-space ISA is frozen, permitting most software development to proceed; the privileged ISA is available as draft version 1.10, and a debug specification is available as draft version 0.13.1. Usable new ISAs are very expensive to develop, and computer designers cannot afford to work for free. Developing a CPU requires design expertise in several specialties: electronic digital logic, compilers and operating systems. It is rare to find such a team outside of a professional engineering organization, and such teams are paid from the money charged for their designs. Therefore, commercial vendors of computer designs, such as ARM Holdings and MIPS Technologies, charge royalties for the use of their designs and copyrights; they often also require non-disclosure agreements before releasing documents that describe their designs' detailed advantages and instruction set.
In many cases, they never describe the reasons for their design choices. This expense and secrecy make the development of new software much more difficult, and they prevent security audits. Another result is that modern, high-quality general-purpose computer instruction sets have not been explained or made available except in academic settings. RISC-V was started to solve these problems: the goal was to make a practical ISA that was open-sourced and usable in any hardware or software design without royalties. The rationales for every part of the project are explained, at least broadly, and the RISC-V authors have substantial experience in computer design. The RISC-V ISA is a direct development from a series of academic computer-design projects, and it originated in part to aid such projects. To address the cost of design, the project started as academic research funded by DARPA. In order to build a large, continuing community of users and thereby accumulate designs and software, the RISC-V ISA designers planned to support a wide variety of practical uses: small and low-power real-world implementations, without over-architecting for a particular microarchitecture.
A need for a large base of contributors is part of the reason why RISC-V was engineered to fit so many uses; therefore, many RISC-V contributors see the project as a unified community effort. The term RISC dates from about 1980. Before this, there was some knowledge that simpler computers could be effective, but the design principles were not widely described. Simple, effective computers have always been of academic interest. Academics created the RISC instruction set DLX for the first edition of Computer Architecture: A Quantitative Approach in 1990; David Patterson was an author and later assisted RISC-V. DLX was intended for educational use; academics and hobbyists implemented it using field-programmable gate arrays, but it was not a commercial success. ARM CPUs, versions 2 and earlier, had a public-domain instruction set, which is still supported by the GNU Compiler Collection, a popular free-software compiler; three open-source cores exist for this ISA. OpenRISC is an open-source ISA based on DLX, with associated RISC designs; it is supported with GCC and Linux implementations.
However, it has few commercial implementations. Krste Asanović at the University of California, Berkeley, found many uses for an open-source computer system, and in 2010 he decided to develop and publish one in a "short, three-month project over the summer". The plan was to help both academic and industrial users. David Patterson at Berkeley aided the effort; he had originally identified the properties of Berkeley RISC, and RISC-V is one of his long series of cooperative RISC research projects. At this stage, students inexpensively provided initial software and CPU designs. The RISC-V authors and their institution provided the ISA documents and several CPU designs under BSD licenses, which allow derivative works—such as RISC-V chip designs—to be either open and free, or closed and proprietary. Early funding was from DARPA. Commercial concerns require an ISA to be stable before they can use it in a product that might last many years. To address this issue, the RISC-V Foundation was formed to own and publish the intellectual property related to RISC-V's definition.
The original authors and owners have surrendered their rights to the foundation. As of 2019, the foundation publishes the documents defining RISC-V and permits unrestricted use of the ISA for both software and hardware design; however, only paid members of the RISC-V Foundation can vote to approve changes or use the trademarked compatibility logo. In 2017, RISC-V received the Linley Group's Analyst's Choice Award for Best Technology. The designers say that the instruction set is the main interface in a computer because it lies between the hardware and the software. If a good instruction set were open and available for use by all, it should reduce the cost of software by permitting far more reuse, and it should increase competition among hardware providers, who could devote more resources to design and less to software support. The designers assert that new principles are becoming rare in instruction set design, as the most successful designs of the last forty years have become increasingly similar. Of those that failed, most did so because their sponsoring companies failed commercially, not because the instruction sets were poor technically.
So, a well-designed open instruction set, designed using well-established principles, should attract long-term support from many vendors.