Parallel computing is a type of computation in which many calculations or the execution of processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different forms of parallel computing: bit-level, instruction-level and task parallelism. Parallelism has long been employed in high-performance computing, but it has gained broader interest due to the physical constraints preventing further frequency scaling; as power consumption by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors. Parallel computing is closely related to concurrent computing; the two are frequently used together and often conflated, though they are distinct: it is possible to have parallelism without concurrency, and concurrency without parallelism. In parallel computing, a computational task is typically broken down into several similar sub-tasks that can be processed independently and whose results are combined afterwards, upon completion.
In contrast, in concurrent computing, the various processes often do not address related tasks. Parallel computers can be classified according to the level at which the hardware supports parallelism: multi-core and multi-processor computers have multiple processing elements within a single machine, while clusters, MPPs and grids use multiple computers to work on the same task. Specialized parallel computer architectures are sometimes used alongside traditional processors to accelerate specific tasks. In some cases parallelism is transparent to the programmer, as in bit-level or instruction-level parallelism, but explicitly parallel algorithms, particularly those that use concurrency, are more difficult to write than sequential ones, because concurrency introduces several new classes of potential software bugs, of which race conditions are the most common. Communication and synchronization between the different subtasks are typically some of the greatest obstacles to getting good parallel program performance.
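Race conditions of this kind are easy to demonstrate with a shared counter. The following sketch is illustrative, not from any particular source; it shows the lock-based fix, and removing the lock turns the loop into a classic lost-update race:

```python
import threading

def safe_increment(counter, lock, n):
    # Without the lock, the read-modify-write below is a classic race:
    # two threads can read the same value and both write back value + 1,
    # silently losing one increment.
    for _ in range(n):
        with lock:
            counter["value"] += 1

counter = {"value": 0}
lock = threading.Lock()
threads = [threading.Thread(target=safe_increment, args=(counter, lock, 100_000))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert counter["value"] == 400_000  # deterministic only because of the lock
```

With the lock removed, the final count is usually less than 400,000 and varies from run to run, which is exactly what makes such bugs hard to reproduce.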
A theoretical upper bound on the speed-up of a single program as a result of parallelization is given by Amdahl's law. Traditionally, computer software has been written for serial computation. To solve a problem, an algorithm is implemented as a serial stream of instructions; these instructions are executed on the central processing unit of one computer. Only one instruction may execute at a time; after that instruction is finished, the next one is executed. Parallel computing, on the other hand, uses multiple processing elements simultaneously to solve a problem; this is accomplished by breaking the problem into independent parts so that each processing element can execute its part of the algorithm simultaneously with the others. The processing elements can be diverse and include resources such as a single computer with multiple processors, several networked computers, specialized hardware, or any combination of the above. Historically, parallel computing was used for scientific computing and the simulation of scientific problems in the natural and engineering sciences, such as meteorology.
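Amdahl's law can be stated in a few lines of code. Here p is the fraction of the program that parallelizes perfectly; the serial remainder (1 - p) bounds the achievable speed-up no matter how many processors are added:

```python
def amdahl_speedup(parallel_fraction, processors):
    """Upper bound on speed-up when a fraction p of the work
    parallelizes perfectly and the rest stays strictly serial."""
    p = parallel_fraction
    return 1.0 / ((1.0 - p) + p / processors)

# With 95% of the work parallelizable, even unboundedly many
# processors cannot exceed a 20x speed-up (1 / 0.05).
print(round(amdahl_speedup(0.95, 8), 2))     # 5.93
print(round(amdahl_speedup(0.95, 1024), 2))  # 19.64
```

The diminishing returns between 8 and 1024 processors show why the serial fraction, not the processor count, dominates achievable performance.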
This led to the design of parallel software, as well as high-performance computing. Frequency scaling was the dominant reason for improvements in computer performance from the mid-1980s until 2004: the runtime of a program is equal to the number of instructions multiplied by the average time per instruction. Holding everything else constant, increasing the clock frequency decreases the average time it takes to execute an instruction. An increase in frequency thus decreases runtime for all compute-bound programs. However, power consumption P by a chip is given by the equation P = C × V² × F, where C is the capacitance switched per clock cycle, V is the voltage, and F is the processor frequency. Increases in frequency therefore increase the amount of power used in a processor. Increasing processor power consumption led to Intel's May 8, 2004 cancellation of its Tejas and Jayhawk processors, which is generally cited as the end of frequency scaling as the dominant computer architecture paradigm. To deal with the problem of power consumption and overheating, the major central processing unit manufacturers started to produce power-efficient processors with multiple cores.
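The power equation can be checked numerically. The capacitance, voltage and frequency values below are invented for illustration; the point is the scaling behaviour, not the absolute numbers:

```python
def dynamic_power(capacitance, voltage, frequency):
    # P = C * V^2 * F: dynamic power dissipated by switching logic.
    return capacitance * voltage ** 2 * frequency

# Doubling frequency doubles power linearly, while voltage enters
# quadratically, which is why lowering V is so valuable.
base = dynamic_power(1e-9, 1.2, 3.0e9)   # ~4.32 W at 3 GHz
fast = dynamic_power(1e-9, 1.2, 6.0e9)   # ~8.64 W at 6 GHz
print(fast / base)                       # 2.0
```

In practice, raising frequency also tends to require raising voltage, so real power growth with frequency is worse than linear, which is what made frequency scaling unsustainable.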
The core is the computing unit of the processor, and in multi-core processors each core is independent and can access the same memory concurrently. Multi-core processors have brought parallel computing to desktop computers, and thus parallelisation of serial programmes has become a mainstream programming task. In 2012 quad-core processors became standard for desktop computers, while servers had 10- and 12-core processors. From Moore's law it can be predicted that the number of cores per processor will double every 18–24 months; this could mean that after 2020 a typical processor will have hundreds of cores. An operating system can ensure that different tasks and user programmes are run in parallel on the available cores. However, for a serial software programme to take full advantage of the multi-core architecture, the programmer needs to restructure and parallelise the code. A speed-up of application software runtime will no longer be achieved through frequency scaling; instead, programmers will need to parallelise their software code to take advantage of the increasing number of cores.
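The restructuring described above, splitting a serial loop into independent sub-tasks and combining the partial results, can be sketched with Python's standard process pool; the chunking scheme here is one simple choice among many:

```python
from concurrent.futures import ProcessPoolExecutor

def sum_of_squares(chunk):
    # The independent sub-task: each worker handles one slice.
    return sum(x * x for x in chunk)

def split(data, workers):
    # Restructure the single serial loop into independent chunks,
    # one per core: the manual step parallelisation requires.
    size = max(1, len(data) // workers)
    return [data[i:i + size] for i in range(0, len(data), size)]

def parallel_sum_of_squares(data, workers=4):
    # Map the sub-tasks onto a pool of processes (one per core),
    # then combine the partial results.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, split(data, workers)))

if __name__ == "__main__":
    data = list(range(1_000))
    assert parallel_sum_of_squares(data) == sum(x * x for x in data)
```

Note that the combining step (the final `sum`) is itself serial, which is precisely the kind of residual serial fraction Amdahl's law penalizes.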
A computer network is a digital telecommunications network which allows nodes to share resources. In computer networks, computing devices exchange data with each other using connections between nodes; these data links are established over cable media such as wires or optical cables, or wireless media such as Wi-Fi. Network computer devices that originate and terminate the data are called network nodes. Nodes are identified by network addresses and can include hosts such as personal computers and servers, as well as networking hardware such as routers and switches. Two such devices can be said to be networked together when one device is able to exchange information with the other device, whether or not they have a direct connection to each other. In most cases, application-specific communications protocols are layered over other, more general communications protocols; this formidable collection of information technology requires skilled network management to keep it all running reliably. Computer networks support an enormous number of applications and services, such as access to the World Wide Web, digital video, digital audio, shared use of application and storage servers and fax machines, and the use of email and instant messaging applications, as well as many others.
Computer networks differ in the transmission medium used to carry their signals, the communications protocols used to organize network traffic, the network's size, traffic control mechanisms and organizational intent. The best-known computer network is the Internet. The chronology of significant computer-network developments includes: In the late 1950s, early networks of computers included the U.S. military radar system Semi-Automatic Ground Environment. In 1959, Anatolii Ivanovich Kitov proposed to the Central Committee of the Communist Party of the Soviet Union a detailed plan for the re-organisation of the control of the Soviet armed forces and of the Soviet economy on the basis of a network of computing centres, the OGAS. In 1960, the commercial airline reservation system Semi-Automatic Business Research Environment (SABRE) went online with two connected mainframes. In 1963, J. C. R. Licklider sent a memorandum to office colleagues discussing the concept of the "Intergalactic Computer Network", a computer network intended to allow general communications among computer users.
In 1964, researchers at Dartmouth College developed the Dartmouth Time Sharing System for distributed users of large computer systems. The same year, at the Massachusetts Institute of Technology, a research group supported by General Electric and Bell Labs used a computer to route and manage telephone connections. Throughout the 1960s, Paul Baran and Donald Davies independently developed the concept of packet switching to transfer information between computers over a network. Davies pioneered the implementation of the concept with the NPL network, a local area network at the National Physical Laboratory using a line speed of 768 kbit/s. In 1965, Western Electric introduced the first widely used telephone switch that implemented true computer control. In 1966, Thomas Marill and Lawrence G. Roberts published a paper on an experimental wide area network for computer time sharing. In 1969, the first four nodes of the ARPANET were connected using 50 kbit/s circuits between the University of California at Los Angeles, the Stanford Research Institute, the University of California at Santa Barbara, and the University of Utah.
Leonard Kleinrock carried out theoretical work to model the performance of packet-switched networks, which underpinned the development of the ARPANET. His theoretical work on hierarchical routing in the late 1970s with student Farouk Kamoun remains critical to the operation of the Internet today. In 1972, commercial services using X.25 were deployed, later used as an underlying infrastructure for expanding TCP/IP networks. In 1973, the French CYCLADES network was the first to make the hosts responsible for the reliable delivery of data, rather than this being a centralized service of the network itself. In 1973, Robert Metcalfe wrote a formal memo at Xerox PARC describing Ethernet, a networking system based on the Aloha network developed in the 1960s by Norman Abramson and colleagues at the University of Hawaii. In July 1976, Robert Metcalfe and David Boggs published their paper "Ethernet: Distributed Packet Switching for Local Computer Networks" and collaborated on several patents received in 1977 and 1978.
In 1979, Robert Metcalfe pursued making Ethernet an open standard. In 1976, John Murphy of Datapoint Corporation created ARCNET, a token-passing network first used to share storage devices. In 1995, the transmission speed capacity for Ethernet increased from 10 Mbit/s to 100 Mbit/s. By 1998, Ethernet supported transmission speeds of one gigabit per second. Subsequently, higher speeds of up to 400 Gbit/s were added; the ability of Ethernet to scale is a contributing factor to its continued use. Computer networking may be considered a branch of electrical engineering, electronics engineering, telecommunications, computer science, information technology or computer engineering, since it relies upon the theoretical and practical application of the related disciplines. A computer network facilitates interpersonal communications, allowing users to communicate efficiently and via various means: email, instant messaging, online chat, video telephone calls and video conferencing. A network allows sharing of computing resources.
Users may access and use resources provided by devices on the network, such as printing a document on a shared network printer or using a shared storage device. A network allows sharing of files, data, and other types of information, giving authorized users the ability to access information stored on other computers on the network.
Computer data storage
Computer data storage, often called storage or memory, is a technology consisting of computer components and recording media that are used to retain digital data. It is a core function and fundamental component of computers; the central processing unit (CPU) of a computer is what manipulates data by performing computations. In practice, almost all computers use a storage hierarchy, which puts fast but expensive and small storage options close to the CPU and slower but larger and cheaper options farther away; the fast volatile technologies are referred to as "memory", while slower persistent technologies are referred to as "storage". In the von Neumann architecture, the CPU consists of two main parts: the control unit and the arithmetic logic unit; the former controls the flow of data between the CPU and memory, while the latter performs arithmetic and logical operations on data. Without a significant amount of memory, a computer would only be able to perform fixed operations and immediately output the result; it would have to be reconfigured to change its behavior. This is acceptable for devices such as desk calculators, digital signal processors, and other specialized devices.
Von Neumann machines differ in having a memory in which they store their operating instructions and data. Such computers are more versatile in that they do not need to have their hardware reconfigured for each new program, but can be reprogrammed with new in-memory instructions. Most modern computers are von Neumann machines. A modern digital computer represents data using the binary numeral system. Text, pictures and nearly any other form of information can be converted into a string of bits, or binary digits, each of which has a value of 1 or 0; the most common unit of storage is the byte, equal to 8 bits. A piece of information can be handled by any computer or device whose storage space is large enough to accommodate the binary representation of the piece of information, or data. For example, the complete works of Shakespeare, about 1250 pages in print, can be stored in about five megabytes with one byte per character. Data are encoded by assigning a bit pattern to each character, digit, or multimedia object.
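The Shakespeare figure is easy to sanity-check. Assuming roughly 4,000 characters per printed page (an invented density, chosen only to make the arithmetic concrete) and one byte per character, as in the text:

```python
# Back-of-the-envelope check of the five-megabyte figure.
pages = 1250
chars_per_page = 4000           # assumed density for dense print
bytes_per_char = 1              # one byte per character, as stated above
total_bytes = pages * chars_per_page * bytes_per_char
print(total_bytes / 1_000_000)  # 5.0 megabytes
```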
Many standards exist for encoding. By adding bits to each encoded unit, redundancy allows the computer both to detect errors in coded data and to correct them based on mathematical algorithms. Errors generally occur with low probability, due to random bit-value flipping; due to "physical bit fatigue", the loss of a physical bit's ability to maintain a distinguishable value; or due to errors in inter- or intra-computer communication. A random bit flip is corrected upon detection. A bit, or a group of malfunctioning physical bits, is automatically fenced out (taken out of use by the device) and replaced with another functioning equivalent group in the device, where the corrected bit values are restored. The cyclic redundancy check (CRC) method is used in communications and storage for error detection; a detected transfer error is retried. Data compression methods allow, in many cases, representing a string of bits by a shorter bit string and reconstructing the original string when needed; this uses less storage for many types of data at the cost of more computation.
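CRC-based detection can be demonstrated with the CRC-32 implementation in Python's standard library. Note that the check detects the flipped bit but cannot by itself say which bit to repair:

```python
import zlib

message = b"stored payload"
checksum = zlib.crc32(message)     # CRC-32 stored alongside the data

# Later, a single bit flip corrupts the stored copy...
corrupted = bytes([message[0] ^ 0x01]) + message[1:]

# ...and recomputing the CRC detects (but cannot correct) the error.
assert zlib.crc32(message) == checksum
assert zlib.crc32(corrupted) != checksum
```

Correction requires heavier redundancy, such as the error-correcting codes mentioned above; a CRC mismatch typically triggers a retry instead.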
Analysis of the trade-off between the storage cost saved and the costs of the related computations and possible delays in data availability is done before deciding whether to keep certain data compressed or not. For security reasons, certain types of data may be kept encrypted in storage to prevent the possibility of unauthorized information reconstruction from chunks of storage snapshots. The lower a storage is in the hierarchy, the lesser its bandwidth and the greater its access latency from the CPU. This traditional division of storage into primary, secondary and off-line storage is guided by cost per bit. In contemporary usage, "memory" is semiconductor read–write random-access storage, typically DRAM or other forms of fast but temporary storage. "Storage" consists of storage devices and their media not directly accessible by the CPU (hard disk drives, optical disc drives and other devices slower than RAM but non-volatile). Memory has also been called core memory, main memory, real storage or internal memory. Meanwhile, non-volatile storage devices have been referred to as secondary storage, external memory or auxiliary/peripheral storage.
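The storage-versus-computation trade-off is visible with any general-purpose compressor; here, zlib from the Python standard library on deliberately redundant data:

```python
import zlib

# Highly redundant data compresses very well...
redundant = b"abc" * 10_000
packed = zlib.compress(redundant)
assert len(packed) < len(redundant)

# ...and decompression reconstructs the original exactly, trading
# extra CPU work on every access for less storage at rest.
assert zlib.decompress(packed) == redundant
```

Incompressible data (already-compressed media, encrypted blobs) gains little or nothing, which is why the decision is made per data type rather than globally.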
Primary storage, often referred to as memory, is the only one directly accessible to the CPU. The CPU continuously reads instructions stored there and executes them as required. Any data actively operated on is also stored there in a uniform manner. Early computers used delay lines, Williams tubes, or rotating magnetic drums as primary storage. By 1954, those unreliable methods were mostly replaced by magnetic core memory. Core memory remained dominant until the 1970s, when advances in integrated circuit technology allowed semiconductor memory to become economically competitive; this led to modern random-access memory (RAM).
In computing, a processor or processing unit is an electronic circuit which performs operations on some external data source, usually memory or some other data stream. The term is most often used to refer to the central processor in a system, but typical computer systems combine a number of specialised "processors":
CPU – central processing unit. If designed conforming to the von Neumann architecture, it contains at least a control unit, an arithmetic logic unit (ALU) and processor registers. In some contexts, the ALU and registers together are called the processing unit.
GPU – graphics processing unit
VPU – video processing unit
TPU – tensor processing unit
NPU – neural processing unit
PPU – physics processing unit
DSP – digital signal processor
ISP – image signal processor
SPU or SPE – synergistic processing element in the Cell microprocessor
FPGA – field-programmable gate array
sound chip
See also: microprocessor, multi-core processor, superscalar processor, hardware acceleration.
A data grid is an architecture or set of services that gives individuals or groups of users the ability to access and transfer large amounts of geographically distributed data for research purposes. Data grids make this possible through a host of middleware applications and services that pull together data and resources from multiple administrative domains and present them to users upon request. The data in a data grid can be located at a single site or at multiple sites, where each site can be its own administrative domain governed by a set of security restrictions as to who may access the data. Multiple replicas of the data may be distributed throughout the grid outside their original administrative domain, and the security restrictions placed on the original data regarding who may access it must be applied to the replicas. Specifically developed data grid middleware handles the integration between users and the data they request by controlling access while making it available as efficiently as possible; the adjacent diagram depicts a high-level view of a data grid.
Middleware provides all the services and applications necessary for efficient management of datasets and files within the data grid, while providing users quick access to the datasets and files. There are a number of concepts and tools that must be available to make a data grid operationally viable. At the same time, not all data grids require the same capabilities and services, because of differences in access requirements and in the location of resources relative to users. In any case, most data grids will have similar middleware services that provide for a universal name space, a data transport service, a data access service, data replication and a resource management service; taken together, these are key to the data grid's functional capabilities. Since the sources of data within the data grid will consist of data from multiple separate systems and networks using different file-naming conventions, it would be difficult for a user to locate data within the data grid and to know they retrieved what they needed based on existing physical file names.
A universal or unified name space makes it possible to create logical file names (LFNs) that can be referenced within the data grid and that map to physical file names (PFNs). When an LFN is requested or queried, all matching PFNs are returned, including possible replicas of the requested data; the end user can then choose from the returned results the most appropriate replica to use. This service is often provided as part of a management system known as a Storage Resource Broker. Information about the locations of files and the mappings between LFNs and PFNs may be stored in a metadata or replica catalogue; the replica catalogue contains information about the LFNs that map to multiple PFNs. Another middleware service is that of providing for data transport. Data transport encompasses multiple functions that are not limited to the transfer of bits, including such items as fault tolerance and data access. Fault tolerance can be achieved in a data grid by providing mechanisms that ensure data transfer will resume after each interruption until all requested data has been received.
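A replica catalogue is, at its core, a mapping from each LFN to one or more PFNs. The names below are hypothetical, but the shape of the lookup is representative:

```python
# Hypothetical catalogue: LFNs, sites and paths are invented for
# illustration, not taken from any real deployment.
replica_catalogue = {
    "lfn://experiment/run42/events.dat": [
        "gsiftp://site-a.example.org/data/run42/events.dat",
        "gsiftp://site-b.example.org/mirror/events.dat",
    ],
}

def resolve(lfn):
    # Map a logical file name to every known physical replica; the
    # user (or a broker) then picks the most appropriate copy.
    return replica_catalogue.get(lfn, [])

replicas = resolve("lfn://experiment/run42/events.dat")
assert len(replicas) == 2   # the same logical file, two physical copies
```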
There are multiple possible methods, ranging from starting the entire transmission over from the beginning to resuming from where the transfer was interrupted. As an example, GridFTP provides for fault tolerance by resending data from the last acknowledged byte without starting the entire transfer from the beginning. The data transport service provides the low-level access and connections between hosts for file transfer. The data transport service may use any number of modes to implement the transfer, including parallel data transfer, where two or more data streams are used over the same channel; striped data transfer, where two or more streams access different blocks of the file for simultaneous transfer; and using the underlying built-in capabilities of the network hardware or specially developed protocols to support faster transfer speeds. The data transport service might optionally include a network-overlay function to facilitate the routing and transfer of data, as well as file I/O functions that allow users to see remote files as if they were local to their system.
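Resuming from the point of interruption, as GridFTP does from the last acknowledged byte, can be sketched in a few lines. This is a simplified local-file illustration of the idea, not the GridFTP protocol itself:

```python
import os
import tempfile

def resume_copy(src, dst, chunk=64 * 1024):
    # Restartable copy: on each call, continue from however many bytes
    # dst already holds, instead of restarting from byte zero.
    done = os.path.getsize(dst) if os.path.exists(dst) else 0
    with open(src, "rb") as s, open(dst, "ab") as d:
        s.seek(done)
        while True:
            block = s.read(chunk)
            if not block:
                break
            d.write(block)

# Simulate an interrupted transfer, then resume it.
tmp = tempfile.mkdtemp()
src, dst = os.path.join(tmp, "src"), os.path.join(tmp, "dst")
with open(src, "wb") as f:
    f.write(b"x" * 200_000)
with open(dst, "wb") as f:
    f.write(b"x" * 1_000)       # partial copy left behind by the interruption
resume_copy(src, dst)
assert os.path.getsize(dst) == 200_000
```

A real implementation would also verify that the already-received prefix is intact (for example, with a checksum) before appending to it.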
The data transport service hides the complexity of access and transfer between the different systems from the user, so that they appear as one unified data source. Data access services work hand in hand with the data transport service to provide security, access controls and management of any data transfers within the data grid. Security services provide mechanisms for authentication of users to ensure they are properly identified; common forms of security for authentication include the use of Kerberos. Authorization services are the mechanisms that control what the user is able to access after being identified through authentication. Common forms of authorization mechanisms can be as simple as file permissions; however, the need for more stringently controlled access to data is met using Access Control Lists, Role-Based Access Control and Task-Based Authorization Controls. These types of controls can be used to provide granular access to files, from limits on access times or duration of access down to controls that determine which files can be read or written.
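An access control list reduces to a per-file, per-user set of permitted actions, checked after authentication has established who the user is. A minimal sketch, with invented users and file names:

```python
# Hypothetical ACL: usernames and file names are invented for illustration.
acl = {
    "results.dat": {"alice": {"read", "write"}, "bob": {"read"}},
}

def authorized(user, filename, action):
    # Authentication establishes *who* the user is;
    # authorization (this check) decides *what* they may do.
    return action in acl.get(filename, {}).get(user, set())

assert authorized("alice", "results.dat", "write")
assert authorized("bob", "results.dat", "read")
assert not authorized("bob", "results.dat", "write")   # denied
```

Role-based and task-based schemes generalize this by attaching permissions to roles or tasks rather than directly to individual users.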
The final data access service that might be present to protect the confidentiality of the data transport is encryption. The most common form of encryption for this task has been the use of SSL while the data is in transport. While all of these access services operate within the data grid, access services within the various administrative domains that host the datasets will still stay in place to enforce access rules.
Riak is a distributed NoSQL key-value data store that offers high availability, fault tolerance, operational simplicity and scalability. In addition to the open-source version, it comes in a supported enterprise version and a cloud storage version. Riak implements the principles from Amazon's Dynamo paper with heavy influence from the CAP theorem. Written in Erlang, Riak has fault-tolerant data replication and automatic data distribution across the cluster for performance and resilience. Riak is licensed using a freemium model: open-source versions of Riak and Riak CS are available, but end users can pay for additional features and support. Riak has a pluggable backend, with the default storage backend being Bitcask; LevelDB is also supported. Fault-tolerant availability: Riak replicates key/value stores across a cluster of nodes with a default n_val of three. In the case of node outages due to network partition or hardware failures, data can still be written to a neighboring node beyond the initial three, and read back, due to its "masterless" peer-to-peer architecture.
Queries: Riak provides a REST-ful API through HTTP, as well as Protocol Buffers, for basic PUT, GET, POST and DELETE functions. More complex queries are possible, including secondary indexes and MapReduce; MapReduce has native support for both Erlang and JavaScript. Predictable latency: Riak distributes data across nodes with hashing and can provide a predictable latency profile even in the case of multiple node failures. Storage options: keys/values can be stored in memory, on disk, or both. Multi-datacenter replication: one cluster acts as a "primary cluster". The primary cluster handles replication requests from one or more "secondary clusters". If the datacenter with the primary cluster goes down, a secondary cluster can take over as the primary cluster. There are two primary modes of operation: fullsync and realtime. In fullsync mode, a complete synchronization occurs between primary and secondary clusters, by default every six hours. In realtime mode, replication to the secondary datacenter is triggered by updates to the primary datacenter.
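Hash-based data distribution of the kind Riak uses can be illustrated with a toy consistent-hashing ring. This is a deliberate simplification: Riak's actual ring hashes keys with SHA-1 onto fixed partitions, and the node names and key below are invented:

```python
import hashlib

def ring_position(key, ring_size=2 ** 32):
    # Hash the key onto a fixed-size ring; any stable hash works
    # for this sketch (SHA-1 here, echoing Riak's choice).
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % ring_size

def preference_list(key, nodes, n_val=3):
    # Pick n_val successive nodes clockwise from the key's position,
    # mirroring the default replication factor of three (n_val).
    start = ring_position(key) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(n_val)]

nodes = ["node1", "node2", "node3", "node4", "node5"]
replicas = preference_list("user:1234", nodes)
assert len(replicas) == 3 and len(set(replicas)) == 3
```

Because every node can compute the same mapping independently, there is no master to consult, which is what enables the masterless writes and reads described above.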
All multi-datacenter replication occurs over multiple concurrent TCP connections to maximize performance and network utilization. Tunable consistency: there is the option to choose between strong and eventual consistency for each bucket. Riak is available for free under the Apache 2 license. In addition, Basho Technologies offered two options for its commercial software, Riak Enterprise and Riak Enterprise Plus; Riak Enterprise Plus adds baseline and annual system health checks to ensure long-term platform stability and performance. Riak has official drivers for Ruby, Java and Python, and there are numerous community-supported drivers for other programming languages. Riak was written by Andy Gross and others at Basho Technologies, a company founded by former engineers and executives from Akamai, to power a web Sales Force Automation application. There was more interest in the datastore technology than in the applications built on it, so the company decided to build a business around Riak itself, gaining adoption throughout the Fortune 100 and becoming a foundation for many of the world's fastest-growing Web-based and social networking applications, as well as cloud service providers.
Releases after graduation include: 1.1, released February 21, 2012, which added Riaknostic, enhanced error logging and reporting, improved resiliency for large clusters, and a new graphical operations and monitoring interface called Riak Control. 1.4, released July 10, 2013, added counters, secondary-indexing improvements, reduced object storage overhead, handoff progress reporting, and enhancements to MDC replication. 2.0, released September 2, 2014, added new data types, including sets, maps and flags, simplifying application development; strong consistency by bucket; full-text integration with Apache Solr; and reduced replicas for secondary sites. 2.1, released April 16, 2015, added an optimization for many write-heavy workloads: "write once" buckets, whose entries are intended to be written once and never updated or overwritten. 2.2, released November 17, 2016, added support for Debian 8 and Ubuntu 16.04, as well as Solr integration improvements. Riak may no longer be maintained by Basho; 2.2.5, released April 26, 2018, is the first community release.
It added support for Multi-Datacentre Replication, which had not been part of open-source Riak before, added a grow-only set data type, improved data distribution over nodes and cleaned up production test issues. Notable users include AT&T, GitHub, Best Buy, the UK National Health Service, The Weather Channel and Riot Games. See also: Basho Technologies, Apache Accumulo, Oracle NoSQL Database, NoSQL, Structured storage, Memcached, Redis.
In computing, a server is a computer program or a device that provides functionality for other programs or devices, called "clients". This architecture is called the client–server model: a single overall computation is distributed across multiple processes or devices. Servers can provide various functionalities, called "services", such as sharing data or resources among multiple clients, or performing computation for a client. A single server can serve multiple clients, and a single client can use multiple servers. A client process may run on the same device or may connect over a network to a server on a different device. Typical servers are database servers, file servers, mail servers, print servers, web servers, game servers and application servers. Client–server systems are today most often implemented by the request–response model: a client sends a request to the server, which performs some action and sends a response back to the client with a result or acknowledgement. Designating a computer as "server-class hardware" implies that it is specialized for running servers on it.
This often implies that it is more powerful and reliable than standard personal computers, though alternatively, large computing clusters may be composed of many relatively simple, replaceable server components. The use of the word server in computing comes from queueing theory, where it dates to the mid-20th century, being notably used in Kendall's paper that introduced Kendall's notation. In earlier papers, such as Erlang's, more concrete terms such as "operators" are used. In computing, "server" dates at least to RFC 5, one of the earliest documents describing ARPANET, where it is contrasted with "user", distinguishing two types of host: "server-host" and "user-host". The use of "serving" also dates to early documents, such as RFC 4, which contrasts "serving-host" with "using-host". The Jargon File defines "server" in the common sense of a process performing service for requests from remote clients, with the 1981 version reading: SERVER n. A kind of DAEMON which performs a service for the requester, which runs on a computer other than the one on which the server runs.
Strictly speaking, the term server refers to a computer program or process. Through metonymy, it also refers to a device used for running one or several server programs; on a network, such a device is called a host. In addition to server, the words serve and service are used, though servicer and servant are not. The word service may refer either to abstract functionality, e.g. a Web service, or to a computer program that turns a computer into a server, e.g. a Windows service. Originally used as in "servers serve users" (in the sense of "obey"), today one says that "servers serve data", in the same sense as "give"; for instance, web servers "serve web pages to users" or "service their requests". The server is part of the client–server model. The nature of communication between a client and a server is request and response; this is in contrast with the peer-to-peer model. In principle, any computerized process that can be used or called by another process is a server, and the calling process or processes is a client; thus any general-purpose computer connected to a network can host servers.
For example, if files on a device are shared by some process, that process is a file server. Web server software can run on any capable computer, so a laptop or a personal computer can host a web server. While request–response is the most common client–server design, there are others, such as the publish–subscribe pattern. In the publish–subscribe pattern, clients register with a pub–sub server, subscribing to specified types of messages. Thereafter, the pub–sub server forwards matching messages to the clients without any further requests: the server pushes messages to the client, rather than the client pulling messages from the server as in request–response. The purpose of a server is to share data as well as to distribute work; a server computer can serve its own computer programs as well. The entire structure of the Internet is based upon a client–server model: high-level root nameservers, DNS servers and routers direct the traffic on the Internet. There are millions of servers connected to the Internet, running continuously throughout the world, and every action taken by an ordinary Internet user requires one or more interactions with one or more servers.
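The contrast between request–response and publish–subscribe comes down to who initiates delivery. A toy in-process broker (not any particular product's API) makes the push model concrete:

```python
from collections import defaultdict

class PubSubServer:
    """Toy publish-subscribe broker: clients register interest once,
    then the server pushes matching messages without further requests,
    in contrast with request-response, where the client pulls."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        # One registration replaces what would otherwise be
        # repeated polling requests from the client.
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers[topic]:
            callback(message)   # push to every interested subscriber

received = []
broker = PubSubServer()
broker.subscribe("weather", received.append)
broker.publish("weather", "storm warning")
broker.publish("sports", "final score")     # no subscribers: dropped
assert received == ["storm warning"]
```

A networked broker adds transport, queueing and delivery guarantees on top of this shape, but the subscribe-then-push control flow is the same.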
There are exceptions. Hardware requirements for servers vary widely, depending on the server's purpose and its software. Since servers are usually accessed over a network, many run unattended without a computer monitor, input device, audio hardware or USB interfaces. Many servers do not have a graphical user interface; they are managed remotely. Remote management can be conducted via various methods, including Microsoft Management Console, PowerShell, SSH and browser-based out-of-band management systems such as Dell's iDRAC or HP's iLO. Large traditional single servers would need to be run for long periods without interruption.