Cray Inc. is an American supercomputer manufacturer headquartered in Seattle, Washington. It also manufactures systems for data storage and analytics. Several Cray supercomputer systems are listed in the TOP500, which ranks the most powerful supercomputers in the world. Cray manufactures its products in Chippewa Falls, Wisconsin, where its founder, Seymour Cray, was born and raised; the company also has offices in Bloomington, Minnesota, and numerous other sales, engineering, and R&D locations around the world. The company's predecessor, Cray Research, Inc., was founded in 1972 by computer designer Seymour Cray. Seymour Cray went on to form the spin-off Cray Computer Corporation in 1989, which went bankrupt in 1995, while Cray Research was bought by SGI the next year. Cray Inc. was formed in 2000 when Tera Computer Company purchased the Cray Research Inc. business from SGI and adopted the name of its acquisition. Seymour Cray began working in the computing field in 1950 when he joined Engineering Research Associates (ERA) in Saint Paul, Minnesota.
There, he helped to create the ERA 1103. ERA became part of UNIVAC and began to be phased out, and he left the company in 1960 for Control Data Corporation (CDC). He worked out of the CDC headquarters in Minneapolis, but grew upset by constant interruptions from managers, so he set up a lab in his home town of Chippewa Falls, about 85 miles to the east. Cray had a string of successes at CDC, including the CDC 6600 and CDC 7600. When CDC ran into financial difficulties in the late 1960s, development funds for Cray's follow-on CDC 8600 became scarce. When he was told the project would have to be put "on hold" in 1972, Cray left to form his own company, Cray Research Inc. Copying the previous arrangement, Cray kept the research and development facilities in Chippewa Falls and put the business headquarters in Minneapolis. The company's first product, the Cray-1 supercomputer, was a major success because it was faster than all other computers at the time. The first system was sold within a month for US$8.8 million. Seymour Cray continued working, this time on the Cray-2, though it ended up being only marginally faster than the Cray X-MP, developed by another team at the company.
Cray soon left the CEO position to become an independent contractor. He started a new VLSI technology lab for the Cray-2 in Boulder, Cray Laboratories, in 1979, which closed in 1982. In 1989 he headed the spin-off Cray Computer Corporation (CCC), where he worked on the Cray-3. However, the changing political climate resulted in poor sales prospects. Only one Cray-3 was delivered, and a number of follow-on designs were never completed; the company filed for bankruptcy in 1995. CCC's remains then became Cray's final corporation, SRC Computers, Inc. Cray Research continued development along a separate line of computers with lead designer Steve Chen and the Cray X-MP. After Chen's departure, the Cray Y-MP, Cray C90 and Cray T90 were developed on the original Cray-1 architecture but achieved much greater performance via multiple additional processors, faster clocks, and wider vector pipes. The uncertainty of the Cray-2 project gave rise to a number of Cray-object-code compatible "Crayette" firms: Scientific Computer Systems, American Supercomputer and one other firm. These firms did not mean to compete against Cray and therefore attempted less expensive, slower CMOS versions of the X-MP with the release of the COS operating system and the CFT Fortran compiler.
A series of massively parallel computers from Thinking Machines, Kendall Square Research, Intel Supercomputing Systems Division, nCUBE, MasPar and Meiko Scientific took over the 1980s high-performance market. At first, Cray Research denigrated such approaches by complaining that developing software to use the machines was difficult – a true complaint in the era of the ILLIAC IV, but becoming less so each day. Cray came to realize that the approach was the only way forward and started a five-year project to capture the lead in this area. The plan's result was the DEC Alpha-based Cray T3D and Cray T3E series, which left Cray as the only remaining supercomputer vendor in the market besides NEC by 2000. Each site with a Cray installation was considered a member of the "exclusive club" of Cray operators. Cray computers were considered quite prestigious because Crays were expensive machines and the number of units sold was small compared to ordinary mainframes. This perception extended to countries as well: to boost the perception of exclusivity, Cray Research's marketing department had promotional neckties made with a mosaic of tiny national flags illustrating the "club of Cray-operating countries".
New vendors introduced small supercomputers, known as minisupercomputers, during the late 1980s and early 1990s, which out-competed low-end Cray machines in the market. The Convex Computer series, as well as a number of small-scale parallel machines from companies like Pyramid Technology and Alliant Computer Systems, were popular. One such vendor was Supertek, whose S-1 machine was an air-cooled CMOS implementation of the X-MP processor. Cray purchased Supertek in 1990 and sold the S-1 as the Cray XMS, but the machine proved problematic.
Static random-access memory
Static random-access memory (SRAM) is a type of semiconductor memory that uses bistable latching circuitry to store each bit. SRAM exhibits data remanence, but it is still volatile in the conventional sense that data is lost when the memory is not powered. The term static differentiates SRAM from dynamic random-access memory (DRAM), which must be periodically refreshed. SRAM is faster and more expensive than DRAM. Advantages: simplicity (a refresh circuit is not needed), performance, reliability, and low idle power consumption. Disadvantages: price, density, and high operational power consumption. The power consumption of SRAM varies depending on how it is accessed. Static RAM used at a somewhat slower pace, such as in applications with moderately clocked microprocessors, draws little power and can have a nearly negligible power consumption when sitting idle – in the region of a few microwatts. Several techniques have been proposed to manage power consumption of SRAM-based memory structures. SRAM is sold as general-purpose products with an asynchronous interface, such as the ubiquitous 28-pin 8K × 8 and 32K × 8 chips, as well as similar products up to 16 Mbit per chip; and with a synchronous interface, used for caches and other applications requiring burst transfers, up to 18 Mbit per chip. It is also integrated on chip: as RAM or cache memory in microcontrollers; as the primary caches in powerful microprocessors, such as the x86 family and many others; to store the registers and parts of the state machines used in some microprocessors; on application-specific ICs (ASICs); and in field-programmable gate arrays and complex programmable logic devices. Many categories of industrial and scientific subsystems, automotive electronics, and similar products contain static RAM.
Some amount is embedded in all modern appliances and similar devices that implement an electronic user interface. Several megabytes may be used in complex products such as digital cameras and cell phones. SRAM in its dual-ported form is sometimes used for real-time digital signal processing circuits. SRAM is used in personal computers, workstations and peripheral equipment: CPU register files, internal CPU caches and external burst-mode SRAM caches, hard disk buffers, router buffers, etc. LCD screens and printers normally employ static RAM to hold the image displayed. Static RAM was used for the main memory of some early personal computers such as the ZX80, TRS-80 Model 100 and Commodore VIC-20. Hobbyists, particularly home-built processor enthusiasts, prefer SRAM due to the ease of interfacing. It is much easier to work with than DRAM, as there are no refresh cycles and the address and data buses are directly accessible rather than multiplexed. In addition to buses and power connections, SRAM requires only three controls: Chip Enable, Write Enable and Output Enable.
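As a rough illustration of how little control logic an asynchronous SRAM needs, the following minimal sketch models the three control signals named above. The class name AsyncSram, the method cycle and the 8K × 8 organisation are assumptions chosen for illustration; real parts usually use active-low pins and have timing constraints that this toy model ignores.

```python
class AsyncSram:
    """Toy behavioural model of an asynchronous SRAM (illustration only)."""

    def __init__(self, words: int = 8 * 1024, width: int = 8) -> None:
        self.mask = (1 << width) - 1      # keep stored values within the word width
        self.cells = [0] * words          # no refresh logic is needed

    def cycle(self, addr, data_in=None, *, ce, we, oe):
        """Apply one access; each signal is True when asserted.

        Returns the value driven onto the data bus, or None when the
        outputs would be high-impedance.
        """
        if not ce:
            return None                   # chip deselected: inputs are ignored
        if we:
            self.cells[addr] = data_in & self.mask   # write takes priority
            return None
        if oe:
            return self.cells[addr]       # read: drive the stored word
        return None


ram = AsyncSram()
ram.cycle(0x0123, 0xAB, ce=True, we=True, oe=False)        # write
value = ram.cycle(0x0123, ce=True, we=False, oe=True)       # read back
ignored = ram.cycle(0x0123, ce=False, we=False, oe=True)    # deselected
print(value, ignored)
```

Note that the address goes straight to the cell array in a single step, with no row/column multiplexing and no refresh bookkeeping, which is the interfacing simplicity the paragraph above describes.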
In synchronous SRAM, Clock is also included. Non-volatile SRAMs, or nvSRAMs, have standard SRAM functionality, but they save the data when the power supply is lost, ensuring preservation of critical information. NvSRAMs are used in a wide range of situations – networking and medical, among many others – where the preservation of data is critical and where batteries are impractical. PSRAMs have a DRAM storage core combined with a self-refresh circuit, so they appear externally as a slower SRAM. They have a density/cost advantage over true SRAM, without the access complexity of DRAM. By transistor type, SRAM is built from bipolar junction transistors (fast, but consuming a lot of power) or MOSFETs (low power and common today). By function, SRAM is either asynchronous (independent of clock frequency) or synchronous (address, data-in and other control signals are associated with the clock signals). In the 1990s, asynchronous SRAM was employed for its fast access time. Asynchronous SRAM was used as main memory for small cache-less embedded processors used in everything from industrial electronics and measurement systems to hard disks and networking equipment, among many other applications.
Nowadays, synchronous SRAM is employed instead, much as synchronous DRAM (DDR SDRAM) is used in preference to asynchronous DRAM. A synchronous memory interface is much faster, as access time can be reduced by employing a pipeline architecture. Furthermore, as DRAM is much cheaper than SRAM, DRAM replaces SRAM where a large volume of data must be stored. SRAM is, however, much faster for random access. Therefore, SRAM is used for CPU caches, small on-chip memories, FIFOs and other small buffers. Several feature-specific types exist. Zero bus turnaround (ZBT): the turnaround is the number of clock cycles it takes to change access to the SRAM from write to read and vice versa; for ZBT SRAMs the turnaround, or the latency between read and write cycles, is zero. SyncBurst: features synchronous burst write access to the SRAM to speed up write operations. DDR SRAM: synchronous, single read/write port, double data rate I/O. Quad Data Rate SRAM: synchronous, separate read and write ports, quadruple data rate I/O. Further variants are binary SRAM and ternary SRAM. A typical SRAM cell is mad
In computing, a vector processor or array processor is a central processing unit that implements an instruction set containing instructions that operate on one-dimensional arrays of data called vectors, in contrast to scalar processors, whose instructions operate on single data items. Vector processors can improve performance on certain workloads, notably numerical simulation and similar tasks. Vector machines appeared in the early 1970s and dominated supercomputer design through the 1970s into the 1990s, notably the various Cray platforms. The rapid fall in the price-to-performance ratio of conventional microprocessor designs led to the vector supercomputer's demise in the 1990s. As of 2015, most commodity CPUs implement architectures that feature instructions for a form of vector processing on multiple data sets. Common examples include Intel x86's MMX, SSE and AVX instructions, AMD's 3DNow! extensions, Sparc's VIS extension, PowerPC's AltiVec and MIPS' MSA. Vector processing techniques also operate in video-game console hardware and in graphics accelerators.
In 2000, IBM, Toshiba and Sony collaborated to create the Cell processor. Other CPU designs include multiple instructions for vector processing on multiple data sets, known as MIMD and realized with VLIW; the Fujitsu FR-V VLIW/vector processor combines both technologies. Vector processing development began in the early 1960s at Westinghouse in their "Solomon" project. Solomon's goal was to increase math performance by using a large number of simple math co-processors under the control of a single master CPU. The CPU fed a single common instruction to all of the arithmetic logic units (ALUs), one per cycle, but with a different data point for each one to work on. This allowed the Solomon machine to apply a single algorithm to a large data set, fed in the form of an array. In 1962, Westinghouse cancelled the project, but the effort was restarted at the University of Illinois as the ILLIAC IV. Their version of the design called for a 1 GFLOPS machine with 256 ALUs, but when it was delivered in 1972, it had only 64 ALUs and could reach only 100 to 150 MFLOPS.
Nevertheless, it showed that the basic concept was sound, and, when used on data-intensive applications such as computational fluid dynamics, the ILLIAC was the fastest machine in the world. The ILLIAC approach of using separate ALUs for each data element is not common to later designs, and is referred to under a separate category, massively parallel computing. A computer for operations with functions was presented and developed by Kartsev in 1967. The first successful implementations of vector processing appear to be the Control Data Corporation STAR-100 and the Texas Instruments Advanced Scientific Computer (ASC). The basic ASC ALU used a pipeline architecture that supported both scalar and vector computations, with peak performance reaching 20 MFLOPS when processing long vectors. Expanded ALU configurations supported "two pipes" or "four pipes" with a corresponding 2X or 4X performance gain. Memory bandwidth was sufficient to support these expanded modes. The STAR was otherwise slower than CDC's own supercomputers like the CDC 7600, but at data-related tasks it could keep up while being much smaller and less expensive.
However, the machine took considerable time decoding the vector instructions and getting ready to run the process, so it required specific data sets to work on before it sped anything up. The vector technique was first fully exploited in 1976 by the famous Cray-1. Instead of leaving the data in memory like the STAR and ASC, the Cray design had eight vector registers, which held sixty-four 64-bit words each. The vector instructions were applied between registers, which is much faster than talking to main memory. The Cray design used pipeline parallelism to implement vector instructions rather than multiple ALUs. In addition, the design had separate pipelines for different instructions; for example, addition/subtraction was implemented in different hardware than multiplication. This allowed a batch of vector instructions themselves to be pipelined, a technique called vector chaining. The Cray-1 had a performance of about 80 MFLOPS, but with up to three chains running it could peak at 240 MFLOPS – far faster than any machine of the era.
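The register-based scheme described above can be sketched in a few lines. This is only an illustrative software model, not Cray-1 code: the helper name vector_multiply_add is hypothetical, and Python list operations stand in for hardware vector instructions. Long arrays are cut into strips of 64 elements (one vector register each), and the add consumes the output of the multiply, which is the dependency that chaining lets the hardware overlap.

```python
VECTOR_LENGTH = 64   # words per vector register, as described above
NUM_VREGS = 8        # vector registers available, as described above


def vector_multiply_add(a, b, c):
    """Compute a[i] * b[i] + c[i], one register-sized strip at a time.

    Hypothetical helper for illustration: each strip uses five "registers"
    (v0..v4), which fits within the eight available.
    """
    result = []
    for start in range(0, len(a), VECTOR_LENGTH):
        stop = start + VECTOR_LENGTH
        v0, v1, v2 = a[start:stop], b[start:stop], c[start:stop]  # vector loads
        v3 = [x * y for x, y in zip(v0, v1)]   # one vector multiply
        v4 = [p + z for p, z in zip(v3, v2)]   # one vector add, fed by v3
        result.extend(v4)                      # vector store
    return result


a = [float(i) for i in range(150)]
b = [2.0] * 150
c = [1.0] * 150
out = vector_multiply_add(a, b, c)

strips = -(-len(a) // VECTOR_LENGTH)   # ceiling division: strips needed
peak_mflops = 80 * 3                   # three chains at about 80 MFLOPS each
print(out[:3], strips, peak_mflops)
```

In the software model the multiply finishes before the add starts; on the hardware described above, chaining would let the add pipeline begin as soon as the first products emerge.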
Other examples followed. Control Data Corporation tried to re-enter the high-end market with its ETA-10 machine, but it sold poorly and they took that as an opportunity to leave the supercomputing field entirely. In the early and mid-1980s, Japanese companies (Fujitsu and Nippon Electric Corporation) introduced register-based vector machines similar to the Cray-1, being faster and much smaller. Oregon-based Floating Point Systems built add-on array processors for minicomputers, later building their own minisupercomputers. Throughout, Cray continued to be the performance leader, continually beating the competition with a series of machines that led to the Cray-2, Cray X-MP and Cray Y-MP. Since then, the supercomputer market has focused much more on massively parallel processing rather than better implementations of vector processors. However, recognising the benefits of vector processing, IBM developed Virtual Vector Architecture for use in supercomputers, coupling several scalar processors to act as a vector processor.
Although vector supercomputers resembling the Cray-1 are less popular these days, NEC has continued to make this type of computer up to the present day, with their SX series of computers. Most recently, the SX-Aurora TSUBASA places the processor and either 24 or 48 gigabytes of memory on an HBM 2 module within a card t
The Cray T3E was Cray Research's second-generation massively parallel supercomputer architecture, launched in late November 1995. The first T3E was installed at the Pittsburgh Supercomputing Center in 1996. Like the previous Cray T3D, it was a distributed memory machine using a 3D torus topology interconnection network. The T3E used the DEC Alpha 21164 microprocessor and was designed to scale from 8 to 2,176 Processing Elements (PEs). Each PE had between 64 MB and 2 GB of DRAM and a 6-way interconnect router with a payload bandwidth of 480 MB/s in each direction. Unlike many other MPP systems, including the T3D, the T3E was self-hosted and ran the UNICOS/mk distributed operating system, with a GigaRing I/O subsystem integrated into the torus for network and tape I/O. The original T3E had a 300 MHz processor clock. Variants using the faster 21164A processor comprised the T3E-900, T3E-1200, T3E-1200E and T3E-1350. The T3E was available in both air-cooled (AC) and liquid-cooled (LC) configurations. AC systems were available with 16 to 128 user PEs, LC systems with 64 to 2048 user PEs.
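To make the 3D torus and the 6-way router concrete, here is a minimal sketch that computes the six neighbours of a node and the minimum hop count between two nodes. The function names and the 8 × 8 × 8 layout are assumptions for illustration; the actual dimensions of any given T3E installation are not stated above.

```python
def torus_neighbors(node, dims):
    """Return the six neighbours of `node` in a 3D torus of size `dims`.

    Each axis wraps around, so every node has one link in the positive
    and one in the negative direction per dimension (six in total).
    """
    x, y, z = node
    nx, ny, nz = dims
    return [
        ((x + 1) % nx, y, z), ((x - 1) % nx, y, z),
        (x, (y + 1) % ny, z), (x, (y - 1) % ny, z),
        (x, y, (z + 1) % nz), (x, y, (z - 1) % nz),
    ]


def torus_hops(a, b, dims):
    """Minimum number of hops between two nodes, using wraparound links."""
    return sum(min((p - q) % n, (q - p) % n) for p, q, n in zip(a, b, dims))


dims = (8, 8, 8)   # hypothetical 512-PE layout
corner = torus_neighbors((0, 0, 0), dims)
far = torus_hops((0, 0, 0), (4, 4, 4), dims)
wrap = torus_hops((0, 0, 0), (7, 7, 7), dims)
print(corner, far, wrap)
```

The wraparound links are what distinguish a torus from a plain mesh: in this layout the opposite corner is three hops away rather than twenty-one. The six neighbours are distinct only when every dimension has at least three nodes.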
A 1480-processor T3E-1200 was the first supercomputer to achieve a performance of more than 1 teraflops running a computational science application, in 1998. After Cray Research was acquired by Silicon Graphics in February 1996, development of new Alpha-based systems was stopped. While providing the -900, -1200 and -1200E upgrades to the T3E, in the long term Silicon Graphics intended Cray T3E users to migrate to the Origin 3000, a MIPS-based distributed shared memory computer, introduced in 2000. However, the T3E continued in production.
The Cray X1 is a non-uniform memory access, vector processor supercomputer manufactured and sold by Cray Inc. since 2003. The X1 is described as the unification of the Cray T90, Cray SV1 and Cray T3E architectures into a single machine. The X1 shares the multistreaming processors, vector caches and CMOS design of the SV1, the scalable distributed memory design of the T3E, and the high memory bandwidth and liquid cooling of the T90. The X1 uses a 1.2 ns clock cycle and 8-wide vector pipes in MSP mode, offering a peak speed of 12.8 gigaflops per processor. Air-cooled models are available with up to 64 processors. Liquid-cooled systems scale to a theoretical maximum of 4096 processors, comprising 1024 shared-memory nodes connected in a two-dimensional torus network, in 32 frames; such a system would supply a peak speed of 50 teraflops. The largest unclassified X1 system was the 512-processor system at Oak Ridge National Laboratory, though this has since been upgraded to an X1E system. The X1 can be programmed either with widely used message passing software like MPI and PVM, or with shared-memory languages like the Unified Parallel C programming language or Co-array Fortran.
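The scaling figures quoted above can be checked with simple arithmetic. The sketch below uses only the numbers given in the text; the variable names are illustrative, and 1 teraflops is taken as 1000 gigaflops.

```python
PEAK_GFLOPS_PER_PROCESSOR = 12.8   # per-processor peak quoted above
MAX_PROCESSORS = 4096              # theoretical maximum quoted above
NODES = 1024                       # shared-memory nodes quoted above
ORNL_PROCESSORS = 512              # Oak Ridge system quoted above

processors_per_node = MAX_PROCESSORS // NODES
peak_tflops = PEAK_GFLOPS_PER_PROCESSOR * MAX_PROCESSORS / 1000
ornl_peak_tflops = PEAK_GFLOPS_PER_PROCESSOR * ORNL_PROCESSORS / 1000

print(processors_per_node)            # processors in each shared-memory node
print(round(peak_tflops, 1))          # full-system peak, in teraflops
print(round(ornl_peak_tflops, 2))     # 512-processor system peak, in teraflops
```

The product comes to roughly 52 teraflops, which the text rounds to 50, and the node count implies four processors per shared-memory node.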
The X1 runs an operating system called UNICOS/mp, which shares more with the SGI IRIX operating system than it does with the UNICOS found on prior-generation Cray machines. In 2005, Cray released the X1E upgrade, which uses dual-core processors, allowing two quad-processor nodes to fit on a node board; the processors are upgraded to 1150 MHz. This upgrade triples the peak performance per board, but reduces the per-processor memory and interconnect bandwidth. X1 and X1E boards can be combined within the same system. The X1 is notable for its development being funded by the United States Government's National Security Agency. The X1 was not a financially successful product and it seems doubtful that it or its successors would have been produced without this support.
The T3D was Cray Research's first attempt at a massively parallel supercomputer architecture. Launched in 1993, it marked Cray's first use of another company's microprocessor. The T3D consisted of between 32 and 2048 Processing Elements (PEs), each comprising a 150 MHz DEC Alpha 21064 microprocessor and either 16 or 64 MB of DRAM. PEs were grouped in nodes, which incorporated a 6-way processor interconnect switch; these switches had a peak bandwidth of 300 MB/second in each direction and were connected to form a three-dimensional torus network topology. The T3D was designed to be hosted by a Cray Y-MP Model E, M90 or C90-series "front-end" system and to rely on it and its UNICOS operating system for all I/O and most system services. The T3D PEs ran a simple microkernel called UNICOS MAX. Several different configurations of T3D were available. The SC models shared a cabinet with a host Y-MP system and were available with either 128 or 256 PEs. The MC models were housed in one or more liquid-cooled cabinets separate from the host, while the MCA models were smaller air-cooled multi-cabinet configurations.
There was also a liquid-cooled MCN model, which had an alternative interconnect wiremat allowing non-power-of-2 numbers of PEs. The Cray T3D MC cabinet had an Apple Macintosh PowerBook laptop built into its front; its only purpose was to display animated Cray T3D logos on its color LCD screen. The first T3D delivered was a prototype installed at the Pittsburgh Supercomputing Center in early September 1993; the supercomputer was formally introduced on 27 September 1993. The T3D was superseded in 1995 by the faster and more sophisticated Cray T3E.
The Cray Y-MP was a supercomputer sold by Cray Research from 1988, the successor to the company's X-MP. The Y-MP retained software compatibility with the X-MP, but extended the address registers from 24 to 32 bits. High-density VLSI ECL technology was used and a new liquid cooling system was devised. The Y-MP ran the Cray UNICOS operating system. The Y-MP could be equipped with two, four or eight vector processors, with two functional units each and a clock cycle time of 6 ns. Peak performance was thus 333 megaflops per processor. Main memory comprised 128, 256 or 512 MB of SRAM. The original Y-MP was housed in a similar chassis to the horseshoe-shaped X-MP, but with an extra rectangular cabinet added in the middle, thus forming a "Y" shape in plan view. The system could be configured with one or two Model D IOSs and an optional Solid State Disk of 256 MB to 4 GB capacity. The Y-MP had a measured performance of 2.144 GFLOPS and a peak performance of 2.667 GFLOPS in both 1988 and 1989. The Model D Y-MP was superseded in 1990 by the Y-MP Model E, which replaced IOS Model D with IOS Model E, providing twice the I/O throughput.
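The peak-performance figures quoted above follow from the clock period and the number of functional units. The short calculation below is a sketch under one assumption not stated in the text: each functional unit delivers one floating-point result per clock cycle. Variable names are illustrative.

```python
CLOCK_PERIOD_NS = 6.0     # clock cycle time quoted above
FUNCTIONAL_UNITS = 2      # functional units per processor quoted above
MAX_PROCESSORS = 8        # largest configuration quoted above
MEASURED_GFLOPS = 2.144   # measured figure quoted above

clock_mhz = 1000.0 / CLOCK_PERIOD_NS                         # 1 / 6 ns
peak_mflops_per_cpu = clock_mhz * FUNCTIONAL_UNITS           # assumes one result per unit per cycle
peak_gflops = peak_mflops_per_cpu * MAX_PROCESSORS / 1000    # eight-processor system
efficiency = MEASURED_GFLOPS / peak_gflops                   # measured / peak

print(round(peak_mflops_per_cpu), round(peak_gflops, 3), round(efficiency, 3))
```

Under that assumption the calculation reproduces both the 333 megaflops per processor and the 2.667 GFLOPS system peak, and the measured figure corresponds to about 80% of peak.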
The Y-shaped chassis was dropped in favor of one or two rectangular cabinets, depending on configuration. Maximum RAM was increased to 2 GB and up to eight IOSs were possible. Model E variants included the Y-MP 2E, Y-MP 4E, Y-MP 8E and Y-MP 8I, the latter being a single-cabinet version of the two-cabinet 8E; the 2E and 4E were available with optional secondary air cooling. The Y-MP M90 was a large-memory variant of the Y-MP Model E introduced in 1992; this replaced the SRAM of the Y-MP with up to 32 GB of physically smaller DRAM devices. The Y-MP M90 was available in variants with up to two, four or eight processors; the model name was abbreviated to the Cray M90 series. The Y-MP C90 series is described separately. In 1992, Cray launched the cheaper Y-MP EL model; this was a reimplementation of the Y-MP architecture in CMOS technology, based on the S-2 design acquired by Cray from Supertek Computers in 1990. The EL was an air-cooled system with a different VMEbus-based IOS. EL configurations with up to four processors and 32 MB to 1 GB of DRAM were available.
The Y-MP EL was developed into the Cray EL90 series. The Y-MP EL came in a cabinet much smaller than the traditional room-filling Cray (2010 × 1270 × 810 mm and 635 kg in weight) and could be powered from regular mains power. In the 1992 film Sneakers, whose story is centered around high-level cryptography, two lead characters have an important discussion while sitting on a Cray Y-MP. In an episode of the television dramedy Northern Exposure titled "Nothing's Perfect", a character expresses her excitement at having gained access to a "CRAY Y-MP3" supercomputer.