A database is an organized collection of data, stored and accessed electronically from a computer system. Where databases are more complex, they are often developed using formal design and modeling techniques. The database management system (DBMS) is the software that interacts with end users, applications, and the database itself to capture and analyze the data. The DBMS software additionally encompasses the core facilities provided to administer the database. The sum total of the database, the DBMS, and the associated applications can be referred to as a "database system". The term "database" is also used loosely to refer to any of the DBMS, the database system, or an application associated with the database. Computer scientists may classify database-management systems according to the database models that they support. Relational databases became dominant in the 1980s; these model data as rows and columns in a series of tables, and the vast majority use SQL for writing and querying data. In the 2000s, non-relational databases became popular, collectively referred to as NoSQL because they use different query languages.
Formally, a "database" refers to a set of related data and the way it is organized. Access to this data is usually provided by a "database management system" (DBMS), an integrated set of computer software that allows users to interact with one or more databases and provides access to all of the data contained in the database. The DBMS provides various functions that allow entry, storage, and retrieval of large quantities of information, and provides ways to manage how that information is organized. Because of the close relationship between them, the term "database" is often used casually to refer to both a database and the DBMS used to manipulate it. Outside the world of professional information technology, the term database is often used to refer to any collection of related data, such as a spreadsheet or a card index, even when size and usage requirements do not necessitate a database management system. Existing DBMSs provide various functions that allow management of a database and its data, which can be classified into four main functional groups:

- Data definition – Creation, modification, and removal of definitions that define the organization of the data.
- Update – Insertion, modification, and deletion of the actual data.
- Retrieval – Providing information in a form directly usable or for further processing by other applications; the retrieved data may be made available in a form essentially the same as it is stored in the database or in a new form obtained by altering or combining existing data from the database.
- Administration – Registering and monitoring users, enforcing data security, monitoring performance, maintaining data integrity, dealing with concurrency control, and recovering information that has been corrupted by some event such as an unexpected system failure.

Both a database and its DBMS conform to the principles of a particular database model. "Database system" refers collectively to the database model, the database management system, and the database. Physically, database servers are dedicated computers that hold the actual databases and run only the DBMS and related software. Database servers are usually multiprocessor computers, with generous memory and RAID disk arrays used for stable storage.
RAID is used for recovery of data if any of the disks fail. Hardware database accelerators, connected to one or more servers via a high-speed channel, are also used in large-volume transaction processing environments. DBMSs are found at the heart of most database applications. DBMSs may be built around a custom multitasking kernel with built-in networking support, but modern DBMSs typically rely on a standard operating system to provide these functions. Since DBMSs comprise a significant market, computer and storage vendors often take DBMS requirements into account in their own development plans. Databases and DBMSs can be categorized according to the database model that they support, the type of computer they run on, the query language used to access the database, and their internal engineering, which affects performance, scalability, and security. The sizes and performance of databases and their respective DBMSs have grown by orders of magnitude. These performance increases were enabled by technological progress in the areas of processors, computer memory, computer storage, and computer networks.
The development of database technology can be divided into three eras based on data model or structure: navigational, SQL/relational, and post-relational. The two main early navigational data models were the hierarchical model and the CODASYL model. The relational model, first proposed in 1970 by Edgar F. Codd, departed from this tradition by insisting that applications should search for data by content, rather than by following links. The relational model employs sets of ledger-style tables, each used for a different type of entity. Only in the mid-1980s did computing hardware become powerful enough to allow the wide deployment of relational systems. By the early 1990s, relational systems dominated in all large-scale data processing applications, and as of 2018 they remain dominant: IBM DB2, Oracle, MySQL, and Microsoft SQL Server are the most searched DBMSs. The dominant database language, standardised SQL for the relational model, has influenced database languages for other data models. Object databases were developed in the 1980s to overcome the inconvenience of object-relational impedance mismatch, which led to the coining of the term "post-relational" and the development of hybrid object-relational databases.
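As a concrete illustration of the data definition, update, and retrieval functions described above, the following minimal sketch uses Python's built-in sqlite3 module with an in-memory database (the table and data are invented for the example; administration functions such as user management are omitted, since SQLite has no notion of users):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# Data definition: create the structures that organize the data.
cur.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")

# Update: insertion and modification of the actual data.
cur.execute("INSERT INTO employees (name, dept) VALUES (?, ?)", ("Ada", "Research"))
cur.execute("UPDATE employees SET dept = ? WHERE name = ?", ("Engineering", "Ada"))

# Retrieval: data returned in a new form obtained by combining existing data.
cur.execute("SELECT dept, COUNT(*) FROM employees GROUP BY dept")
print(cur.fetchall())  # [('Engineering', 1)]

conn.commit()
conn.close()
```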
Computer science is the study of processes that interact with data and that can be represented as data in the form of programs. It enables the use of algorithms to manipulate, store, and communicate digital information. A computer scientist studies the theory of computation and the practice of designing software systems. Its fields can be divided into theoretical and practical disciplines: computational complexity theory is highly abstract, while computer graphics emphasizes real-world applications. Programming language theory considers approaches to the description of computational processes, while computer programming itself involves the use of programming languages and complex systems. Human–computer interaction considers the challenges in making computers useful and accessible. The earliest foundations of what would become computer science predate the invention of the modern digital computer. Machines for calculating fixed numerical tasks, such as the abacus, have existed since antiquity, aiding in computations such as multiplication and division.
Algorithms for performing computations have existed since antiquity, even before the development of sophisticated computing equipment. Wilhelm Schickard designed and constructed the first working mechanical calculator in 1623. In 1673, Gottfried Leibniz demonstrated a digital mechanical calculator, called the Stepped Reckoner. He may be considered the first computer scientist and information theorist, for, among other reasons, documenting the binary number system. In 1820, Thomas de Colmar launched the mechanical calculator industry when he released his simplified arithmometer, the first calculating machine strong enough and reliable enough to be used daily in an office environment. Charles Babbage started the design of the first automatic mechanical calculator, his Difference Engine, in 1822, which eventually gave him the idea of the first programmable mechanical calculator, his Analytical Engine. He started developing this machine in 1834, and "in less than two years, he had sketched out many of the salient features of the modern computer".
"A crucial step was the adoption of a punched card system derived from the Jacquard loom", making it infinitely programmable. In 1843, during the translation of a French article on the Analytical Engine, Ada Lovelace wrote, in one of the many notes she included, an algorithm to compute the Bernoulli numbers, which is considered to be the first computer program. Around 1885, Herman Hollerith invented the tabulator, which used punched cards to process statistical information. In 1937, one hundred years after Babbage's impossible dream, Howard Aiken convinced IBM, which was making all kinds of punched card equipment and was also in the calculator business, to develop his giant programmable calculator, the ASCC/Harvard Mark I, based on Babbage's Analytical Engine, which itself used cards and a central computing unit. When the machine was finished, some hailed it as "Babbage's dream come true". During the 1940s, as new and more powerful computing machines were developed, the term computer came to refer to the machines rather than their human predecessors.
As it became clear that computers could be used for more than just mathematical calculations, the field of computer science broadened to study computation in general. In 1945, IBM founded the Watson Scientific Computing Laboratory at Columbia University in New York City; the renovated fraternity house on Manhattan's West Side was IBM's first laboratory devoted to pure science. The lab is the forerunner of IBM's Research Division, which today operates research facilities around the world; the close relationship between IBM and the university was instrumental in the emergence of a new scientific discipline, with Columbia offering one of the first academic-credit courses in computer science in 1946. Computer science began to be established as a distinct academic discipline in the 1950s and early 1960s; the world's first computer science degree program, the Cambridge Diploma in Computer Science, began at the University of Cambridge Computer Laboratory in 1953. The first computer science degree program in the United States was formed at Purdue University in 1962.
Since practical computers became available, many applications of computing have become distinct areas of study in their own rights. Although many initially believed it was impossible that computers themselves could be a scientific field of study, in the late fifties it gradually became accepted among the greater academic population. It is the now well-known IBM brand that formed part of the computer science revolution during this time. IBM released the IBM 704 and the IBM 709 computers, which were widely used during the exploration period of such devices. "Still, working with the IBM [computer] was frustrating... if you had misplaced as much as one letter in one instruction, the program would crash, and you would have to start the whole process over again". During the late 1950s, the computer science discipline was very much in its developmental stages, and such issues were commonplace. Time has since seen significant improvements in the usability and effectiveness of computing technology. Modern society has seen a significant shift in the users of computer technology, from usage only by experts and professionals to a near-ubiquitous user base.
Initially, computers were quite costly, and some degree of human aid was needed for efficient use, in part from professional computer operators. As computer adoption became more widespread and affordable, less human assistance was needed for common usage. Despite its short history as a formal academic discipline, computer science has made a number of fundamental contributions to science and society; in fact, along with electronics, it is
In computing, a crash, or system crash, occurs when a computer program such as a software application or an operating system stops functioning properly and exits. The program responsible may appear to hang until a crash reporting service reports the crash and any details relating to it. If the program is a critical part of the operating system, the entire system may crash or hang, resulting in a kernel panic or fatal system error. Most crashes are the result of executing invalid machine instructions. Typical causes include incorrect address values in the program counter, buffer overflow, overwriting a portion of the affected program code due to an earlier bug, accessing invalid memory addresses, using an illegal opcode, or triggering an unhandled exception. The original software bug that started this chain of events is considered to be the cause of the crash, and is discovered through the process of debugging. The original bug can be far removed from the code that actually crashed. In earlier personal computers, attempting to write data to hardware addresses outside the system's main memory could cause hardware damage.
Some crashes are exploitable, and let a malicious program or hacker execute arbitrary code, allowing the replication of viruses or the acquisition of data which would otherwise be inaccessible. An application typically crashes when it performs an operation that is not allowed by the operating system. The operating system then triggers an exception or signal in the application. Unix applications traditionally responded to the signal by dumping core. Most Windows and Unix GUI applications respond by displaying a dialogue box with the option to attach a debugger if one is installed. Some applications attempt to continue running instead of exiting. Typical errors that result in application crashes include:

- attempting to read or write memory that is not allocated for reading or writing by that application (segmentation fault), or, x86-specific, a general protection fault
- attempting to execute privileged or invalid instructions
- attempting to perform I/O operations on hardware devices to which it does not have permission to access
- passing invalid arguments to system calls
- attempting to access other system resources to which the application does not have permission to access
- attempting to execute machine instructions with bad arguments: divide by zero, operations on denormal or NaN values, memory access to unaligned addresses, etc.
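As a rough sketch of where a crash reporting service hooks in, the following Python program installs a handler of last resort that records the details of an unhandled exception before the process exits with an error status (the name crash_reporter and the log file are invented for illustration):

```python
import sys
import traceback

def crash_reporter(exc_type, exc_value, exc_tb):
    # Stand-in for a crash reporting service: append the crash details to a log.
    with open("crash_report.log", "a") as log:
        traceback.print_exception(exc_type, exc_value, exc_tb, file=log)

# sys.excepthook is called for any exception that no handler caught;
# after it returns, the interpreter exits with a non-zero status.
sys.excepthook = crash_reporter

data = [1, 2, 3]
print(data[10])  # IndexError: unhandled, so the program "crashes"
```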
A "crash to desktop" is said to occur when a program unexpectedly quits, abruptly taking the user back to the desktop. The term is applied only to crashes where no error is displayed, hence all the user sees as a result of the crash is the desktop. Many times there is no apparent action that causes the crash. During normal function, the program may freeze for a short period of time and then close by itself. The program may also show a black screen and repeatedly play the last few seconds of sound that was playing before it crashed to the desktop. Other times it may appear to be triggered by a certain action, such as loading an area. Crash-to-desktop bugs are considered particularly problematic for users. Since they display no error message, it can be difficult to track down the source of the problem, especially if the times they occur and the actions taking place right before the crash do not appear to have any pattern or common ground. One way to track down the source of the problem for games is to run them in windowed mode. Windows Vista has a feature that can help track down the cause of a CTD problem when it occurs on any program.
Windows XP included a similar feature as well. Some computer programs, such as StepMania and BBC's Bamzooki, crash to the desktop if run in full-screen mode, but display the error in a separate window when the user has returned to the desktop. Crashes can also occur server-side: the software running the web server behind a website may crash, rendering it inaccessible entirely or providing only an error message instead of normal content. For example, if a site is using an SQL database for a script and that SQL database server crashes, then PHP will display a connection error. An operating system crash commonly occurs when a hardware exception occurs that cannot be handled. Operating system crashes can also occur when internal sanity-checking logic within the operating system detects that the operating system has lost its internal self-consistency. Modern multi-tasking operating systems, such as Linux and macOS, usually remain unharmed when an application program crashes. Some operating systems, e.g. z/OS, have facilities for reliability and serviceability, and the OS can recover from the crash of a critical component, whether due to hardware failure, e.g. an uncorrectable ECC error, or to software failure, e.g. a reference to an unassigned page.
Depending on the application, the crash report may contain the user's private information. Moreover, many software bugs which cause crashes are also exploitable for arbitrary code execution and other types of privilege escalation. For example, a stack buffer overflow can overwrite the return address of a subroutine with an invalid value, which will cause a segmentation fault when the subroutine returns. However, if an exploit overwrites the return address with a valid value, the code in that address will be executed. When crashes are collected in the field using a crash reporter, the next step for developers is to be able to reproduce them locally. For this, several techniques exist: STAR uses symbolic execution, and MuCrash mutates the test code of the application that has crashed.
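The difference between an ordinary unhandled exception and a hard crash caused by an invalid memory access can be demonstrated from Python using ctypes; reading from address 0 is the crash-triggering idiom used in the standard library's faulthandler documentation. This is only an illustration and will genuinely kill the interpreter:

```python
import ctypes
import faulthandler

# Dump a traceback on fatal signals such as SIGSEGV, instead of dying
# silently: a tiny analogue of the crash reporters discussed above.
faulthandler.enable()

# Reading from address 0 is an invalid memory access: the hardware raises
# an exception, the OS delivers SIGSEGV, and the process crashes. Unlike
# an ordinary Python exception, no try/except clause can catch this.
ctypes.string_at(0)
```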
In computer storage, the disk buffer (often ambiguously called the disk cache or cache buffer) is the embedded memory in a hard disk drive acting as a buffer between the rest of the computer and the physical hard disk platter that is used for storage. Modern hard disk drives come with 8 to 256 MiB of such memory, and solid-state drives come with up to 4 GB of cache memory. Since the late 1980s, nearly all disks sold have embedded microcontrollers and either an ATA, Serial ATA, SCSI, or Fibre Channel interface. The drive circuitry has a small amount of memory, used to store the data going to and coming from the disk platters. The disk buffer is physically distinct from and is used differently from the page cache kept by the operating system in the computer's main memory. The disk buffer is controlled by the microcontroller in the hard disk drive, while the page cache is controlled by the computer to which that disk is attached. The disk buffer is quite small, ranging between 8 and 256 MiB, while the page cache is generally all unused main memory. While data in the page cache is reused multiple times, the data in the disk buffer is rarely reused.
In this sense, the terms disk cache and cache buffer are misnomers. Note that disk array controllers, as opposed to disk controllers, usually have normal cache memory of around 0.5–8 GiB. When executing a read from the disk, the disk arm moves the read/write head to the correct track, and after some settling time the read head begins to pick up bits. Usually, the first sectors to be read are not the ones that have been requested by the operating system. The disk's embedded computer saves these unrequested sectors in the disk buffer, in case the operating system requests them later. The speed of the disk's I/O interface to the computer almost never matches the speed at which the bits are transferred to and from the hard disk platter. The disk buffer is used so that both the I/O interface and the disk read/write head can operate at full speed. The disk's embedded microcontroller may signal the main computer that a disk write is complete immediately after receiving the write data, before the data is actually written to the platter. This early signal allows the main computer to continue working even though the data has not been written yet.
This can be somewhat dangerous, because if power is lost before the data is permanently fixed in the magnetic media, the data will be lost from the disk buffer, and the file system on the disk may be left in an inconsistent state. On some disks, this vulnerable period between signaling the write complete and fixing the data can be arbitrarily long, as the write can be deferred indefinitely by newly arriving requests. For this reason, the use of write acceleration can be controversial. Consistency can be maintained, however, by using a battery-backed memory system for caching data, although this is typically only found in high-end RAID controllers. Alternatively, the caching can simply be turned off when the integrity of data is deemed more important than write performance. Another option is to send data to disk in a carefully managed order and to issue "cache flush" commands in the right places, which is usually referred to as the implementation of write barriers. Newer SATA and most SCSI disks can accept multiple commands while any one command is in operation through "command queuing".
These commands are stored by the disk's embedded controller. One benefit is that the commands can be re-ordered to be processed more efficiently, so that commands affecting the same area of a disk are grouped together. Should a read reference the data at the destination of a queued write, the to-be-written data will be returned. NCQ is usually used in combination with enabled write buffering. In the case of a read/write FPDMA command with the Force Unit Access (FUA) bit set to 0 and enabled write buffering, an operating system may see the write operation finished before the data is physically written to the media. In the case of the FUA bit set to 1 and enabled write buffering, the write operation returns only after the data is physically written to the media. Data accepted into the write cache of a disk device will eventually be written to the disk platters, provided that no starvation condition occurs as a result of a firmware flaw, and that the disk's power supply is not interrupted before cached writes are forced to the disk platters. To control the write cache, the ATA specification includes the FLUSH CACHE and FLUSH CACHE EXT commands.
These commands cause the disk to complete writing data from its cache, and the disk will return good status only after the data in the write cache is written to disk media. In addition, flushing the cache can be initiated, at least on some disks, by issuing a soft reset or the Standby command. Mandatory cache flushing is used in Linux for the implementation of write barriers in some filesystems, together with the Force Unit Access write command for journal commit blocks. Force Unit Access is an I/O write command option that forces written data all the way to stable storage. FUA write commands, in contrast to the corresponding commands without FUA, write data directly to the media, regardless of whether write caching in the device is enabled or not. A FUA write command will not return until the data is written to the media; thus, data written by a completed FUA write command is on permanent media even if the device is powered off before issuing a FLUSH CACHE command. FUA appeared in the SCSI command set and was later adopted by SATA with NCQ. FUA is more fine-grained, as it allows a single write operation to be forced to stable media and thus has a smaller overall performance impact.
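At the application level, these mechanisms surface as "flush to stable storage" primitives. The sketch below is a minimal illustration, assuming a POSIX system (the helper name write_durably is invented): os.fsync asks the kernel to flush its buffers for the file and, on filesystems and drives that honor flushes, to force the disk's write cache to the platters, acting as a write barrier for that file's data.

```python
import os

def write_durably(path: str, payload: bytes) -> None:
    """Append a record and force it to stable storage before returning."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, payload)
        # Without this, the write may sit in the page cache and the disk's
        # write cache; a power loss could then lose the "completed" write.
        os.fsync(fd)
    finally:
        os.close(fd)

write_durably("journal.log", b"commit record\n")
```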
In databases and transaction processing, two-phase locking (2PL) is a concurrency control method that guarantees serializability. It is also the name of the resulting set of database transaction schedules. The protocol utilizes locks, applied by a transaction to data, which may block other transactions from accessing the same data during the transaction's life. By the 2PL protocol, locks are applied and removed in two phases:

- Expanding phase: locks are acquired and no locks are released.
- Shrinking phase: locks are released and no locks are acquired.

Two types of locks are utilized by the basic protocol: shared locks and exclusive locks. Refinements of the basic protocol may utilize more lock types. Using locks that block processes, 2PL may be subject to deadlocks that result from the mutual blocking of two or more transactions. A lock is a system object associated with a shared resource such as a data item of an elementary type, a row in a database, or a page of memory. In a database, a lock on a database object may need to be acquired by a transaction before accessing the object.
Correct use of locks prevents undesired, incorrect or inconsistent operations on shared resources by other concurrent transactions. When a database object with an existing lock acquired by one transaction needs to be accessed by another transaction, the existing lock for the object and the type of the intended access are checked by the system. If the existing lock type does not allow this specific attempted concurrent access type, the transaction attempting access is blocked. In practice, a lock on an object does not directly block a transaction's operation upon the object, but rather blocks that transaction from acquiring another lock on the same object, which needs to be held/owned by the transaction before performing this operation. Thus, with a locking mechanism, needed operation blocking is controlled by a proper lock blocking scheme, which indicates which lock type blocks which other lock type. Two major types of locks are utilized:

- A write-lock (exclusive lock) is associated with a database object by a transaction before writing this object.
- A read-lock (shared lock) is associated with a database object by a transaction before reading this object.

The common interactions between these lock types are defined by blocking behavior as follows:

- An existing write-lock on a database object blocks an intended write upon the same object by another transaction, by blocking a respective write-lock from being acquired by the other transaction; the second write-lock will be acquired, and the requested write of the object will take place, after the existing write-lock is released.
- A write-lock blocks an intended read by another transaction by blocking the respective read-lock.
- A read-lock blocks an intended write by another transaction by blocking the respective write-lock.
- A read-lock does not block an intended read by another transaction; the respective read-lock for the intended read is acquired immediately after the intended read is requested, and the intended read itself then takes place.

Several variations and refinements of these major lock types exist, with respective variations of blocking behavior.
If one lock blocks another lock, the two locks are called incompatible. Lock-type blocking interactions are presented in the technical literature by a lock compatibility table. The following is an example with the common, major lock types, where X indicates incompatibility, i.e., a case when a lock of the first type on an object blocks a lock of the second type from being acquired on the same object, and a blank cell indicates compatibility:

              Read-lock    Write-lock
  Read-lock                X
  Write-lock  X            X

An object has a queue of waiting requested operations with respective locks. The first blocked lock for an operation in the queue is acquired as soon as the existing blocking lock is removed from the object, and its respective operation is then executed. If a lock for an operation in the queue is not blocked by any existing lock, it is acquired immediately. Comment: in some publications, the table entries are marked "compatible" or "incompatible", or "yes" or "no". According to the two-phase locking protocol, a transaction handles its locks in two distinct, consecutive phases during the transaction's execution:

- Expanding phase: locks are acquired and no locks are released.
- Shrinking phase: locks are released and no locks are acquired.

The two-phase locking rule can be summarized as: never acquire a lock after a lock has been released. The serializability property is guaranteed for a schedule with transactions that obey this rule. Without explicit knowledge in a transaction of the end of phase 1, it is safely determined only when a transaction has completed processing and requested commit. In this case, all the locks can be released at once. The difference between 2PL and conservative two-phase locking (C2PL) is that C2PL's transactions obtain all the locks they need before the transactions begin. This is to ensure that a transaction that already holds some locks will not block waiting for other locks. Conservative 2PL prevents deadlocks. To comply with the strict two-phase locking (S2PL) protocol, a transaction needs to comply with 2PL and release its write (exclusive) locks only after it has ended, i.e., being either committed or aborted. On the other hand, read (shared) locks are released regularly during phase 2. This protocol is not appropriate
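The bookkeeping of the basic protocol is small enough to sketch. The following Python sketch (the class and method names are invented for illustration, not any real DBMS API) implements the read/write compatibility table above and enforces the two-phase rule; deadlock detection and lock upgrades are omitted:

```python
import threading
from enum import Enum

class LockType(Enum):
    READ = "read"    # shared lock
    WRITE = "write"  # exclusive lock

class LockManager:
    """Grants locks per the compatibility table above:
    read/read is compatible; any pairing involving a write is not."""
    def __init__(self):
        self.cond = threading.Condition()
        self.held = {}  # object_id -> list of (transaction, LockType)

    def _compatible(self, object_id, txn, lock_type):
        for holder, held_type in self.held.get(object_id, []):
            if holder is not txn and (
                lock_type is LockType.WRITE or held_type is LockType.WRITE
            ):
                return False  # an "X" cell of the compatibility table
        return True

    def lock(self, txn, object_id, lock_type):
        with self.cond:
            # Block until no incompatible lock is held by another transaction.
            while not self._compatible(object_id, txn, lock_type):
                self.cond.wait()
            self.held.setdefault(object_id, []).append((txn, lock_type))

    def unlock(self, txn, object_id):
        with self.cond:
            self.held[object_id] = [
                (t, lt) for t, lt in self.held[object_id] if t is not txn
            ]
            self.cond.notify_all()  # wake transactions waiting on this object

class TwoPhaseTransaction:
    """Enforces the 2PL rule: after the first release (the start of the
    shrinking phase), no further lock may be acquired."""
    def __init__(self, manager):
        self.manager = manager
        self.locked = set()      # object_ids currently locked
        self.shrinking = False   # True once phase 2 has begun

    def acquire(self, object_id, lock_type):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock acquired after a release")
        self.manager.lock(self, object_id, lock_type)
        self.locked.add(object_id)

    def release(self, object_id):
        self.shrinking = True    # the transaction enters its shrinking phase
        self.manager.unlock(self, object_id)
        self.locked.discard(object_id)

    def commit(self):
        # Releasing all locks only at the end yields strict 2PL behavior.
        for object_id in list(self.locked):
            self.release(object_id)
```

A transaction calls acquire() for each object before reading or writing it and commit() when it ends; holding all locks until commit in this way is what distinguishes strict 2PL from the basic protocol.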
Jim Gray (computer scientist)
James Nicholas Gray was an American computer scientist who received the Turing Award in 1998 "for seminal contributions to database and transaction processing research and technical leadership in system implementation". Gray was born in San Francisco, the second child of Ann Emma Sanbrailo, a teacher, and James Able Gray, who was in the U.S. Army. The family moved to Virginia, spending about four years there, until Gray's parents divorced, after which he returned to San Francisco with his mother. His father, an amateur inventor, patented a design for a ribbon cartridge for typewriters that earned him a substantial royalty stream. After being turned down for the Air Force Academy, Gray entered the University of California, Berkeley as a freshman in 1961. To help pay for college he worked as a co-op for General Dynamics, where he learned to use a Monroe calculator. Discouraged by his chemistry grades, he left Berkeley for six months, returning after an experience in industry he described as "dreadful". Gray earned his B.S. in Engineering Mathematics in 1966. After marrying, Gray moved with his wife Loretta to her home state. At Bell Labs, he worked three days a week and spent two days as a Master's student at New York University's Courant Institute. After a year they traveled for several months before settling again in Berkeley, where Gray entered graduate school with Michael A. Harrison as his advisor. In 1969 he received his Ph.D. in programming languages and then did two years of post-doctoral work for IBM. While at Berkeley, he and Loretta had a daughter; his second wife was Donna Carnes. Gray pursued his career working as a researcher and software designer at a number of industrial companies, including IBM, Tandem Computers, and DEC. He joined Microsoft in 1995 and was a Technical Fellow for the company until he was lost at sea in 2007. Gray contributed to several major transaction processing systems. IBM's System R was the precursor of the SQL relational databases that have become a standard throughout the world. For Microsoft, he worked on SkyServer.
Among his best-known achievements are:

- granular database locking
- two-tier transaction commit semantics
- the "five-minute rule" for allocating storage
- the data cube operator for data warehousing applications
- describing the requirements for reliable transaction processing and implementing them in software

He assisted in the development of Virtual Earth, and he was one of the co-founders of the Conference on Innovative Data Systems Research. Gray, an experienced sailor, owned a forty-foot yacht. On January 28, 2007, he failed to return from a short solo trip to the Farallon Islands near San Francisco to scatter his mother's ashes. The weather was clear, no distress call was received, and no signal was detected from the boat's automatic Emergency Position-Indicating Radio Beacon. A four-day Coast Guard search using planes and boats found nothing. On February 1, 2007, the DigitalGlobe satellite did a scan of the area, and the thousands of resulting images were posted to Amazon Mechanical Turk. Students and friends of Gray, along with computer scientists around the world, formed a "Jim Gray Group" to study these images for clues.
On February 16 this search was suspended, and an underwater search using sophisticated equipment ended on May 31. The University of California and Gray's family hosted a tribute on May 31, 2008. Microsoft's WorldWide Telescope software is dedicated to Gray. In 2008, Microsoft opened a research center in Madison named after Jim Gray. On January 28, 2012, Gray was declared dead. Each year, Microsoft Research presents the Jim Gray eScience Award to a researcher who has made an outstanding contribution to the field of data-intensive computing. Award recipients are selected for their ground-breaking, fundamental contributions to the field of eScience. Previous award winners include Alex Szalay, Carole Goble, Jeff Dozier, Phil Bourne, Mark Abbott, Antony John Williams, and David Lipman.