Computer science is the study of processes that interact with data and that can be represented as data in the form of programs. It enables the use of algorithms to manipulate and communicate digital information. A computer scientist studies the theory of computation and the practice of designing software systems, its fields can be divided into practical disciplines. Computational complexity theory is abstract, while computer graphics emphasizes real-world applications. Programming language theory considers approaches to the description of computational processes, while computer programming itself involves the use of programming languages and complex systems. Human–computer interaction considers the challenges in making computers useful and accessible; the earliest foundations of what would become computer science predate the invention of the modern digital computer. Machines for calculating fixed numerical tasks such as the abacus have existed since antiquity, aiding in computations such as multiplication and division.
Algorithms for performing computations have existed since antiquity before the development of sophisticated computing equipment. Wilhelm Schickard designed and constructed the first working mechanical calculator in 1623. In 1673, Gottfried Leibniz demonstrated a digital mechanical calculator, called the Stepped Reckoner, he may be considered the first computer scientist and information theorist, among other reasons, documenting the binary number system. In 1820, Thomas de Colmar launched the mechanical calculator industry when he released his simplified arithmometer, the first calculating machine strong enough and reliable enough to be used daily in an office environment. Charles Babbage started the design of the first automatic mechanical calculator, his Difference Engine, in 1822, which gave him the idea of the first programmable mechanical calculator, his Analytical Engine, he started developing this machine in 1834, "in less than two years, he had sketched out many of the salient features of the modern computer".
"A crucial step was the adoption of a punched card system derived from the Jacquard loom" making it infinitely programmable. In 1843, during the translation of a French article on the Analytical Engine, Ada Lovelace wrote, in one of the many notes she included, an algorithm to compute the Bernoulli numbers, considered to be the first computer program. Around 1885, Herman Hollerith invented the tabulator, which used punched cards to process statistical information. In 1937, one hundred years after Babbage's impossible dream, Howard Aiken convinced IBM, making all kinds of punched card equipment and was in the calculator business to develop his giant programmable calculator, the ASCC/Harvard Mark I, based on Babbage's Analytical Engine, which itself used cards and a central computing unit; when the machine was finished, some hailed it as "Babbage's dream come true". During the 1940s, as new and more powerful computing machines were developed, the term computer came to refer to the machines rather than their human predecessors.
As it became clear that computers could be used for more than just mathematical calculations, the field of computer science broadened to study computation in general. In 1945, IBM founded the Watson Scientific Computing Laboratory at Columbia University in New York City; the renovated fraternity house on Manhattan's West Side was IBM's first laboratory devoted to pure science. The lab is the forerunner of IBM's Research Division, which today operates research facilities around the world; the close relationship between IBM and the university was instrumental in the emergence of a new scientific discipline, with Columbia offering one of the first academic-credit courses in computer science in 1946. Computer science began to be established as a distinct academic discipline in the 1950s and early 1960s; the world's first computer science degree program, the Cambridge Diploma in Computer Science, began at the University of Cambridge Computer Laboratory in 1953. The first computer science degree program in the United States was formed at Purdue University in 1962.
Since practical computers became available, many applications of computing have become distinct areas of study in their own rights. Although many believed it was impossible that computers themselves could be a scientific field of study, in the late fifties it became accepted among the greater academic population, it is the now well-known IBM brand that formed part of the computer science revolution during this time. IBM released the IBM 704 and the IBM 709 computers, which were used during the exploration period of such devices. "Still, working with the IBM was frustrating if you had misplaced as much as one letter in one instruction, the program would crash, you would have to start the whole process over again". During the late 1950s, the computer science discipline was much in its developmental stages, such issues were commonplace. Time has seen significant improvements in the effectiveness of computing technology. Modern society has seen a significant shift in the users of computer technology, from usage only by experts and professionals, to a near-ubiquitous user base.
Computers were quite costly, some degree of humanitarian aid was needed for efficient use—in part from professional computer operators. As computer adoption became more widespread and affordable, less human assistance was needed for common usage. Despite its short history as a formal academic discipline, computer science has made a number of fundamental contributions to science and society—in fact, along with electronics, it is
In text retrieval, full-text search refers to techniques for searching a single computer-stored document or a collection in a full-text database. Full-text search is distinguished from searches based on metadata or on parts of the original texts represented in databases. In a full-text search, a search engine examines all of the words in every stored document as it tries to match search criteria. Full-text-searching techniques became common in online bibliographic databases in the 1990s. Many websites and application programs provide full-text-search capabilities; some web search engines, such as AltaVista, employ full-text-search techniques, while others index only a portion of the web pages examined by their indexing systems. When dealing with a small number of documents, it is possible for the full-text-search engine to directly scan the contents of the documents with each query, a strategy called "serial scanning"; this is. However, when the number of documents to search is large, or the quantity of search queries to perform is substantial, the problem of full-text search is divided into two tasks: indexing and searching.
The indexing stage will build a list of search terms. In the search stage, when performing a specific query, only the index is referenced, rather than the text of the original documents; the indexer will make an entry in the index for each term or word found in a document, note its relative position within the document. The indexer will ignore stop words that are both common and insufficiently meaningful to be useful in searching; some indexers employ language-specific stemming on the words being indexed. For example, the words "drives", "drove", "driven" will be recorded in the index under the single concept word "drive". Recall measures the quantity of relevant results returned by a search, while precision is the measure of the quality of the results returned. Recall is the ratio of relevant results returned to all relevant results. Precision is the number of relevant results returned to the total number of results returned; the diagram at right represents a low-recall search. In the diagram the red and green dots represent the total population of potential search results for a given search.
Red dots represent irrelevant results, green dots represent relevant results. Relevancy is indicated by the proximity of search results to the center of the inner circle. Of all possible results shown, those that were returned by the search are shown on a light-blue background. In the example only 1 relevant result of 3 possible relevant results was returned, so the recall is a low ratio of 1/3, or 33%; the precision for the example is a low 1/4, or 25%, since only 1 of the 4 results returned was relevant. Due to the ambiguities of natural language, full-text-search systems includes options like stop words to increase precision and stemming to increase recall. Controlled-vocabulary searching helps alleviate low-precision issues by tagging documents in such a way that ambiguities are eliminated; the trade-off between precision and recall is simple: an increase in precision can lower overall recall, while an increase in recall lowers precision. Full-text searching is to retrieve many documents that are not relevant to the intended search question.
Such documents are called false positives. The retrieval of irrelevant documents is caused by the inherent ambiguity of natural language. In the sample diagram at right, false positives are represented by the irrelevant results that were returned by the search. Clustering techniques based on Bayesian algorithms can help reduce false positives. For a search term of "bank", clustering can be used to categorize the document/data universe into "financial institution", "place to sit", "place to store" etc. Depending on the occurrences of words relevant to the categories, search terms or a search result can be placed in one or more of the categories; this technique is being extensively deployed in the e-discovery domain. The deficiencies of free text searching have been addressed in two ways: By providing users with tools that enable them to express their search questions more and by developing new search algorithms that improve retrieval precision. Keywords. Document creators are asked to supply a list of words that describe the subject of the text, including synonyms of words that describe this subject.
Keywords improve recall if the keyword list includes a search word, not in the document text. Field-restricted search; some search engines enable users to limit free text searches to a particular field within a stored data record, such as "Title" or "Author." Boolean queries. Searches that use Boolean operators can increase the precision of a free text search; the AND operator says, in effect, "Do not retrieve any document unless it contains both of these terms." The NOT operator says, in effect, "Do not retrieve any document that contains this word." If the retrieval list retrieves too few documents, the OR operator can be used to increase recall. This search will retrieve documents about online encyclopedias that use the term "Internet" instead of "online." This increase in precision is commonly counter-productive since it comes with