A website or Web site is a collection of related network web resources, such as web pages and multimedia content, which are identified by a common domain name and published on at least one web server. Notable examples are wikipedia.org, google.com and amazon.com. Websites can be accessed via a public Internet Protocol network, such as the Internet, or a private local area network, by a uniform resource locator that identifies the site. Websites can be used in various fashions and are typically dedicated to a particular topic or purpose, ranging from entertainment and social networking to providing news and education. All publicly accessible websites collectively constitute the World Wide Web, while private websites, such as a company's website for its employees, are part of an intranet. Web pages, which are the building blocks of websites, are documents composed in plain text interspersed with formatting instructions of Hypertext Markup Language; they may incorporate elements from other websites with suitable markup anchors.
Web pages are accessed and transported with the Hypertext Transfer Protocol, which may optionally employ encryption to provide security and privacy for the user. The user's application, a web browser, renders the page content according to its HTML markup instructions onto a display terminal. Hyperlinking between web pages conveys to the reader the site structure and guides the navigation of the site, which usually starts with a home page containing a directory of the site's web content. Some websites require user subscription to access content. Examples of subscription websites include many business sites, news websites, academic journal websites, gaming websites, file-sharing websites, message boards, web-based email, social networking websites, websites providing real-time stock market data, as well as sites providing various other services. End users can access websites on a range of devices, including desktop and laptop computers, tablet computers and smart TVs. The World Wide Web was created in 1990 by the British physicist Tim Berners-Lee while working at CERN.
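As a small illustration of the transport step described above, the sketch below fetches a page over HTTP(S) the way a browser's networking layer would, using Python's standard library; the URL is a placeholder used purely for the example.

```python
# Minimal sketch: fetching a web page over HTTP(S), roughly the first step a
# browser performs before rendering the HTML it receives.
# "https://example.org/" is an illustrative placeholder URL.
from urllib.request import urlopen

with urlopen("https://example.org/") as response:
    status = response.status                       # e.g. 200 for success
    content_type = response.headers.get("Content-Type")
    html = response.read().decode("utf-8", errors="replace")

print(status, content_type)
print(html[:200])                                  # first part of the HTML markup
```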
On 30 April 1993, CERN announced that the World Wide Web would be free for anyone to use. Before the introduction of HTML and HTTP, other protocols such as the File Transfer Protocol and the gopher protocol were used to retrieve individual files from a server; these protocols offer a simple directory structure which the user navigates and where they choose files to download. Documents were most often presented as plain text files without formatting, or were encoded in word processor formats. Websites can be the work of an individual, a business or other organization, and are typically dedicated to a particular topic or purpose. Any website can contain a hyperlink to any other website, so the distinction between individual sites, as perceived by the user, can be blurred. Websites are written in, or converted to, HTML and are accessed using a software interface classified as a user agent. Web pages can be viewed or otherwise accessed from a range of computer-based and Internet-enabled devices of various sizes, including desktop computers, tablet computers and smartphones.
A website is hosted on a computer system known as a web server, also called an HTTP server. These terms can also refer to the software that runs on these systems and that retrieves and delivers the web pages in response to requests from the website's users. Apache is the most commonly used web server software, and Microsoft's IIS is also widely used; some alternatives, such as Nginx, Hiawatha or Cherokee, are functional and lightweight. A static website is one that has web pages stored on the server in the format in which they are sent to a client web browser; it is coded primarily in Hypertext Markup Language. Images are used to achieve the desired appearance and as part of the main content. Audio or video might also be considered "static" content if it plays automatically or is non-interactive. This type of website displays the same information to all visitors. Similar to handing out a printed brochure to customers or clients, a static website will provide consistent, standard information for an extended period of time. Although the website owner may make updates periodically, it is a manual process to edit the text and other content, and it may require basic website design skills and software.
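As a rough illustration of how a static website is served, the sketch below uses Python's built-in http.server module to return files exactly as they are stored on disk. The directory name and port are assumptions for the example; production sites would typically run Apache, IIS, Nginx or similar instead.

```python
# Minimal sketch of a static web server: every visitor receives the same
# files, exactly as they are stored on disk (here, an assumed "./site"
# directory containing index.html and any images).
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler

handler = partial(SimpleHTTPRequestHandler, directory="site")
server = HTTPServer(("0.0.0.0", 8000), handler)
print("Serving static pages on http://localhost:8000/")
server.serve_forever()
```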
Simple forms or marketing examples of websites, such as a classic website, a five-page website or a brochure website, are static websites, because they present pre-defined, static information to the user. This may include information about a company and its products and services through text, animations, audio/video and navigation menus. Static websites can be edited using four broad categories of software: text editors, such as Notepad or TextEdit, where content and HTML markup are manipulated directly within the editor program; WYSIWYG offline editors, such as Microsoft FrontPage and Adobe Dreamweaver, with which the site is edited using a GUI and the final HTML markup is generated automatically by the editor software; WYSIWYG online editors, which create media-rich online presentations like web pages, intros, blogs, an
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web for the purpose of Web indexing. Web search engines and some other sites use Web crawling or spidering software to update their own web content or their indices of other sites' web content. Web crawlers copy pages for processing by a search engine, which indexes the downloaded pages so users can search more efficiently. Crawlers consume resources on visited systems and often visit sites without approval. Issues of schedule and "politeness" come into play when large collections of pages are accessed. Mechanisms exist for public sites not wishing to be crawled to make this known to the crawling agent; for example, including a robots.txt file can request bots to index only parts of a website, or nothing at all. The number of Internet pages is extremely large; for this reason, search engines struggled to give relevant search results in the early years of the World Wide Web, before 2000.
Today, relevant results are given almost instantly. Crawlers can validate HTML code, and they can also be used for web scraping. A web crawler is also known as a spider, an ant, an automatic indexer, or a Web scutter. A Web crawler starts with a list of URLs to visit, called the seeds; as the crawler visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, called the crawl frontier. URLs from the frontier are recursively visited according to a set of policies. If the crawler is performing archiving of websites, it saves the information as it goes; the archives are stored in such a way that they can be viewed and navigated as they were on the live web, but are preserved as "snapshots". The archive is known as the repository and is designed to store and manage the collection of web pages; the repository stores these pages as distinct files. A repository is similar to any other system that stores data, like a modern-day database; the only difference is that a repository does not need all the functionality offered by a database system.
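A minimal sketch of the crawling loop just described, with seed URLs, a crawl frontier, per-site robots.txt checks and hyperlink extraction, might look like the following; the seed URL, user-agent name and page limit are illustrative assumptions, not part of any particular crawler.

```python
# Sketch of a crawler: visit seed URLs, respect robots.txt (politeness),
# extract hyperlinks, and add them to the crawl frontier.
from collections import deque
from html.parser import HTMLParser
from urllib import robotparser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags on one page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def allowed(url, agent="example-crawler"):
    """Check the site's robots.txt before fetching."""
    parts = urlparse(url)
    rp = robotparser.RobotFileParser(f"{parts.scheme}://{parts.netloc}/robots.txt")
    try:
        rp.read()
    except OSError:
        return False
    return rp.can_fetch(agent, url)


def crawl(seeds, max_pages=20):
    frontier = deque(seeds)        # the "crawl frontier" of URLs still to visit
    visited = set()
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        if url in visited or not allowed(url):
            continue
        visited.add(url)
        try:
            with urlopen(url) as response:
                html = response.read().decode("utf-8", errors="replace")
        except OSError:
            continue
        extractor = LinkExtractor()
        extractor.feed(html)
        for link in extractor.links:
            frontier.append(urljoin(url, link))   # resolve relative links
    return visited


if __name__ == "__main__":
    print(crawl(["https://example.org/"]))
```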
The repository stores the most recent version of each web page retrieved by the crawler. The large volume implies the crawler can only download a limited number of Web pages within a given time, so it needs to prioritize its downloads; the high rate of change implies that pages might already have been updated or deleted. The number of possible URLs being generated by server-side software has also made it difficult for web crawlers to avoid retrieving duplicate content. Endless combinations of HTTP GET parameters exist, of which only a small selection will actually return unique content. For example, a simple online photo gallery may offer a few options to users, as specified through HTTP GET parameters in the URL. If there exist four ways to sort images, three choices of thumbnail size, two file formats and an option to disable user-provided content, then the same set of content can be accessed with 48 different URLs, all of which may be linked on the site. This mathematical combination creates a problem for crawlers, as they must sort through endless combinations of minor scripted changes in order to retrieve unique content.
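A small sketch of the photo-gallery example above: with hypothetical GET parameter names (invented purely for illustration), the option combinations multiply into exactly 48 distinct URLs that all lead to the same underlying content.

```python
# The parameter names and the gallery URL are assumptions for the example.
from itertools import product
from urllib.parse import urlencode

sort_orders = ["name", "date", "size", "rating"]    # four ways to sort
thumb_sizes = ["small", "medium", "large"]          # three thumbnail sizes
formats = ["jpg", "png"]                            # two file formats
user_content = ["on", "off"]                        # toggle user-provided content

urls = [
    "https://gallery.example/photos?" + urlencode(
        {"sort": s, "thumb": t, "fmt": f, "user": u}
    )
    for s, t, f, u in product(sort_orders, thumb_sizes, formats, user_content)
]
print(len(urls))   # 4 * 3 * 2 * 2 = 48 URLs for the same gallery
```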
As Edwards et al. noted, "Given that the bandwidth for conducting crawls is neither infinite nor free, it is becoming essential to crawl the Web in not only a scalable, but efficient way, if some reasonable measure of quality or freshness is to be maintained." A crawler must choose at each step which pages to visit next. The behavior of a Web crawler is the outcome of a combination of policies: a selection policy which states the pages to download, a re-visit policy which states when to check for changes to the pages, a politeness policy that states how to avoid overloading Web sites, and a parallelization policy that states how to coordinate distributed web crawlers. Given the current size of the Web, even large search engines cover only a portion of the publicly available part. A 2009 study showed large-scale search engines index no more than 40-70% of the indexable Web; as a crawler always downloads just a fraction of the Web pages, it is desirable for the downloaded fraction to contain the most relevant pages and not just a random sample of the Web.
This requires a metric of importance for prioritizing Web pages. The importance of a page is a function of its intrinsic quality, its popularity in terms of links or visits, and even of its URL. Designing a good selection policy has an added difficulty: it must work with partial information, as the complete set of Web pages is not known during crawling. Junghoo Cho et al. made the first study on policies for crawling scheduling. Their data set was a 180,000-page crawl from the stanford.edu domain, in which a crawling simulation was done with different strategies. The ordering metrics tested were backlink count and partial PageRank calculations. One of the conclusions was that if the crawler wants to download pages with high PageRank early during the crawling process, then the partial PageRank strategy is the better one, followed by breadth-first and backlink count. However, these results are for just a single domain. Cho also wrote his Ph.D. dissertation at Stanford on web crawling. Najork and Wiener performed an actual crawl on 328 million pages, using breadth-first ordering.
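As a hedged sketch of one such selection policy, the snippet below orders a toy crawl frontier by backlink count, one of the metrics mentioned above; partial PageRank is omitted for brevity, and the link graph is invented purely for illustration.

```python
# Toy selection policy: pop the URL with the most known backlinks first.
import heapq
from collections import defaultdict

# Hypothetical link graph: page -> pages it links to.
outlinks = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C", "B"],
}

# Count backlinks (how many known pages point at each URL).
backlinks = defaultdict(int)
for source, targets in outlinks.items():
    for target in targets:
        backlinks[target] += 1

# Priority queue keyed on negated backlink count (highest count first).
frontier = [(-backlinks[url], url) for url in outlinks]
heapq.heapify(frontier)
while frontier:
    neg_count, url = heapq.heappop(frontier)
    print(url, "backlinks:", -neg_count)   # C (3), then B (2), A (1), D (0)
```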
Computer software, or simply software, is a collection of data or computer instructions that tell the computer how to work. This is in contrast to physical hardware, from which the system is built and which actually performs the work. In computer science and software engineering, computer software is all information processed by computer systems, including programs and data. Computer software includes computer programs and related non-executable data, such as online documentation or digital media. Computer hardware and software require each other and neither can be realistically used on its own. At the lowest programming level, executable code consists of machine language instructions supported by an individual processor, typically a central processing unit or a graphics processing unit. A machine language consists of groups of binary values signifying processor instructions that change the state of the computer from its preceding state. For example, an instruction may change the value stored in a particular storage location in the computer, an effect that is not directly observable to the user.
An instruction may also invoke one of many input or output operations, for example displaying some text on a computer screen. The processor executes the instructions in the order they are provided, unless it is instructed to "jump" to a different instruction, or is interrupted by the operating system. As of 2015, most personal computers, smartphone devices and servers have processors with multiple execution units or multiple processors performing computation together, and computing has become a much more concurrent activity than in the past. The majority of software is written in high-level programming languages, which are easier and more efficient for programmers because they are closer to natural languages than to machine languages. High-level languages are translated into machine language using a compiler or an interpreter, or a combination of the two. Software may also be written in a low-level assembly language, which has a strong correspondence to the computer's machine language instructions and is translated into machine language using an assembler.
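To illustrate the idea of translating high-level source into lower-level instructions, the sketch below uses Python's dis module to show the bytecode the interpreter compiles a function into. Bytecode is not machine language, but the translation step it demonstrates is analogous to what a compiler or interpreter does.

```python
# A high-level function and the lower-level instructions the Python
# interpreter compiles it into.
import dis

def add(a, b):
    return a + b

dis.dis(add)   # prints instructions such as LOAD_FAST and BINARY_ADD/BINARY_OP
```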
An outline for what would have been the first piece of software was written by Ada Lovelace in the 19th century, for the planned Analytical Engine. She created proofs to show how the engine would calculate Bernoulli numbers; because of the proofs and the algorithm, she is considered the first computer programmer. The first theory about software, prior to the creation of computers as we know them today, was proposed by Alan Turing in his 1935 essay On Computable Numbers, with an Application to the Entscheidungsproblem; this led to the creation of the academic fields of computer science and software engineering. Computer science is the theoretical study of computers and software, whereas software engineering is the application of engineering principles to the development of software. However, prior to 1946, software as we now understand it, that is, programs stored in the memory of stored-program digital computers, did not yet exist; the first electronic computing devices were instead rewired in order to "reprogram" them. On all computer platforms, software can be grouped into a few broad categories.
Based on the goal, computer software can be divided into: application software, which uses the computer system to perform special functions or provide entertainment functions beyond the basic operation of the computer itself (there are many different types of application software, because the range of tasks that can be performed with a modern computer is so large; see list of software); and system software, which manages computer hardware behaviour so as to provide basic functionalities that are required by users, or for other software to run properly, if at all. System software is designed to provide a platform for running application software, and it includes the following: operating systems, which are essential collections of software that manage resources and provide common services for other software that runs "on top" of them. Supervisory programs, boot loaders and window systems are core parts of operating systems. In practice, an operating system comes bundled with additional software so that a user can do some work with a computer that only has one operating system.
Free On-line Dictionary of Computing
The Free On-line Dictionary of Computing (FOLDOC) is an online, encyclopedic dictionary of computing subjects. FOLDOC was hosted by Imperial College London. In May 2015, the site was updated to state that it was "no longer supported by Imperial College Department of Computing". Denis Howe has served as the editor-in-chief since the dictionary's inception, with visitors to the website able to make suggestions for additions or corrections to articles. The dictionary incorporates the text of other free resources, such as the Jargon File, as well as covering many other computing-related topics. Due to its availability under the GNU Free Documentation License, a copyleft license, it has in turn been incorporated in whole or in part into other free content projects, such as Wikipedia. The site's brief 2001 review by a Ziff Davis publication begins "Despite this online dictionary’s pale user interface, it offers impressive functionality." Oxford University Press is aware of the dictionary and notes that it "is maintained by volunteers."
A university tells its students that FOLDOC can be used to find information about "companies, history, in fact any of the vocabulary you might expect to find in a computer dictionary."
GNU Free Documentation License
The GNU Free Documentation License (GFDL) is a copyleft license for free documentation, designed by the Free Software Foundation for the GNU Project. It is similar to the GNU General Public License, giving readers the rights to copy and modify a work, and it requires all copies and derivatives to be available under the same license. Copies may be sold commercially, but, if produced in larger quantities, the original document or source code must be made available to the work's recipient. The GFDL was designed for manuals, other reference and instructional materials, and documentation which accompanies GNU software. However, it can be used for any text-based work, regardless of subject matter. For example, the free online encyclopedia Wikipedia uses the GFDL for all of its text. The GFDL was released in draft form for feedback in September 1999. After revisions, version 1.1 was issued in March 2000, version 1.2 in November 2002, and version 1.3 in November 2008. The current state of the license is version 1.3. The first discussion draft of the GNU Free Documentation License version 2 was released on September 26, 2006, along with a draft of the new GNU Simpler Free Documentation License.
On December 1, 2007, Wikipedia founder Jimmy Wales announced that a long period of discussion and negotiation between and amongst the Free Software Foundation, Creative Commons, the Wikimedia Foundation and others had produced a proposal supported by both the FSF and Creative Commons to modify the Free Documentation License in such a fashion as to allow the possibility for the Wikimedia Foundation to migrate its projects to the similar Creative Commons Attribution Share-Alike license. These changes were implemented in version 1.3 of the license, which includes a new provision allowing certain materials released under the license to be used under a Creative Commons Attribution Share-Alike license as well. Material licensed under the current version of the license can be used for any purpose, as long as the use meets certain conditions: all previous authors of the work must be attributed; all changes to the work must be logged; all derivative works must be licensed under the same license; and the full text of the license, unmodified invariant sections as defined by the author (if any), and any added warranty disclaimers and copyright notices from previous versions must be maintained.
Technical measures such as DRM may not be used to control or obstruct distribution or editing of the document. The license explicitly separates any kind of "Document" from "Secondary Sections", which may not be integrated with the Document but exist as front-matter materials or appendices. Secondary sections can contain information regarding the author's or publisher's relationship to the subject matter, but not any subject matter itself. While the Document itself is wholly editable and is essentially covered by a license equivalent to the GNU General Public License, some of the secondary sections have various restrictions designed to deal with proper attribution to previous authors: the authors of prior versions have to be acknowledged, and certain "invariant sections" specified by the original author and dealing with his or her relationship to the subject matter may not be changed. If the material is modified, its title has to be changed. The license also has provisions for the handling of front-cover and back-cover texts of books, as well as for "History", "Acknowledgements", "Dedications" and "Endorsements" sections.
These features were added in part to make the license more financially attractive to commercial publishers of software documentation, some of whom were consulted during the drafting of the GFDL. "Endorsements" sections are intended to be used in official standard documents, where distribution of modified versions should only be permitted if they are not labeled as that standard any more. The GFDL requires the ability to "copy and distribute the Document in any medium, either commercially or noncommercially" and is therefore incompatible with material that excludes commercial re-use. As mentioned above, the GFDL was designed with commercial publishers in mind, as Stallman explained: The GFDL is meant as a way to enlist commercial publishers in funding free documentation without surrendering any vital liberty. The 'cover text' feature, and certain other aspects of the license that deal with covers, title page and endorsements, are included to make the license appealing to commercial publishers for books whose authors are paid.
Material that restricts commercial re-use is incompatible with the license and cannot be incorporated into the work. However, incorporating such restricted material may be fair use under United States copyright law, and such material does not need to be licensed under the GFDL if such fair use is covered by all potential subsequent uses. One example of such liberal and commercial fair use is parody. Although the two licenses work on similar copyleft principles, the GFDL is not compatible with the Creative Commons Attribution-ShareAlike license. However, at the request of the Wikimedia Foundation, version 1.3 added a time-limited section allowing specific types of websites using the GFDL to additionally offer their work under the CC BY-SA license. These exemptions allow a GFDL-based collaborative project with multiple authors to transition to the CC BY-SA 3.0 license, without first obtaining the permission of every author, if the work satisfies several conditions.
World Wide Web
The World Wide Web, commonly known as the Web, is an information space where documents and other web resources are identified by Uniform Resource Locators (URLs), which may be interlinked by hypertext, and are accessible over the Internet. The resources of the WWW may be accessed by users through a software application called a web browser. English scientist Tim Berners-Lee invented the World Wide Web in 1989 and wrote the first web browser in 1990 while employed at CERN near Geneva, Switzerland. The browser was released outside CERN in 1991, first to other research institutions starting in January 1991 and then to the general public in August 1991. The World Wide Web has been central to the development of the Information Age and is the primary tool billions of people use to interact on the Internet. Web resources may be any type of downloaded media, but web pages are hypertext media that have been formatted in Hypertext Markup Language (HTML); such formatting allows for embedded hyperlinks that contain URLs and permit users to navigate to other web resources.
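As a small sketch of how a URL identifies a resource, the snippet below splits an illustrative address into its scheme, domain name, path, query and fragment using Python's standard library; the URL itself is a placeholder, not a real page.

```python
# Splitting a Uniform Resource Locator into its parts: the domain name
# identifies the site, the path identifies a resource on it.
from urllib.parse import urlparse

url = "https://www.example.org/wiki/World_Wide_Web?action=view#History"
parts = urlparse(url)
print(parts.scheme)     # "https" - protocol used to fetch the resource
print(parts.netloc)     # "www.example.org" - the site's domain name
print(parts.path)       # "/wiki/World_Wide_Web" - the resource on that site
print(parts.query)      # "action=view"
print(parts.fragment)   # "History"
```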
In addition to text, web pages may contain images, video and software components that are rendered in the user's web browser as coherent pages of multimedia content. Multiple web resources with a common theme, a common domain name, or both, make up a website. Websites are stored in computers that are running a program called a web server that responds to requests made over the Internet from web browsers running on a user's computer. Website content can be provided by a publisher, or interactively where users contribute content or the content depends upon the users or their actions. Websites may be provided for a myriad of informative, commercial, governmental, or non-governmental reasons. Tim Berners-Lee's vision of a global hyperlinked information system became a possibility by the second half of the 1980s. By 1985, the global Internet began to proliferate in Europe and the Domain Name System came into being. In 1988 the first direct IP connection between Europe and North America was made and Berners-Lee began to discuss the possibility of a web-like system at CERN.
While working at CERN, Berners-Lee became frustrated with the inefficiencies and difficulties posed by finding information stored on different computers. On March 12, 1989, he submitted a memorandum, titled "Information Management: A Proposal", to the management at CERN for a system called "Mesh" that referenced ENQUIRE, a database and software project he had built in 1980, which used the term "web" and described a more elaborate information management system based on links embedded as text: "Imagine the references in this document all being associated with the network address of the thing to which they referred, so that while reading this document, you could skip to them with a click of the mouse." Such a system, he explained, could be referred to using one of the existing meanings of the word hypertext, a term that he says was coined in the 1950s. There is no reason, the proposal continues, why such hypertext links could not encompass multimedia documents including graphics and video, and so Berners-Lee goes on to use the term hypermedia.
With help from his colleague and fellow hypertext enthusiast Robert Cailliau he published a more formal proposal on 12 November 1990 to build a "Hypertext project" called "WorldWideWeb" as a "web" of "hypertext documents" to be viewed by "browsers" using a client–server architecture. At this point HTML and HTTP had been in development for about two months and the first Web server was about a month from completing its first successful test; this proposal estimated that a read-only web would be developed within three months and that it would take six months to achieve "the creation of new links and new material by readers, authorship becomes universal" as well as "the automatic notification of a reader when new material of interest to him/her has become available". While the read-only goal was met, accessible authorship of web content took longer to mature, with the wiki concept, WebDAV, Web 2.0 and RSS/Atom. The proposal was modelled after the SGML reader Dynatext by Electronic Book Technology, a spin-off from the Institute for Research in Information and Scholarship at Brown University.
The Dynatext system, licensed by CERN, was a key player in the extension of SGML ISO 8879:1986 to Hypermedia within HyTime, but it was considered too expensive and had an inappropriate licensing policy for use in the general high energy physics community, namely a fee for each document and each document alteration. A NeXT Computer was used by Berners-Lee as the world's first web server and to write the first web browser, WorldWideWeb, in 1990. By Christmas 1990, Berners-Lee had built all the tools necessary for a working Web: the first web browser and the first web server; the first web site, which described the project itself, was published on 20 December 1990. The first web page may be lost, but Paul Jones of UNC-Chapel Hill in North Carolina announced in May 2013 that Berners-Lee gave him what he says is the oldest known web page during a 1991 visit to UNC. Jones stored it on his NeXT computer. On 6 August 1991, Berners-Lee published a short summary of the World Wide Web project on the newsgroup alt.hypertext.
This date is sometimes confused with the public availability of the first web servers, which had occurred months earlier. As another example of such confusion, several news media reported that the first photo on the Web was published by Berners-Lee in 1992, an image of the CERN house band Les Horribles Cernettes taken by Silvano de Gennaro.