Deoxyribonucleic acid (DNA) is a molecule composed of two chains that coil around each other to form a double helix carrying the genetic instructions used in the growth, development and reproduction of all known organisms and many viruses. DNA and ribonucleic acid (RNA) are nucleic acids. The two DNA strands are known as polynucleotides because they are composed of simpler monomeric units called nucleotides. Each nucleotide is composed of one of four nitrogen-containing nucleobases, a sugar called deoxyribose, and a phosphate group. The nucleotides are joined to one another in a chain by covalent bonds between the sugar of one nucleotide and the phosphate of the next, resulting in an alternating sugar-phosphate backbone. The nitrogenous bases of the two separate polynucleotide strands are bound together, according to base pairing rules (adenine with thymine, cytosine with guanine), with hydrogen bonds to make double-stranded DNA. The complementary nitrogenous bases are divided into two groups, pyrimidines and purines. In DNA, the pyrimidines are thymine and cytosine; the purines are adenine and guanine. Both strands of double-stranded DNA store the same biological information.
This information is replicated as and when the two strands separate. A large part of DNA is non-coding, meaning that these sections do not serve as patterns for protein sequences. The two strands of DNA run in opposite directions to each other and are thus antiparallel. Attached to each sugar is one of four types of nucleobases, and it is the sequence of these four nucleobases along the backbone that encodes genetic information. RNA strands are created using DNA strands as a template in a process called transcription. Under the genetic code, these RNA strands specify the sequence of amino acids within proteins in a process called translation. Within eukaryotic cells, DNA is organized into long structures called chromosomes. Before typical cell division, these chromosomes are duplicated in the process of DNA replication, providing a complete set of chromosomes for each daughter cell. Eukaryotic organisms store most of their DNA inside the cell nucleus as nuclear DNA, and some in the mitochondria as mitochondrial DNA or in chloroplasts as chloroplast DNA. In contrast, prokaryotes store their DNA only in circular chromosomes.
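The base pairing rules and the antiparallel arrangement of the two strands can be illustrated with a short sketch. The function names below are hypothetical, and the code assumes plain uppercase or lowercase strings over A, C, G and T written in the 5′ to 3′ direction; it is an illustration of the pairing logic, not a model of the enzymatic processes.

```python
# Watson-Crick pairing in DNA: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand, also written 5' to 3'.

    Because the two strands are antiparallel, the partner of a strand
    is its complement read in the opposite direction.
    """
    return complement(strand)[::-1]


def transcribe(template: str) -> str:
    """Return the RNA made from a DNA template strand (given 5' to 3').

    The RNA is complementary to the template, with uracil (U) in place
    of thymine (T).
    """
    return reverse_complement(template).replace("T", "U")


print(reverse_complement("ATGC"))  # GCAT
print(transcribe("GCAT"))          # AUGC
```

Either strand can be recovered from the other with `reverse_complement`, which is one way to see why both strands store the same information.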
Within eukaryotic chromosomes, chromatin proteins, such as histones, compact and organize DNA. These compacting structures guide the interactions between DNA and other proteins, helping control which parts of the DNA are transcribed. DNA was first isolated by Friedrich Miescher in 1869. Its molecular structure was first identified by Francis Crick and James Watson at the Cavendish Laboratory within the University of Cambridge in 1953; their model-building efforts were guided by X-ray diffraction data acquired by Raymond Gosling, a post-graduate student of Rosalind Franklin. DNA is used by researchers as a molecular tool to explore physical laws and theories, such as the ergodic theorem and the theory of elasticity. The unique material properties of DNA have made it an attractive molecule for material scientists and engineers interested in micro- and nano-fabrication. Among notable advances in this field are DNA origami and DNA-based hybrid materials. DNA is a long polymer made from repeating units called nucleotides.
The structure of DNA is dynamic along its length, being capable of coiling into tight loops and other shapes. In all species it is composed of two helical chains, bound to each other by hydrogen bonds. Both chains are coiled around the same axis and have the same pitch of 34 angstroms; the pair of chains has a radius of 10 angstroms. According to another study, when measured in a different solution, the DNA chain measured 22 to 26 angstroms wide, and one nucleotide unit measured 3.3 Å long. Although each individual nucleotide is small, a DNA polymer can be large and contain hundreds of millions of nucleotides, such as in chromosome 1. Chromosome 1 is the largest human chromosome, with approximately 220 million base pairs, and would be 85 mm long if straightened. DNA does not usually exist as a single strand, but instead as a pair of strands that are held together; these two long strands coil in the shape of a double helix. The nucleotide contains both a segment of the backbone of the molecule and a nucleobase. A nucleobase linked to a sugar is called a nucleoside, and a base linked to a sugar and to one or more phosphate groups is called a nucleotide.
A biopolymer comprising multiple linked nucleotides is called a polynucleotide. The backbone of the DNA strand is made from alternating phosphate and sugar residues. The sugar in DNA is 2-deoxyribose, a pentose sugar. The sugars are joined together by phosphate groups that form phosphodiester bonds between the third and fifth carbon atoms of adjacent sugar rings. These are known as the 3′-end and 5′-end carbons, the prime symbol being used to distinguish these carbon atoms from those of the base to which the deoxyribose forms a glycosidic bond. By convention, each phosphoryl is considered to "belong" to the nucleotide whose 5′ carbon forms a bond with it. Any DNA strand therefore has one end at which there is a phosphoryl attached to the 5′ carbon of a ribose and another end a
Genome projects are scientific endeavours that aim to determine the complete genome sequence of an organism and to annotate protein-coding genes and other important genome-encoded features. The genome sequence of an organism includes the collective DNA sequences of each chromosome in the organism. For a bacterium containing a single chromosome, a genome project will aim to map the sequence of that chromosome. For the human species, whose genome includes 22 pairs of autosomes and 2 sex chromosomes, a complete genome sequence will involve 46 separate chromosome sequences. The Human Genome Project was a landmark genome project, having a major impact on research across the life sciences, with potential for spurring numerous medical and commercial developments. Genome assembly refers to the process of taking a large number of short DNA sequences and putting them back together to create a representation of the original chromosomes from which the DNA originated. In a shotgun sequencing project, all the DNA from a source is first fractured into millions of small pieces.
These pieces are "read" by automated sequencing machines, which can read up to 1000 nucleotides or bases at a time. A genome assembly algorithm works by taking all the pieces and aligning them to one another, detecting all places where two of the short sequences, or reads, overlap. These overlapping reads can be merged, and the process continues. Genome assembly is a difficult computational problem, made more difficult because many genomes contain large numbers of identical sequences, known as repeats. These repeats can be thousands of nucleotides long, and some occur in thousands of different locations in the large genomes of plants and animals. The resulting genome sequence is produced by combining the information from sequenced contigs and employing linking information to create scaffolds. Scaffolds are positioned along the physical map of the chromosomes, creating a "golden path". Most large-scale DNA sequencing centers originally developed their own software for assembling the sequences that they produced. However, this has changed as the software has grown more complex and as the number of sequencing centers has increased.
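The overlap-and-merge idea described above can be sketched as a toy greedy assembler. This is a minimal illustration under strong assumptions (error-free reads, all from the same strand, no repeats); the function names and the `min_len` parameter are hypothetical, and it is not how any particular production assembler works.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap.

    Stops when no pair overlaps by at least `min_len` bases and returns
    the remaining contigs.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)  # (overlap length, index of a, index of b)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return contigs
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


reads = ["ATGGCGT", "GCGTGCA", "TGCAATC"]
print(greedy_assemble(reads))  # ['ATGGCGTGCAATC']
```

A greedy strategy like this is easily misled by repeats, since identical sequences from different locations produce overlaps that look equally valid, which is one reason assembly is a hard problem in practice.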
One example of a genome assembler is the Short Oligonucleotide Analysis Package (SOAP), developed by BGI for de novo assembly of human-sized genomes, alignment, SNP detection, indel finding and structural variation analysis. Since the 1980s, molecular biology and bioinformatics have created the need for DNA annotation. DNA annotation or genome annotation is the process of attaching biological information to sequences, in particular identifying the locations of genes and determining what those genes do. When sequencing a genome, there are usually regions that are difficult to sequence. Thus, 'completed' genome sequences are rarely ever complete, and terms such as 'working draft' or 'essentially complete' have been used to describe the status of such genome projects more accurately. Even when every base pair of a genome sequence has been determined, there are still likely to be errors present because DNA sequencing is not a completely accurate process. It could be argued that a complete genome project should include the sequences of mitochondria and chloroplasts, as these organelles have their own genomes.
It is often reported that the goal of sequencing a genome is to obtain information about the complete set of genes in that particular genome sequence. The proportion of a genome that encodes genes may be small. However, it is not always possible to sequence only the coding regions separately. Also, as scientists understand more about the role of noncoding DNA, it will become more important to have a complete genome sequence as a background to understanding the genetics and biology of any given organism. In many ways genome projects do not confine themselves to only determining the DNA sequence of an organism. Such projects may include gene prediction to find out where the genes are in a genome and what those genes do. There may be related projects to sequence ESTs or mRNAs to help find out where the genes are. Historically, when sequencing eukaryotic genomes it was common to first map the genome to provide a series of landmarks across the genome. Rather than sequence a chromosome in one go, it would be sequenced piece by piece.
Changes in technology, and in particular improvements to the processing power of computers, mean that genomes can now be 'shotgun sequenced' in one go. Improvements in DNA sequencing technology have meant that the cost of sequencing a new genome has fallen, and newer technology has meant that genomes can be sequenced far more quickly. When research agencies decide what new genomes to sequence, the emphasis has been on species which are of high importance as model organisms, have relevance to human health, or have commercial importance. Secondary emphasis is placed on species whose genomes will help answer important questions in molecu
A DNA microarray is a collection of microscopic DNA spots attached to a solid surface. Scientists use DNA microarrays to measure the expression levels of large numbers of genes or to genotype multiple regions of a genome. Each DNA spot contains picomoles of a specific DNA sequence, known as a probe. Probes can be a short section of a gene or other DNA element, and are used to hybridize a cDNA or cRNA sample (the target) under high-stringency conditions. Probe-target hybridization is detected and quantified by detection of fluorophore-, silver-, or chemiluminescence-labeled targets to determine the relative abundance of nucleic acid sequences in the target. The original nucleic acid arrays were macro arrays of 9 cm × 12 cm, and the first computerized image-based analysis was published in 1981. It was invented by Patrick O. Brown. The core principle behind microarrays is hybridization between two DNA strands, the property of complementary nucleic acid sequences to pair with each other by forming hydrogen bonds between complementary nucleotide base pairs.
A high number of complementary base pairs in a nucleotide sequence means tighter non-covalent bonding between the two strands. After washing off non-specific bonding sequences, only strongly paired strands will remain hybridized. Fluorescently labeled target sequences that bind to a probe sequence generate a signal that depends on the hybridization conditions and on washing after hybridization. The total strength of the signal from a spot depends upon the amount of target sample binding to the probes present on that spot. Microarrays use relative quantitation, in which the intensity of a feature is compared to the intensity of the same feature under a different condition, and the identity of the feature is known by its position. Many types of arrays exist, and the broadest distinction is whether they are spatially arranged on a surface or on coded beads. The traditional solid-phase array is a collection of orderly microscopic "spots", called features, each with thousands of identical and specific probes attached to a solid surface, such as a glass, plastic or silicon biochip.
Thousands of these features can be placed in known locations on a single DNA microarray. The alternative bead array is a collection of microscopic polystyrene beads, each with a specific probe and a ratio of two or more dyes, which do not interfere with the fluorescent dyes used on the target sequence. DNA microarrays can be used to detect DNA, or to detect RNA that may or may not be translated into proteins. The process of measuring gene expression via cDNA is called expression analysis or expression profiling. Microarrays can be manufactured in different ways, depending on the number of probes under examination, customization requirements, and the type of scientific question being asked. Arrays from commercial vendors may have as few as 10 probes or as many as 5 million or more micrometre-scale probes. Microarrays can be fabricated using a variety of technologies, including printing with fine-pointed pins onto glass slides, photolithography using pre-made masks, photolithography using dynamic micromirror devices, ink-jet printing, or electrochemistry on microelectrode arrays.
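Relative quantitation, in which the intensity of a feature is compared with the intensity of the same feature under a different condition, can be illustrated with a small calculation. Expressing the comparison as a log2 ratio is a common convention but is an assumption here, as are the function name, the `floor` parameter used to avoid division by zero, and the made-up intensity values.

```python
import math


def log2_ratios(condition_a: dict, condition_b: dict, floor: float = 1.0) -> dict:
    """Compare each feature's intensity between two conditions.

    Returns log2(a / b) per feature: positive means the signal is higher
    in condition A, negative means it is higher in condition B.
    Intensities below `floor` are raised to `floor` before dividing.
    """
    ratios = {}
    for feature, a in condition_a.items():
        b = condition_b[feature]
        ratios[feature] = math.log2(max(a, floor) / max(b, floor))
    return ratios


# Hypothetical background-corrected intensities for two features.
treated = {"feature_1": 800.0, "feature_2": 100.0}
control = {"feature_1": 200.0, "feature_2": 400.0}
print(log2_ratios(treated, control))  # {'feature_1': 2.0, 'feature_2': -2.0}
```

Real analyses typically also involve background subtraction and normalization across arrays, which this sketch leaves out.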
In spotted microarrays, the probes are oligonucleotides, cDNA or small fragments of PCR products that correspond to mRNAs. The probes are synthesized prior to deposition on the array surface and are then "spotted" onto glass. A common approach utilizes an array of fine pins or needles controlled by a robotic arm, which is dipped into wells containing DNA probes and then deposits each probe at designated locations on the array surface. The resulting "grid" of probes represents the nucleic acid profiles of the prepared probes and is ready to receive complementary cDNA or cRNA "targets" derived from experimental or clinical samples. This technique is used by research scientists around the world to produce "in-house" printed microarrays in their own labs. These arrays may be customized for each experiment, because researchers can choose the probes and printing locations on the arrays, synthesize the probes in their own lab, and spot the arrays. They can then generate their own labeled samples for hybridization, hybridize the samples to the array, and scan the arrays with their own equipment.
This provides a low-cost microarray that may be customized for each study, and avoids the costs of purchasing more expensive commercial arrays that may represent vast numbers of genes that are not of interest to the investigator. Publications exist which indicate that in-house spotted microarrays may not provide the same level of sensitivity as commercial oligonucleotide arrays, owing to the small batch sizes and reduced printing efficiencies when compared to industrial manufacture of oligo arrays. In oligonucleotide microarrays, the probes are short sequences designed to match parts of the sequence of known or predicted open reading frames. Although oligonucleotide probes are also used in "spotted" microarrays, the term "oligonucleotide array" most often refers to a specific technique of manufacturing. Oligonucleotide arrays are produced by printing short oligonucleotide sequences designed to represent a single gene or family of gene splice-variants, synthesizing each sequence directly onto the array surface instead of depositing intact sequences.
Sequences may be longer or shorter depending on the d
End-sequence profiling (ESP) is a method based on sequence-tagged connectors, developed to facilitate de novo genome sequencing and to identify high-resolution copy number and structural aberrations such as inversions and translocations. The target genomic DNA is isolated and digested with restriction enzymes into large fragments. Following size-fractionation, the fragments are cloned into plasmids to construct artificial chromosomes such as bacterial artificial chromosomes (BACs), which are then sequenced and compared to the reference genome. The differences, including orientation and length variations between the constructed chromosomes and the reference genome, suggest copy number and structural aberrations. Before analyzing target genome structural aberration and copy number variation (CNV) with ESP, the target genome is amplified and conserved with artificial chromosome construction. The classic strategy is to construct a bacterial artificial chromosome. The target chromosome is randomly digested and inserted into plasmids, which are transformed and cloned in bacteria.
The size of the fragments inserted is 150–350 kb. Another commonly used artificial chromosome is the fosmid. The difference between BACs and fosmids is the size of the DNA inserted: fosmids can only hold 40 kb DNA fragments. End-sequence profiling can be used to detect structural variations such as insertions and chromosomal rearrangements. Compared to other methods that look at chromosomal abnormalities, ESP is particularly useful for identifying copy-neutral abnormalities such as inversions and translocations that would not be apparent when looking at copy number variation. From the BAC library, both ends of the inserted fragments are sequenced using a sequencing platform. Detection of variations is then achieved by mapping the sequenced reads onto a reference genome. Inversions and translocations are relatively easy to detect from an invalid pair of sequenced ends. For instance, a translocation can be detected if the paired ends map onto different chromosomes in the reference genome. An inversion can be detected by divergent orientation of the reads, where the insert will have two plus ends or two minus ends.
In the case of an insertion or a deletion, mapping of the paired ends is consistent with the reference genome, but the reads are discordant in apparent size. The apparent size is the distance between the BAC sequenced ends when mapped onto the reference genome. If a BAC has an insert of length l, a concordant mapping will show a fragment of size l in the reference genome. If the paired ends map closer together than l, an insertion is suspected in the sampled DNA. An apparent size below a cut-off such as µ − 3σ can be used to detect an insertion, where µ is the mean length of the insert and σ is the standard deviation. In the case of a deletion, the paired ends are mapped further apart in the reference genome than the expected distance. In some cases, discordant reads can also indicate a CNV, for example in sequence repeats. For larger CNVs, the density of the reads will vary according to the copy number. An increase in copy number will be reflected by increased mapping to the same region of the reference genome. ESP was first developed and published in 2003 by Dr. Collins and his colleagues at the University of California, San Francisco.
Their study revealed the chromosome rearrangements and CNV of MCF7 human cancer cells at a 150 kb resolution, much more accurate than both CGH and spectral karyotyping at that time. In 2007, Dr. Snyder and his group improved ESP to 3 kb resolution by sequencing both pairs of 3-kb DNA fragments without BAC construction. Their approach is able to identify deletions and insertions with an average breakpoint resolution of 644 bp, which is close to the resolution of the polymerase chain reaction (PCR). Various bioinformatics tools can be used to analyze end-sequence profiling. Common ones include BreakDancer, PEMer, Variation Hunter, common LAW, GASV and Spanner. ESP can be used to map structural variation at high resolution in diseased tissue. This technique is mainly used on tumor samples from different cancer types. Accurate identification of copy-neutral chromosomal abnormalities is particularly important, as translocations can lead to fusion proteins, chimeric proteins, or misregulated proteins that can be seen in tumors. This technique can also be used in evolution studies by identifying large structural variation between different populations.
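The paired-end signatures described above (different chromosomes for a translocation, matching orientation for an inversion, and an unexpectedly small or large apparent size for an insertion or a deletion) can be sketched as a simple classifier. The `PairedEnd` record, the function name and the cut-off multiplier `k` are hypothetical; in particular, using k = 3 standard deviations on both sides of the mean is an assumption made for illustration, and real tools apply considerably more filtering.

```python
from dataclasses import dataclass


@dataclass
class PairedEnd:
    """Mapped positions of the two sequenced ends of one cloned insert."""
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str  # "+" or "-"


def classify(pair: PairedEnd, mean_insert: float, sd_insert: float, k: float = 3.0) -> str:
    """Label a mapped end pair by the structural variant it suggests."""
    if pair.chrom1 != pair.chrom2:
        return "translocation"      # ends map to different chromosomes
    if pair.strand1 == pair.strand2:
        return "inversion"          # two plus ends or two minus ends
    apparent = abs(pair.pos2 - pair.pos1)
    if apparent < mean_insert - k * sd_insert:
        return "insertion"          # ends map closer together than expected
    if apparent > mean_insert + k * sd_insert:
        return "deletion"           # ends map further apart than expected
    return "concordant"


# Hypothetical library: mean insert 150 kb, standard deviation 10 kb.
pair = PairedEnd("chr1", 1_000, "+", "chr1", 101_000, "-")
print(classify(pair, mean_insert=150_000, sd_insert=10_000))  # insertion
```

A single discordant pair is weak evidence; in practice several independent clones supporting the same breakpoint would be required before calling a variant.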
Similar methods are being developed for various applications. For example, a barcoded Illumina paired-end sequencing approach was used to assess microbial diversity by sequencing the 16S V6 tag. The resolution of structural variation detection by ESP has been increased to a level similar to that of PCR, and can be further improved by selecting more evenly sized DNA fragments. ESP can be applied either with or without constructed artificial chromosomes. With BACs, precious samples can be immortalized and conserved, which is particularly important for small quantities of samples that are planned for extensive analyses. Furthermore, BACs carrying rearranged DNA fragments can be directly transfected in vitro or in vivo to analyze the function of these rearrangements. However, BAC construction is still expensive and labor-intensive. Researchers should choose carefully which strategy they need for a particular project. Because ESP only looks at short paired-end sequences, it has the advantage of providing useful information genome-wide without the need for large-scale sequencing.
100-200 tumors can be sequenced at a resolution greater than 150kb when compared to sequencing an entire genome.
In silico is an expression meaning "performed on computer or via computer simulation" in reference to biological experiments. The phrase was coined in 1989 as an allusion to the Latin phrases in vivo, in vitro and in situ, which are used in biology and refer to experiments done in living organisms, outside living organisms, and where they are found in nature, respectively. The expression in silico was first used in public in 1989 in the workshop "Cellular Automata: Theory and Applications" in Los Alamos, New Mexico. Pedro Miramontes, a mathematician from the National Autonomous University of Mexico, presented the report "DNA and RNA Physicochemical Constraints, Cellular Automata and Molecular Evolution". In his talk, Miramontes used the term "in silico" to characterize biological experiments carried out entirely in a computer. The work was later presented by Miramontes as his PhD dissertation. In silico has been used in white papers written to support the creation of bacterial genome programs by the Commission of the European Community.
The first referenced paper where "in silico" appears was written by a French team in 1991. The first referenced book chapter where "in silico" appears was written by Hans B. Sieburg in 1990 and presented during a Summer School on Complex Systems at the Santa Fe Institute. The phrase "in silico" originally applied only to computer simulations that modeled natural or laboratory processes, and did not refer to calculations done by computer generically. In silico study in medicine is thought to have the potential to speed the rate of discovery while reducing the need for expensive lab work and clinical trials. One way to achieve this is by producing and screening drug candidates more effectively. In 2010, for example, using the protein docking algorithm EADock, researchers found potential inhibitors to an enzyme associated with cancer activity in silico. Fifty percent of the molecules were later shown to be active inhibitors in vitro. This approach differs from the use of expensive high-throughput screening robotic labs to physically test thousands of diverse compounds a day, often with an expected hit rate on the order of 1% or less, with still fewer expected to be real leads following further testing.
Efforts have been made to establish computer models of cellular behavior. For example, in 2007 researchers developed an in silico model of tuberculosis to aid in drug discovery, with the prime benefit of its being faster than real-time simulated growth rates, allowing phenomena of interest to be observed in minutes rather than months. More work can be found that focuses on modeling a particular cellular process, such as the growth cycle of Caulobacter crescentus. These efforts fall far short of an exact, fully predictive computer model of a cell's entire behavior. Limitations in the understanding of molecular dynamics and cell biology, as well as the absence of available computer processing power, force large simplifying assumptions that constrain the usefulness of present in silico cell models. Digital genetic sequences obtained from DNA sequencing may be stored in sequence databases, analyzed, digitally altered, or used as templates for creating new actual DNA using artificial gene synthesis.
In silico computer-based modeling technologies have been applied in: whole cell analysis of prokaryotic and eukaryotic hosts, e.g. E. coli, B. subtilis, yeast, CHO- or human cell lines; bioprocess development and optimization, e.g. optimization of product yields; simulation of oncological clinical trials exploiting grid computing infrastructures, such as the European Grid Infrastructure, for improving the performance and effectiveness of the simulations; analysis and visualization of heterologous data sets from various sources, e.g. genome, transcriptome or proteome data; and protein design. One example is a software package under development and free for academic use.
The lac operon is an operon required for the transport and metabolism of lactose in Escherichia coli and many other enteric bacteria. Although glucose is the preferred carbon source for most bacteria, the lac operon allows for the effective digestion of lactose when glucose is not available, through the activity of beta-galactosidase. Gene regulation of the lac operon was the first genetic regulatory mechanism to be understood clearly, so it has become a foremost example of prokaryotic gene regulation. It is often discussed in introductory molecular and cellular biology classes for this reason. This lactose metabolism system was used by François Jacob and Jacques Monod to determine how a biological cell knows which enzyme to synthesize. Their work on the lac operon won them the Nobel Prize in Physiology or Medicine in 1965. Bacterial operons are polycistronic transcripts that are able to produce multiple proteins from one mRNA transcript. In this case, when lactose is required as a sugar source for the bacterium, the three genes of the lac operon can be expressed and their subsequent proteins translated: lacZ, lacY and lacA.
The gene product of lacZ is β-galactosidase, which cleaves lactose, a disaccharide, into glucose and galactose. lacY encodes β-galactoside permease, a membrane protein which becomes embedded in the cytoplasmic membrane to enable the cellular transport of lactose into the cell. lacA encodes galactoside acetyltransferase. It would be wasteful to produce the enzymes when no lactose is available or if a preferable energy source such as glucose is available. The lac operon uses a two-part control mechanism to ensure that the cell expends energy producing the enzymes encoded by the lac operon only when necessary. In the absence of lactose, the lac repressor, encoded by lacI, halts production of the enzymes encoded by the lac operon. The lac repressor is always expressed, so the operon is transcribed only in the presence of a small-molecule co-inducer that inactivates the repressor. In the presence of glucose, the catabolite activator protein (CAP), required for production of the enzymes, remains inactive, and EIIAGlc shuts down lactose permease to prevent transport of lactose into the cell.
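The two-part control described above can be summarized as a small truth table. The sketch below is a deliberately simplified on/off model with hypothetical function and variable names: it treats the repressor and CAP as binary switches, whereas expression in real cells is graded rather than strictly on or off.

```python
def lac_enzymes_produced(lactose_present: bool, glucose_present: bool) -> bool:
    """Simplified on/off model of lac operon control.

    Part 1: without lactose, the lac repressor stays bound and blocks
            production of the enzymes.
    Part 2: with glucose present, the catabolite activator protein (CAP)
            remains inactive, so production is not activated.
    """
    repressor_inactive = lactose_present  # an inducer derived from lactose releases the repressor
    cap_active = not glucose_present      # CAP is active only when glucose is absent
    return repressor_inactive and cap_active


for lactose in (False, True):
    for glucose in (False, True):
        state = "on" if lac_enzymes_produced(lactose, glucose) else "off"
        print(f"lactose={lactose!s:5} glucose={glucose!s:5} -> operon {state}")
```

Only the combination of lactose present and glucose absent switches the operon on in this model, which is consistent with glucose being consumed first when both sugars are available.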
This dual control mechanism causes the sequential utilization of glucose and lactose in two distinct growth phases, known as diauxie. The lac operon consists of three structural genes, a promoter, a terminator and an operator. The three structural genes are lacZ, lacY and lacA. lacZ encodes β-galactosidase, an intracellular enzyme that cleaves the disaccharide lactose into glucose and galactose. lacY encodes β-galactoside permease, a transmembrane symporter that pumps β-galactosides, including lactose, into the cell using a proton gradient in the same direction. Permease increases the permeability of the cell to β-galactosides. lacA encodes β-galactoside transacetylase, an enzyme that transfers an acetyl group from acetyl-CoA to β-galactosides. Only lacZ and lacY appear to be necessary for lactose catabolism. Three-letter abbreviations are used to describe phenotypes in bacteria including E. coli. Examples include Lac, His, Mot and SmR. In the case of Lac, wild-type cells are Lac+ and are able to use lactose as a carbon and energy source, while Lac− mutant derivatives cannot use lactose.
The same three letters are typically used to label the genes involved in a particular phenotype, where each different gene is additionally distinguished by an extra letter. The lac genes encoding enzymes are lacZ, lacY and lacA. The fourth lac gene is lacI, encoding the lactose repressor; "I" stands for inducibility. One may distinguish between structural genes encoding enzymes and regulatory genes encoding proteins that affect gene expression. Current usage expands the phenotypic nomenclature to apply to proteins: thus, LacZ is the protein product of the lacZ gene, β-galactosidase. Various short sequences that are not genes also affect gene expression, including the lac promoter, lac p, and the lac operator, lac o. Although it is not strictly standard usage, mutations affecting lac o are referred to as lac oc, for historical reasons. Specific control of the lac genes depends on the availability of the substrate lactose to the bacterium. The proteins are not produced by the bacterium when lactose is unavailable as a carbon source. The lac genes are organized into an operon. Transcription of all genes starts with the binding of the enzyme RNA polymerase (RNAP), a DNA-binding protein, which binds to a specific DNA binding site, the promoter, upstream of the genes.
Binding of RNA polymerase to the promoter is aided by the cAMP-bound catabolite activator protein. However, the lacI gene produces a protein that blocks RNAP from binding to the promoter of the operon. This protein can only be removed when allolactose binds to it and inactivates it. The protein formed by the lacI gene is known as the lac repressor. The type of regulation that the lac operon undergoes is referred to as negative inducible, meaning that the gene is turned off by the regulatory factor unless some molecule is added. Because of the presence of the lac repressor protein, genetic engineers who replace the lacZ gene with another gene will have to grow the experimental bacteria on agar with lactose available on it. If they do not, the gene they are trying to express will not be expressed as the repressor protein is still blocking RNAP fro
In genetics and biochemistry, sequencing means to determine the primary structure of an unbranched biopolymer. Sequencing results in a symbolic linear depiction known as a sequence which succinctly summarizes much of the atomic-level structure of the sequenced molecule. DNA sequencing is the process of determining the nucleotide order of a given DNA fragment. So far, most DNA sequencing has been performed using the chain termination method developed by Frederick Sanger; this technique uses sequence-specific termination of a DNA synthesis reaction using modified nucleotide substrates. However, new sequencing technologies such as pyrosequencing are gaining an increasing share of the sequencing market. More genome data are now being produced by pyrosequencing than Sanger DNA sequencing. Pyrosequencing has enabled rapid genome sequencing. Bacterial genomes can be sequenced in a single run with several times coverage with this technique; this technique was used to sequence the genome of James Watson recently.
The sequence of DNA encodes the necessary information for living things to survive and reproduce. Determining the sequence is therefore useful in fundamental research into why and how organisms live, as well as in applied subjects. Because of the key importance DNA has to living things, knowledge of DNA sequences is useful in practically any area of biological research. For example, in medicine it can be used to identify, diagnose and potentially develop treatments for genetic diseases. Similarly, research into pathogens may lead to treatments for contagious diseases. Biotechnology is a burgeoning discipline, with the potential for many useful products and services. The Carlson curve is a term coined by The Economist to describe the biotechnological equivalent of Moore's law, and is named after author Rob Carlson. Carlson predicted that the doubling time of DNA sequencing technologies would be at least as fast as Moore's law. Carlson curves illustrate the rapid decreases in cost, and increases in performance, of a variety of technologies, including DNA sequencing, DNA synthesis, and a range of physical and computational tools used in protein expression and in determining protein structures.
In chain terminator sequencing, extension is initiated at a specific site on the template DNA by using a short oligonucleotide 'primer' complementary to the template at that region. The oligonucleotide primer is extended using a DNA polymerase, an enzyme that replicates DNA. Included with the primer and DNA polymerase are the four deoxynucleotide bases, along with a low concentration of a chain-terminating nucleotide. Limited incorporation of the chain-terminating nucleotide by the DNA polymerase results in a series of related DNA fragments that are terminated only at positions where that particular nucleotide is used. The fragments are then size-separated by electrophoresis in a slab polyacrylamide gel or, more commonly now, in a narrow glass tube filled with a viscous polymer. An alternative to labelling the primer is to label the terminators instead, commonly called 'dye terminator sequencing'. The major advantage of this approach is that the complete sequencing set can be performed in a single reaction, rather than the four needed with the labeled-primer approach.
This is accomplished by labelling each of the dideoxynucleotide chain-terminators with a separate fluorescent dye, which fluoresces at a different wavelength. This method is easier and quicker than the dye primer approach, but may produce more uneven data peaks, due to a template-dependent difference in the incorporation of the large dye chain-terminators. This problem has been significantly reduced with the introduction of new enzymes and dyes that minimize incorporation variability. This method is now used for the vast majority of sequencing reactions as it is both simpler and cheaper. The major reason for this is that the primers do not have to be separately labelled, although this is less of a concern with frequently used 'universal' primers. This is changing due to the increasing cost-effectiveness of second- and third-generation systems from Illumina, 454, ABI and Dover. Pyrosequencing, developed by Pål Nyrén and Mostafa Ronaghi, has been commercialized by Biotage and 454 Life Sciences. The latter platform sequences 100 megabases in a seven-hour run with a single machine.
In the array-based method, single-stranded DNA is annealed to beads and amplified via EmPCR. These DNA-bound beads are then placed into wells on a fiber-optic chip along with enzymes which produce light in the presence of ATP. When free nucleotides are washed over this chip, light is produced as ATP is generated when nucleotides join with their complementary base pairs. Addition of one or more nucleotides results in a reaction that generates a light signal, which is recorded by the CCD camera in the instrument. The signal strength is proportional to the number of nucleotides, for example in homopolymer stretches, incorporated in a single nucleotide flow. Whereas the methods above describe various sequencing methods, separate related terms are used when a large portion of a genome is sequenced. Several platforms were developed to perform whole genome sequencing. RNA is less stable in the cell, and more prone to nuclease attack experimentally. A
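The relationship between nucleotide flows and signal strength in pyrosequencing, where the signal from a single flow is proportional to the number of nucleotides incorporated, can be sketched with an idealized flowgram. The function names, the flow order "TACG" and the use of exact integer signals are assumptions made for illustration; real signals are noisy, which makes long homopolymer stretches hard to call.

```python
def ideal_flowgram(synthesized: str, flow_order: str = "TACG") -> list[int]:
    """Return the idealized signal for each nucleotide flow.

    `synthesized` is the strand being built (complementary to the template).
    Each flow offers one base; the signal equals the number of consecutive
    copies of that base incorporated at the current position (0 if none).
    """
    unknown = set(synthesized) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not covered by the flow order: {sorted(unknown)}")
    signals = []
    pos = 0
    while pos < len(synthesized):
        for base in flow_order:
            run = 0
            while pos < len(synthesized) and synthesized[pos] == base:
                run += 1
                pos += 1
            signals.append(run)
            if pos == len(synthesized):
                break
    return signals


def decode(signals: list[int], flow_order: str = "TACG") -> str:
    """Rebuild the sequence from per-flow signals."""
    return "".join(flow_order[i % len(flow_order)] * s for i, s in enumerate(signals))


signals = ideal_flowgram("TTAGGGC")
print(signals)          # [2, 1, 0, 3, 0, 0, 1]
print(decode(signals))  # TTAGGGC
```

The homopolymer stretch GGG appears as a single flow with signal 3 rather than as three separate events, which is the behaviour described in the text above.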