DNA ligase is a specific type of enzyme, a ligase, that facilitates the joining of DNA strands together by catalyzing the formation of a phosphodiester bond. It plays a role in repairing single-strand breaks in duplex DNA in living organisms, but some forms may repair double-strand breaks. Single-strand breaks are repaired by DNA ligase using the complementary strand of the double helix as a template, with DNA ligase creating the final phosphodiester bond to repair the DNA. DNA ligase is used in both DNA repair and DNA replication. In addition, DNA ligase has extensive use in molecular biology laboratories for recombinant DNA experiments. Purified DNA ligase is used in gene cloning to join DNA molecules together to form recombinant DNA. The mechanism of DNA ligase is to form two covalent phosphodiester bonds between the 3' hydroxyl end of one nucleotide and the 5' phosphate end of another. Two ATP molecules are consumed for each phosphodiester bond formed. AMP is required for the ligase reaction, which proceeds in four steps: (1) reorganization of the active site at sites such as nicks in DNA segments or Okazaki fragments; (2) adenylation of a lysine residue in the active center of the enzyme, with release of pyrophosphate; (3) transfer of the AMP to the 5' phosphate of the donor strand, forming a pyrophosphate bond; (4) formation of a phosphodiester bond between the 5' phosphate of the donor and the 3' hydroxyl of the acceptor. Ligase will work with blunt ends, although higher enzyme concentrations and different reaction conditions are required.
The E. coli DNA ligase is encoded by the lig gene. DNA ligase in E. coli, as well as in most prokaryotes, uses energy gained by cleaving nicotinamide adenine dinucleotide (NAD) to create the phosphodiester bond. It does not ligate blunt-ended DNA except under conditions of molecular crowding with polyethylene glycol, and it cannot join RNA to DNA efficiently. The activity of E. coli DNA ligase can be enhanced by DNA polymerase at the right concentrations. Enhancement only works when the concentration of DNA polymerase I is much lower than that of the DNA fragments to be ligated; when the concentration of Pol I is higher, it has an adverse effect on E. coli DNA ligase. The DNA ligase from bacteriophage T4 is the ligase most commonly used in laboratory research. It can ligate either cohesive or blunt ends of DNA, oligonucleotides, as well as RNA and RNA-DNA hybrids, but not single-stranded nucleic acids.
It can ligate blunt-ended DNA with much greater efficiency than E. coli DNA ligase. Unlike E. coli DNA ligase, T4 DNA ligase cannot utilize NAD and has an absolute requirement for ATP as a cofactor. Some engineering has been done to improve the in vitro activity of T4 DNA ligase. A typical reaction for inserting a fragment into a plasmid vector would use about 0.01 to 1 units of ligase. The optimal incubation temperature for T4 DNA ligase is 16 °C. In mammals, there are four specific types of ligase. DNA ligase I: ligates the nascent DNA of the lagging strand after ribonuclease H has removed the RNA primer from the Okazaki fragments. DNA ligase III: complexes with the DNA repair protein XRCC1 to aid in sealing DNA during the process of nucleotide excision repair and recombinant fragments. Of all known mammalian DNA ligases, only Lig III has been found to be present in mitochondria. DNA ligase IV: complexes with XRCC4 and catalyzes the final step in the non-homologous end joining DNA double-strand break repair pathway.
It is required for V(D)J recombination, the process that generates diversity in immunoglobulin and T-cell receptor loci during immune system development. DNA ligase II: appears to be used in repair. It is formed by alternative splicing of a proteolytic fragment of DNA ligase III and does not have its own gene; it is therefore considered to be identical to DNA ligase III. DNA ligase from eukaryotes and some microbes uses adenosine triphosphate (ATP) rather than NAD. A thermostable ligase is also available. Derived from a thermophilic bacterium, the enzyme is stable and active at much higher temperatures than conventional DNA ligases; its half-life is 48 hours at 65 °C and greater than 1 hour at 95 °C. Ampligase DNA Ligase has been shown to be active for at least 500 thermal cycles or 16 hours of cycling. This exceptional thermostability permits high hybridization stringency and ligation specificity. There are at least three different units used to measure the activity of DNA ligase: Weiss unit - the amount of ligase that catalyzes the exchange of 1 nmole of 32P from inorganic pyrophosphate to ATP in 20 minutes at 37 °C.
This is the one most commonly used. Modrich-Lehman unit - this is rarely used; one unit is defined as the amount of enzyme required to convert 100 nmoles of d(A-T)n to an exonuclease-III resistant form in 30 minutes under standard conditions. Many commercial suppliers of ligases use an arbitrary unit based on the ability of ligase to ligate cohesive ends; these units are more subjective than quantitative and lack precision. DNA ligases have become indispensable tools in modern molecular biology research for generating recombinant DNA sequences. For example, DNA ligases are used with restriction enzymes to insert DNA fragments, often genes, into plasmids. Controlling the optimal temperature is a vital aspect of performing effi
Enzymes are macromolecular biological catalysts. Enzymes accelerate chemical reactions. The molecules upon which enzymes may act are called substrates, and the enzyme converts the substrates into different molecules known as products. Almost all metabolic processes in the cell need enzyme catalysis in order to occur at rates fast enough to sustain life. Metabolic pathways depend upon enzymes to catalyze individual steps. The study of enzymes is called enzymology, and a new field of pseudoenzyme analysis has grown up, recognising that during evolution some enzymes have lost the ability to carry out biological catalysis, which is reflected in their amino acid sequences and unusual 'pseudocatalytic' properties. Enzymes are known to catalyze more than 5,000 biochemical reaction types. Most enzymes are proteins, although a few are catalytic RNA molecules; the latter are called ribozymes. Enzymes' specificity comes from their unique three-dimensional structures. Like all catalysts, enzymes increase the reaction rate by lowering its activation energy. Some enzymes can make their conversion of substrate to product occur many millions of times faster.
An extreme example is orotidine 5'-phosphate decarboxylase, which allows a reaction that would otherwise take millions of years to occur in milliseconds. Chemically, enzymes are like any catalyst: they are not consumed in chemical reactions, nor do they alter the equilibrium of a reaction. Enzymes differ from most other catalysts by being much more specific. Enzyme activity can be affected by other molecules: inhibitors are molecules that decrease enzyme activity, and activators are molecules that increase activity. Many therapeutic drugs and poisons are enzyme inhibitors. An enzyme's activity decreases markedly outside its optimal temperature and pH, and many enzymes are denatured when exposed to excessive heat, losing their structure and catalytic properties. Some enzymes are used commercially, for example in the synthesis of antibiotics. Some household products use enzymes to speed up chemical reactions: enzymes in biological washing powders break down protein, starch or fat stains on clothes, and enzymes in meat tenderizer break down proteins into smaller molecules, making the meat easier to chew.
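The statement that lowering the activation energy speeds up a reaction can be made concrete with the Arrhenius equation, k = A·exp(−Ea/RT). The sketch below is only an illustration: it assumes the pre-exponential factor A is unchanged by the catalyst, and the function name and the energy values are hypothetical, not measurements for any particular enzyme.

```python
import math

R = 8.314  # molar gas constant in J/(mol*K)

def rate_enhancement(delta_ea_kj_per_mol, temperature_k=298.15):
    """Factor by which the rate constant grows when the activation energy
    is lowered by delta_ea_kj_per_mol, assuming the same Arrhenius
    pre-exponential factor with and without the catalyst."""
    return math.exp(delta_ea_kj_per_mol * 1000.0 / (R * temperature_k))

# Illustrative reductions in activation energy (hypothetical values).
for delta in (10, 20, 40):
    factor = rate_enhancement(delta)
    print(f"Ea lowered by {delta} kJ/mol -> about {factor:.3g} times faster")
```

Because the dependence is exponential, a reduction of a few tens of kJ/mol is already enough to give the million-fold accelerations mentioned in the text.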
By the late 17th and early 18th centuries, the digestion of meat by stomach secretions and the conversion of starch to sugars by plant extracts and saliva were known, but the mechanisms by which these occurred had not been identified. French chemist Anselme Payen was the first to discover an enzyme, diastase, in 1833. A few decades later, when studying the fermentation of sugar to alcohol by yeast, Louis Pasteur concluded that this fermentation was caused by a vital force contained within the yeast cells called "ferments", which were thought to function only within living organisms. He wrote that "alcoholic fermentation is an act correlated with the life and organization of the yeast cells, not with the death or putrefaction of the cells." In 1877, German physiologist Wilhelm Kühne first used the term enzyme, which comes from Greek ἔνζυμον, "leavened" or "in yeast", to describe this process. The word enzyme was later used to refer to nonliving substances such as pepsin, and the word ferment was used to refer to chemical activity produced by living organisms.
Eduard Buchner submitted his first paper on the study of yeast extracts in 1897. In a series of experiments at the University of Berlin, he found that sugar was fermented by yeast extracts even when there were no living yeast cells in the mixture. He named the enzyme that brought about the fermentation of sucrose "zymase". In 1907, he received the Nobel Prize in Chemistry for "his discovery of cell-free fermentation". Following Buchner's example, enzymes are usually named according to the reaction they carry out: the suffix -ase is combined with the name of the substrate or with the type of reaction. The biochemical identity of enzymes was still unknown in the early 1900s. Many scientists observed that enzymatic activity was associated with proteins, but others argued that proteins were merely carriers for the true enzymes and that proteins per se were incapable of catalysis. In 1926, James B. Sumner showed that the enzyme urease was a pure protein and crystallized it. The conclusion that pure proteins can be enzymes was definitively demonstrated by John Howard Northrop and Wendell Meredith Stanley, who worked on the digestive enzymes pepsin and chymotrypsin.
These three scientists were awarded the 1946 Nobel Prize in Chemistry. The discovery that enzymes could be crystallized allowed their structures to be solved by X-ray crystallography. This was first done for lysozyme, an enzyme found in tears and egg whites that digests the coating of some bacteria. This high-resolution structure of lysozyme marked the beginning of the field of structural biology and the effort to understand how enzymes work at an atomic level of detail. An enzyme's name is derived from its substrate or the chemical reaction it catalyzes, with the word ending in -ase. Examples are alcohol dehydrogenase and DNA polymerase. Different enzymes that catalyze the same chemical reaction are called isozymes. The International Union of Biochemistry and Molecular Biology has developed a nomenclature for enzymes, the EC numbers. The first number broadly classifies the enzyme based on its mechanism. The top-level classification is: EC 1, Oxidoreductases: catalyze oxidation/reducti
In silico PCR
In silico PCR refers to computational tools used to calculate theoretical polymerase chain reaction results using a given set of primers to amplify DNA sequences from a sequenced genome or transcriptome. These tools are used to optimize the design of primers for target cDNA sequences. Primer optimization has two goals: efficiency and selectivity. Efficiency involves taking into account such factors as GC-content, efficiency of binding, secondary structure, and annealing and melting point. Primer selectivity requires that the primer pairs not fortuitously bind to random sites other than the target of interest, nor should the primer pairs bind to conserved regions of a gene family. If the selectivity is poor, a set of primers will amplify multiple products besides the target of interest. The design of appropriate short or long primer pairs is only one goal of PCR product prediction. Other information provided by in silico PCR tools may include primer location, the length of each amplicon, simulation of electrophoretic mobility, identification of open reading frames, and links to other web resources.
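Two of the efficiency factors named above, GC-content and melting point, are easy to estimate for a short primer. The following is a minimal sketch, not the method used by any particular tool: the function names are hypothetical, the example primer is arbitrary, and the melting temperature uses the simple Wallace rule (2 °C per A/T, 4 °C per G/C), which is only a rough approximation for oligonucleotides of roughly 14 to 20 bases.

```python
def gc_content(primer):
    """Fraction of bases in the primer that are G or C."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def wallace_tm(primer):
    """Rough melting temperature in degrees Celsius by the Wallace rule:
    2 degrees for each A or T, 4 degrees for each G or C."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

primer = "ATCGATTGAGCTCTAGCG"  # arbitrary 18-base example
print(f"GC content: {gc_content(primer):.0%}")   # 50%
print(f"Wallace Tm: {wallace_tm(primer)} C")      # 54 C
```

Real primer-design software typically relies on nearest-neighbour thermodynamic models and salt corrections instead, so these numbers should be treated as a first-pass screen only.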
Many software packages are available, offering differing balances of feature set, ease of use and cost. The most widely used is perhaps e-PCR, accessible from the National Center for Biotechnology Information website. FastPCR, a commercial application, allows simultaneous testing of a single primer or a set of primers designed for multiplex target sequences. It performs a fast, gapless alignment to test the complementarity of the primers to the target sequences. Probable PCR products can be found for linear and circular templates using standard or inverse PCR as well as for multiplex PCR. VPCR runs a dynamic simulation of multiplex PCR, allowing for an estimate of quantitative competition effects between multiple amplicons in one reaction. The UCSC Genome Browser offers isPCR, which provides graphical as well as text-file output to view PCR products on more than 100 sequenced genomes. A primer may bind to many predicted sequences, but only sequences with no or few mismatches at the 3' end of the primer can be used for polymerase extension.
The last 10-12 bases at the 3' end of a primer are sensitive to initiation of polymerase extension and general primer stability on the template binding site. The effect of a single mismatch in these last 10 bases at the 3' end of the primer depends on its position and local structure, and it reduces primer binding and PCR efficiency.
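The idea of searching a template for primer binding sites while being strict about the 3' end can be sketched in a few lines. This is a toy illustration under stated assumptions, not the algorithm of e-PCR, FastPCR or isPCR: all function names and thresholds (`max_mismatches`, `exact_3prime`, `max_len`) are hypothetical, the template is linear, sequences are upper-case A/C/G/T, and only gapless matching with non-overlapping primers is considered.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def binds(primer, site, max_mismatches=2, exact_3prime=3):
    """True if primer matches site (same length, both written 5'->3') with at
    most max_mismatches differences and none in the last exact_3prime bases."""
    if exact_3prime and primer[-exact_3prime:] != site[-exact_3prime:]:
        return False
    mismatches = sum(a != b for a, b in zip(primer, site))
    return mismatches <= max_mismatches

def binding_starts(primer, strand, **kw):
    """Start indices on strand where the primer sequence is found."""
    n = len(primer)
    return [i for i in range(len(strand) - n + 1)
            if binds(primer, strand[i:i + n], **kw)]

def predict_amplicons(template, fwd, rev, max_len=2000, **kw):
    """Return sorted (start, end, length) tuples on the top strand,
    with end exclusive, for every forward/reverse site pair."""
    fwd_starts = binding_starts(fwd, template, **kw)
    # The reverse primer has the same sequence as part of the bottom strand,
    # so search the reverse complement and map back to top coordinates.
    rev_starts = binding_starts(rev, revcomp(template), **kw)
    rev_ends = [len(template) - j for j in rev_starts]
    products = []
    for start in fwd_starts:
        for end in rev_ends:
            length = end - start
            if len(fwd) + len(rev) <= length <= max_len:
                products.append((start, end, length))
    return sorted(products)

# Hypothetical template and primers, for illustration only.
template = "AAATGCCGTAGGCTTACGATCGGATCCTTAAGGCCTTGACTGACCC"
fwd = template[2:12]
rev = revcomp(template[30:40])
for start, end, length in predict_amplicons(template, fwd, rev):
    print(start, end, length, template[start:end])
```

Requiring an exact match over only the last three bases is an arbitrary choice for this sketch; the text above suggests that a much longer 3' window influences extension in practice.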
Deoxyribonucleic acid (DNA) is a molecule composed of two chains that coil around each other to form a double helix carrying the genetic instructions used in the growth, development, functioning and reproduction of all known organisms and many viruses. DNA and ribonucleic acid (RNA) are nucleic acids. The two DNA strands are known as polynucleotides because they are composed of simpler monomeric units called nucleotides. Each nucleotide is composed of one of four nitrogen-containing nucleobases, a sugar called deoxyribose, and a phosphate group. The nucleotides are joined to one another in a chain by covalent bonds between the sugar of one nucleotide and the phosphate of the next, resulting in an alternating sugar-phosphate backbone. The nitrogenous bases of the two separate polynucleotide strands are bound together, according to base pairing rules, with hydrogen bonds to make double-stranded DNA. The complementary nitrogenous bases are divided into two groups, pyrimidines and purines. In DNA, the pyrimidines are thymine and cytosine; the purines are adenine and guanine. Both strands of double-stranded DNA store the same biological information.
This information is replicated as and when the two strands separate. A large part of DNA is non-coding, meaning that these sections do not serve as patterns for protein sequences. The two strands of DNA run in opposite directions to each other and are thus antiparallel. Attached to each sugar is one of four types of nucleobases. It is the sequence of these four nucleobases along the backbone that encodes genetic information. RNA strands are created using DNA strands as a template in a process called transcription. Under the genetic code, these RNA strands specify the sequence of amino acids within proteins in a process called translation. Within eukaryotic cells, DNA is organized into long structures called chromosomes. Before typical cell division, these chromosomes are duplicated in the process of DNA replication, providing a complete set of chromosomes for each daughter cell. Eukaryotic organisms store most of their DNA inside the cell nucleus as nuclear DNA, and some in the mitochondria as mitochondrial DNA or in chloroplasts as chloroplast DNA. In contrast, prokaryotes store their DNA only in circular chromosomes.
Within eukaryotic chromosomes, chromatin proteins, such as histones, compact and organize DNA. These compacting structures guide the interactions between DNA and other proteins, helping control which parts of the DNA are transcribed. DNA was first isolated by Friedrich Miescher in 1869. Its molecular structure was first identified by Francis Crick and James Watson at the Cavendish Laboratory within the University of Cambridge in 1953; their model-building efforts were guided by X-ray diffraction data acquired by Raymond Gosling, a post-graduate student of Rosalind Franklin. DNA is used by researchers as a molecular tool to explore physical laws and theories, such as the ergodic theorem and the theory of elasticity. The unique material properties of DNA have made it an attractive molecule for material scientists and engineers interested in micro- and nano-fabrication. Among notable advances in this field are DNA origami and DNA-based hybrid materials. DNA is a long polymer made from repeating units called nucleotides.
The structure of DNA is dynamic along its length, being capable of coiling into tight loops and other shapes. In all species it is composed of two helical chains, bound to each other by hydrogen bonds. Both chains are coiled around the same axis and have the same pitch of 34 angstroms. The pair of chains has a radius of 10 angstroms. According to another study, when measured in a different solution, the DNA chain measured 22 to 26 angstroms wide, and one nucleotide unit measured 3.3 Å long. Although each individual nucleotide is small, a DNA polymer can be large and contain hundreds of millions of nucleotides, such as in chromosome 1. Chromosome 1 is the largest human chromosome, with 220 million base pairs, and would be 85 mm long if straightened. DNA does not usually exist as a single strand, but instead as a pair of strands that are held tightly together. These two long strands coil around each other in the shape of a double helix. The nucleotide contains both a segment of the backbone of the molecule and a nucleobase. A nucleobase linked to a sugar is called a nucleoside, and a base linked to a sugar and to one or more phosphate groups is called a nucleotide.
A biopolymer comprising multiple linked nucleotides is called a polynucleotide. The backbone of the DNA strand is made from alternating phosphate and sugar residues. The sugar in DNA is 2-deoxyribose, a pentose sugar. The sugars are joined together by phosphate groups that form phosphodiester bonds between the third and fifth carbon atoms of adjacent sugar rings. These are known as the 3′ and 5′ carbons, the prime symbol being used to distinguish these carbon atoms from those of the base to which the deoxyribose forms a glycosidic bond. When imagining DNA, each phosphoryl is considered to "belong" to the nucleotide whose 5′ carbon forms a bond therewith. Any DNA strand therefore has one end at which there is a phosphoryl attached to the 5′ carbon of a ribose and another end a
In biology, a gene is a sequence of nucleotides in DNA or RNA that codes for a molecule that has a function. During gene expression, the DNA is first copied into RNA. The RNA can be directly functional or be the intermediate template for a protein that performs a function. The transmission of genes to an organism's offspring is the basis of the inheritance of phenotypic traits. These genes make up different DNA sequences called genotypes. Genotypes along with environmental and developmental factors determine what the phenotypes will be. Most biological traits are under the influence of polygenes as well as gene–environment interactions. Some genetic traits are visible, such as eye color or number of limbs, and some are not, such as blood type, risk for specific diseases, or the thousands of basic biochemical processes that constitute life. Genes can acquire mutations in their sequence, leading to different variants, known as alleles, in the population. These alleles encode slightly different versions of a protein, which cause different phenotypical traits.
Usage of the term "having a gene" typically refers to containing a different allele of the same, shared gene. Genes evolve due to natural selection (survival of the fittest) and genetic drift of the alleles. The concept of a gene continues to be refined. For example, regulatory regions of a gene can be far removed from its coding regions, and coding regions can be split into several exons. Some viruses store their genome in RNA instead of DNA, and some gene products are functional non-coding RNAs. Therefore, a broad, modern working definition of a gene is any discrete locus of heritable, genomic sequence that affects an organism's traits by being expressed as a functional product or by regulation of gene expression. The term gene was introduced by Danish botanist, plant physiologist and geneticist Wilhelm Johannsen in 1909. It is inspired by the ancient Greek γόνος, which means offspring and procreation. The existence of discrete inheritable units was first suggested by Gregor Mendel. From 1857 to 1864, in Brno, he studied inheritance patterns in 8000 common edible pea plants, tracking distinct traits from parent to offspring.
He described these mathematically as 2^n combinations, where n is the number of differing characteristics in the original peas. Although he did not use the term gene, he explained his results in terms of discrete inherited units that give rise to observable physical characteristics. This description prefigured Wilhelm Johannsen's distinction between genotype and phenotype. Mendel was also the first to demonstrate independent assortment, the distinction between dominant and recessive traits, the distinction between a heterozygote and homozygote, and the phenomenon of discontinuous inheritance. Prior to Mendel's work, the dominant theory of heredity was one of blending inheritance, which suggested that each parent contributed fluids to the fertilisation process and that the traits of the parents blended and mixed to produce the offspring. Charles Darwin developed a theory of inheritance he termed pangenesis, from Greek pan and genesis / genos. Darwin used the term gemmule to describe hypothetical particles. Mendel's work went largely unnoticed after its first publication in 1866, but was rediscovered in the late 19th century by Hugo de Vries, Carl Correns, and Erich von Tschermak, who reached similar conclusions in their own research.
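The 2^n count mentioned above follows from each characteristic having two alternative forms. The sketch below enumerates the combinations for a small, illustrative set of pea traits; the function name and the choice of traits are assumptions made for the example, not a reconstruction of Mendel's own tables.

```python
from itertools import product

def trait_combinations(traits):
    """List every combination of forms, one form per characteristic.
    traits maps a characteristic name to its two alternative forms."""
    names = list(traits)
    return [dict(zip(names, combo))
            for combo in product(*(traits[name] for name in names))]

# Three illustrative pea characteristics, each with two forms.
peas = {
    "seed shape": ("round", "wrinkled"),
    "seed colour": ("yellow", "green"),
    "flower colour": ("purple", "white"),
}

combos = trait_combinations(peas)
print(len(combos))  # 2 ** 3 = 8
for combo in combos:
    print(combo)
```

Adding a fourth two-form characteristic doubles the list to 16 entries, which is the exponential growth the formula describes.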
In 1889, Hugo de Vries published his book Intracellular Pangenesis, in which he postulated that different characters have individual hereditary carriers and that inheritance of specific traits in organisms comes in particles. De Vries called these units "pangenes", after Darwin's 1868 pangenesis theory. Sixteen years later, in 1905, Wilhelm Johannsen introduced the term 'gene' and William Bateson that of 'genetics', while Eduard Strasburger, amongst others, still used the term 'pangene' for the fundamental physical and functional unit of heredity. Advances in understanding genes and inheritance continued throughout the 20th century. Deoxyribonucleic acid was shown to be the molecular repository of genetic information by experiments in the 1940s to 1950s. The structure of DNA was studied by Rosalind Franklin and Maurice Wilkins using X-ray crystallography, which led James D. Watson and Francis Crick to publish a model of the double-stranded DNA molecule whose paired nucleotide bases indicated a compelling hypothesis for the mechanism of genetic replication.
In the early 1950s the prevailing view was that the genes in a chromosome acted like discrete entities, indivisible by recombination and arranged like beads on a string. The experiments of Benzer using mutants defective in the rII region of bacteriophage T4 showed that individual genes have a simple linear structure and are likely to be equivalent to a linear section of DNA. Collectively, this body of research established the central dogma of molecular biology, which states that proteins are translated from RNA, which is transcribed from DNA. This dogma has since been shown to have exceptions, such as reverse transcription in retroviruses. The modern study of genetics at the level of DNA is known as molecular genetics. In 1972, Walter Fiers and his team were the first to determine the sequence of a gene: that of the bacteriophage MS2 coat protein. The subsequent development of chain-termination DNA sequencing in 1977 by Frederick Sanger improved the efficiency of sequencing and turned it into a routine laboratory tool.
An automated version of the Sanger method was used in early phases of the
In biology, an organism is any individual entity that exhibits the properties of life. It is a synonym for "life form". Organisms are classified by taxonomy into specified groups such as the multicellular animals, plants and fungi. All types of organisms are capable of reproduction and development, and of some degree of response to stimuli. Humans are multicellular animals composed of many trillions of cells which differentiate during development into specialized tissues and organs. An organism may be either a prokaryote or a eukaryote. Prokaryotes are represented by two separate domains: bacteria and archaea. Eukaryotic organisms are characterized by the presence of a membrane-bound cell nucleus and contain additional membrane-bound compartments called organelles. Fungi and plants are examples of kingdoms of organisms within the eukaryotes. Estimates of the number of Earth's current species range from 10 million to 14 million, of which only about 1.2 million have been documented. More than 99% of all species that ever lived, amounting to over five billion species, are estimated to be extinct.
In 2016, a set of 355 genes from the last universal common ancestor of all organisms was identified. The term "organism" first appeared in the English language in 1703 and took on its current definition by 1834, it is directly related to the term "organization". There is a long tradition of defining organisms as self-organizing beings, going back at least to Immanuel Kant's 1790 Critique of Judgment. An organism may be defined as an assembly of molecules functioning as a more or less stable whole that exhibits the properties of life. Dictionary definitions can be broad, using phrases such as "any living structure, such as a plant, fungus or bacterium, capable of growth and reproduction". Many definitions exclude viruses and possible man-made non-organic life forms, as viruses are dependent on the biochemical machinery of a host cell for reproduction. A superorganism is an organism consisting of many individuals working together as a single functional or social unit. There has been controversy about the best way to define the organism and indeed about whether or not such a definition is necessary.
Several contributions are responses to the suggestion that the category of "organism" may well not be adequate in biology. Viruses are not typically considered to be organisms because they are incapable of autonomous reproduction, growth or metabolism. This controversy is problematic because some cellular organisms are also incapable of independent survival and live as obligatory intracellular parasites. Although viruses have a few enzymes and molecules characteristic of living organisms, they have no metabolism of their own. This rules out autonomous reproduction: they can only be passively replicated by the machinery of the host cell. In this sense, they are similar to inanimate matter. While viruses sustain no independent metabolism and thus are usually not classified as organisms, they do have their own genes, and they do evolve by mechanisms similar to the evolutionary mechanisms of organisms. The most common argument in support of viruses as living organisms is their ability to undergo evolution and replicate through self-assembly.
Some scientists argue that viruses neither evolve nor self-reproduce. In fact, viruses are evolved by their host cells, meaning that there was co-evolution of viruses and host cells. If host cells did not exist, viral evolution would be impossible. This is not true for cells. If viruses did not exist, the direction of cellular evolution could be different, but cells would nevertheless be able to evolve. As for reproduction, viruses rely entirely on their hosts' machinery to replicate. The discovery of viral metagenomes with genes coding for energy metabolism and protein synthesis fueled the debate about whether viruses belong in the tree of life. The presence of these genes suggested that viruses were once able to metabolize. However, it was later found that the genes coding for energy and protein metabolism have a cellular origin. Most likely, these genes were acquired through horizontal gene transfer from viral hosts. Organisms are complex chemical systems, organized in ways that promote reproduction and some measure of sustainability or survival. The same laws that govern non-living chemistry govern the chemical processes of life.
It is generally the phenomena of entire organisms that determine their fitness to an environment and therefore the survivability of their DNA-based genes. Organisms owe their origin and many other internal functions to chemical phenomena, especially the chemistry of large organic molecules. Organisms are complex systems of chemical compounds that, through interaction and environment, play a wide variety of roles. Organisms are semi-closed chemical systems. Although they are individual units of life, they are not closed to the environment around them. To operate, they constantly take in and release energy. Autotrophs produce usable energy using light from the sun or inorganic compounds, while heterotrophs take in organic compounds from the environment. The primary chemical element in these compounds is carbon. The chemical properties of this element such as its grea
A base pair is a unit consisting of two nucleobases bound to each other by hydrogen bonds. Base pairs form the building blocks of the DNA double helix and contribute to the folded structure of both DNA and RNA. Dictated by specific hydrogen bonding patterns, Watson–Crick base pairs allow the DNA helix to maintain a regular helical structure that is subtly dependent on its nucleotide sequence. The complementary nature of this base-paired structure provides a redundant copy of the genetic information encoded within each strand of DNA. The regular structure and data redundancy provided by the DNA double helix make DNA well suited to the storage of genetic information, while base-pairing between DNA and incoming nucleotides provides the mechanism through which DNA polymerase replicates DNA and RNA polymerase transcribes DNA into RNA. Many DNA-binding proteins can recognize specific base-pairing patterns that identify particular regulatory regions of genes. Intramolecular base pairs can occur within single-stranded nucleic acids.
This is particularly important in RNA molecules, where Watson–Crick base pairs permit the formation of short double-stranded helices, and a wide variety of non-Watson–Crick interactions allow RNAs to fold into a vast range of specific three-dimensional structures. In addition, base-pairing between transfer RNA and messenger RNA forms the basis for the molecular recognition events that result in the nucleotide sequence of mRNA becoming translated into the amino acid sequence of proteins via the genetic code. The size of an individual gene or an organism's entire genome is often measured in base pairs because DNA is usually double-stranded. Hence, the number of total base pairs is equal to the number of nucleotides in one of the strands. The haploid human genome is estimated to be about 3.2 billion bases long and to contain 20,000–25,000 distinct protein-coding genes. A kilobase is a unit of measurement in molecular biology equal to 1000 base pairs of DNA or RNA. The total amount of related DNA base pairs on Earth is estimated at 5.0×10^37 and weighs 50 billion tonnes.
In comparison, the total mass of the biosphere has been estimated to be as much as 4 TtC. Hydrogen bonding is the chemical interaction that underlies the base-pairing rules described above. Appropriate geometrical correspondence of hydrogen bond donors and acceptors allows only the "right" pairs to form stably. DNA with high GC-content is more stable than DNA with low GC-content. But, contrary to popular belief, the hydrogen bonds do not stabilize the DNA significantly. The larger nucleobases, adenine and guanine, are members of a class of double-ringed chemical structures called purines. Purines are complementary only with pyrimidines: pyrimidine-pyrimidine pairings are energetically unfavorable because the molecules are too far apart for hydrogen bonding to be established. Purine-pyrimidine base-pairing of AT or GC or UA results in proper duplex structure. The only other purine-pyrimidine pairings would be AC, GT and UG. The GU pairing, with two hydrogen bonds, does occur fairly often in RNA. Paired DNA and RNA molecules are comparatively stable at room temperature, but the two nucleotide strands will separate above a melting point that is determined by the length of the molecules, the extent of mispairing, and the GC content.
Higher GC content results in higher melting temperatures. Conversely, regions of a genome that need to separate frequently (for example, the promoter regions for often-transcribed genes) are comparatively GC-poor. GC content and melting temperature must also be taken into account when designing primers for PCR reactions. The following DNA sequences illustrate paired double-stranded patterns. By convention, the top strand is written from the 5' end to the 3' end.
A base-paired DNA sequence:
ATCGATTGAGCTCTAGCG
TAGCTAACTCGAGATCGC
The corresponding RNA sequence, in which uracil is substituted for thymine in the RNA strand:
AUCGAUUGAGCUCUAGCG
UAGCUAACUCGAGAUCGC
Chemical analogs of nucleotides can take the place of proper nucleotides and establish non-canonical base-pairing, leading to errors in DNA replication and DNA transcription. This is due to their isosteric chemistry. One common mutagenic base analog is 5-bromouracil, which resembles thymine but can base-pair to guanine in its enol form. Other chemicals, known as DNA intercalators, fit into the gap between adjacent bases on a single strand and induce frameshift mutations by "masquerading" as a base, causing the DNA replication machinery to skip or insert additional nucleotides at the intercalated site.
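The base-paired example sequences in the passage above can be reproduced mechanically from the pairing rules (A with T, G with C, and U replacing T in RNA). The following is a minimal sketch with hypothetical helper names; it assumes upper-case input containing only A, C, G and T, and it writes the complementary strand in the same left-to-right order as the top strand, as the example does.

```python
DNA_PAIRS = str.maketrans("ACGT", "TGCA")

def complement(strand):
    """Base-by-base Watson-Crick complement, in the same written order."""
    return strand.translate(DNA_PAIRS)

def to_rna(strand):
    """Replace thymine with uracil to get the corresponding RNA sequence."""
    return strand.replace("T", "U")

top = "ATCGATTGAGCTCTAGCG"   # top strand, written 5' to 3'
bottom = complement(top)     # paired strand, written 3' to 5'

print(top)
print(bottom)                # TAGCTAACTCGAGATCGC
print(to_rna(top))           # AUCGAUUGAGCUCUAGCG
print(to_rna(bottom))        # UAGCUAACUCGAGAUCGC
```

To write the bottom strand in the conventional 5' to 3' direction instead, the complemented string would simply be reversed.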
Most intercalators are known or suspected carcinogens. Examples include ethidium bromide and acridine. An unnatural base pair is a designed subunit of DNA that is created in a laboratory and does not occur in nature. DNA sequences have been described which use newly created nucleobases to form a third base pair, in addition to the two ba