Nucleic acid double helix
In molecular biology, the term double helix refers to the structure formed by double-stranded molecules of nucleic acids such as DNA. The double helical structure of a nucleic acid complex arises as a consequence of its secondary structure and is a fundamental component in determining its tertiary structure. The term entered popular culture with the publication in 1968 of The Double Helix: A Personal Account of the Discovery of the Structure of DNA, by James Watson. The DNA double helix is a biopolymer of nucleic acid, held together by nucleotides which base pair together. In B-DNA, the most common double helical structure found in nature, the double helix is right-handed with about 10–10.5 base pairs per turn. The double helix structure of DNA contains a major groove and a minor groove. In B-DNA the major groove is wider than the minor groove. Given this difference in widths, many proteins which bind to B-DNA do so through the wider major groove. The double-helix model of DNA structure was first published in the journal Nature by James Watson and Francis Crick in 1953. It was based upon the crucial X-ray diffraction image of DNA labeled as "Photo 51", from Rosalind Franklin in 1952, followed by her more clarified DNA image with Raymond Gosling, Maurice Wilkins, Alexander Stokes, and Herbert Wilson, as well as base-pairing chemical and biochemical information by Erwin Chargaff.
The prior model was triple-stranded DNA. The realization that the structure of DNA is that of a double helix elucidated the mechanism of base pairing by which genetic information is stored and copied in living organisms, and it is considered one of the most important scientific discoveries of the 20th century. Crick and Watson each received one third of the 1962 Nobel Prize in Physiology or Medicine for their contributions to the discovery. Hybridization is the process of complementary base pairs binding to form a double helix. Melting is the process by which the interactions between the strands of the double helix are broken, separating the two nucleic acid strands. These bonds are weak and are easily separated by gentle heating, enzymes, or mechanical force. Melting occurs preferentially at certain points in the nucleic acid. T- and A-rich regions are more easily melted than C- and G-rich regions. Some base steps are also susceptible to DNA melting, such as T A and T G. These mechanical features are reflected by the use of sequences such as TATA at the start of many genes to assist RNA polymerase in melting the DNA for transcription.
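The observation that A/T-rich regions melt more easily than G/C-rich regions can be illustrated with a rough calculation. The sketch below uses the Wallace rule, a common rule of thumb for short oligonucleotides that is not taken from this article: it assigns about 2 °C per A or T and 4 °C per G or C. The function names and the two example sequences are hypothetical, and the estimate should not be read as a precise melting temperature.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def wallace_tm_celsius(seq: str) -> int:
    """Rough melting-temperature estimate (Wallace rule, short oligos only).

    Assumption: 2 degrees C per A/T and 4 degrees C per G/C; real melting
    behaviour also depends on salt, concentration, and sequence context.
    """
    seq = seq.upper()
    at = sum(base in "AT" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2 * at + 4 * gc


# Hypothetical 12-base examples: one A/T-only, one G/C-only.
at_rich = "TATAAATATTAT"
gc_rich = "GCGGCCGCGGCC"

if __name__ == "__main__":
    for name, seq in (("AT-rich", at_rich), ("GC-rich", gc_rich)):
        print(name, gc_fraction(seq), wallace_tm_celsius(seq))
```

Under this rule of thumb the A/T-only sequence gets half the estimated melting temperature of the G/C-only one of the same length, which matches the qualitative statement above.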
Strand separation by gentle heating, as used in polymerase chain reaction, is simple, provided the molecules have fewer than about 10,000 base pairs. The intertwining of the DNA strands makes long segments difficult to separate. The cell avoids this problem by allowing its DNA-melting enzymes to work concurrently with topoisomerases, which can chemically cleave the phosphate backbone of one of the strands so that it can swivel around the other. Helicases unwind the strands to facilitate the advance of sequence-reading enzymes such as DNA polymerase. The geometry of a base, or base pair step, can be characterized by 6 coordinates: shift, slide, rise, tilt, roll, and twist. These values define the location and orientation in space of every base or base pair in a nucleic acid molecule relative to its predecessor along the axis of the helix. Together, they characterize the helical structure of the molecule. In regions of DNA or RNA where the normal structure is disrupted, the change in these values can be used to describe such disruption.
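As a minimal sketch of how such step coordinates relate to the overall helix, the code below stores the six base-pair step parameters (shift, slide, rise, tilt, roll, twist) and derives the number of base pairs per turn and the helical pitch from rise and twist. The class and function names are hypothetical. The default values are idealized assumptions for a uniform B-DNA-like helix (3.4 Å rise, 36° twist, that is, 10 base pairs per turn), with the rise taken to lie along the helix axis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BasePairStep:
    """Six step parameters for one base pair relative to its predecessor."""
    shift: float = 0.0   # translation, in angstroms
    slide: float = 0.0   # translation, in angstroms
    rise: float = 3.4    # translation along the helix axis, in angstroms (assumed)
    tilt: float = 0.0    # rotation, in degrees
    roll: float = 0.0    # rotation, in degrees
    twist: float = 36.0  # rotation about the helix axis, in degrees (assumed)


def base_pairs_per_turn(step: BasePairStep) -> float:
    """Number of identical steps needed for one full 360-degree turn."""
    if step.twist == 0:
        raise ValueError("twist must be non-zero for a helix")
    return 360.0 / abs(step.twist)


def helix_pitch_angstrom(step: BasePairStep) -> float:
    """Height of one full turn, assuming every step is identical."""
    return step.rise * base_pairs_per_turn(step)


ideal_b_dna = BasePairStep()

if __name__ == "__main__":
    print(base_pairs_per_turn(ideal_b_dna), helix_pitch_angstrom(ideal_b_dna))
```

Only rise and twist enter the calculation; the other four coordinates can all be zero in an idealized helix without changing the pitch.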
For each base pair, considered relative to its predecessor, there are the following base pair geometries to consider: shear; stretch; stagger; buckle; propeller (rotation of one base with respect to the other in the same base pair); opening; shift (displacement along an axis in the base-pair plane perpendicular to the first, directed from the minor to the major groove); slide (displacement along an axis in the plane of the base pair directed from one strand to the other); rise (displacement along the helix axis); tilt (rotation around the shift axis); roll (rotation around the slide axis); twist (rotation around the rise axis); x-displacement; y-displacement; inclination; tip; and pitch (the number of base pairs per complete turn of the helix). Rise and twist determine the pitch of the helix; the other coordinates, by contrast, can be zero. Slide and shift are small in B-DNA, but are substantial in A- and Z-DNA. Roll and tilt make successive base pairs less parallel, and are typically small. Note that "tilt" has also been used differently in the scientific literature, referring to the deviation of the first, inter-strand base-pair axis from perpendicularity to the helix axis.
This corresponds to slide between a succession of base pairs, and in helix-based coordinates is properly termed "inclination". At least three DNA conformations are believed to be found in nature: A-DNA, B-DNA, and Z-DNA. The B form described by James Watson and Francis Crick is believed to predominate in cells. It extends about 34 Å per 10 bp of sequence. The double helix makes one complete turn about its axis every 10.4–10.5 base pairs in solution. This frequency of twist depends on the stacking forces that each base exerts on its neighbours in the chain. The absolute configuration of the bases determines the direction of the helical curve for a given conformation. A-DNA and Z-DNA differ in their geometry and dimensions from B-DNA, although they still form helical structures. It was long thought that the A form only occurs in dehydrated samples of
A DNA-binding domain (DBD) is an independently folded protein domain that contains at least one structural motif that recognizes double- or single-stranded DNA. A DBD can recognize a specific DNA sequence or have a general affinity to DNA. Some DNA-binding domains may also include nucleic acids in their folded structure. One or more DNA-binding domains are often part of a larger protein consisting of further protein domains with differing function. The extra domains often regulate the activity of the DNA-binding domain. The function of DNA binding is either structural or involves transcription regulation, with the two roles sometimes overlapping. DNA-binding domains with functions involving DNA structure have biological roles in DNA replication, repair, and modification, such as methylation. Many proteins involved in the regulation of gene expression contain DNA-binding domains. For example, proteins that regulate transcription by binding DNA are called transcription factors. The final output of most cellular signaling cascades is gene regulation.
The DBD interacts with the nucleotides of DNA in a DNA sequence-specific or non-sequence-specific manner, but even non-sequence-specific recognition involves some sort of molecular complementarity between protein and DNA. DNA recognition by the DBD can occur at the major or minor groove of DNA, or at the sugar-phosphate DNA backbone. Each specific type of DNA recognition is tailored to the protein's function. For example, the DNA-cutting enzyme DNAse I cuts DNA almost randomly and so must bind to DNA in a non-sequence-specific manner. Even so, DNAse I recognizes a certain 3-D DNA structure, yielding a somewhat specific DNA cleavage pattern that can be useful for studying DNA recognition by a technique called DNA footprinting. Many DNA-binding domains must recognize specific DNA sequences, such as DBDs of transcription factors that activate specific genes, or those of enzymes that modify DNA at specific sites, like restriction enzymes and telomerase. The hydrogen bonding pattern in the DNA major groove is less degenerate than that of the DNA minor groove, providing a more attractive site for sequence-specific DNA recognition.
The specificity of DNA-binding proteins can be studied using many biochemical and biophysical techniques, such as gel electrophoresis, analytical ultracentrifugation, calorimetry, DNA mutation, protein structure mutation or modification, nuclear magnetic resonance, x-ray crystallography, surface plasmon resonance, electron paramagnetic resonance, cross-linking, and microscale thermophoresis. A large fraction of genes in each genome encodes DNA-binding proteins. However, only a rather small number of protein families are DNA-binding. For instance, more than 2000 of the ~20,000 human proteins are "DNA-binding", including about 750 zinc-finger proteins. Originally discovered in bacteria, the helix-turn-helix motif is commonly found in repressor proteins and is about 20 amino acids long. In eukaryotes, the homeodomain comprises 2 helices, one of which recognizes the DNA. Homeodomains are common in proteins that regulate developmental processes. The zinc finger domain is mostly found in eukaryotes, but some examples have been found in bacteria. The zinc finger domain is generally between 23 and 28 amino acids long and is stabilized by coordinating zinc ions with regularly spaced zinc-coordinating residues.
The most common class of zinc finger coordinates a single zinc ion and consists of a recognition helix and a 2-strand beta-sheet. In transcription factors these domains are often found in arrays, and adjacent fingers are spaced at 3 base pair intervals when bound to DNA. The basic leucine zipper (bZIP) domain is found mainly in eukaryotes and to a limited extent in bacteria. The bZIP domain contains an alpha helix with a leucine at every 7th amino acid. If two such helices find one another, the leucines can interact as the teeth in a zipper, allowing dimerization of two proteins. When binding to the DNA, basic amino acid residues bind to the sugar-phosphate backbone while the helices sit in the major grooves. It regulates gene expression. Consisting of about 110 amino acids, the winged helix domain has four helices and a two-strand beta-sheet. The winged helix-turn-helix domain (SCOP 46785) is typically 85–90 amino acids long. It is formed by a 4-strand beta-sheet. The basic helix-loop-helix domain is found in some transcription factors and is characterized by two alpha helices connected by a loop.
One helix is typically smaller and, due to the flexibility of the loop, allows dimerization by folding and packing against another helix. The larger helix typically contains the DNA-binding regions. HMG-box domains are found in high mobility group proteins, which are involved in a variety of DNA-dependent processes like replication and transcription. They also alter the flexibility of the DNA by inducing bends. The domain consists of three alpha helices separated by loops. Wor3 domains, named after the White–Opaque Regulator 3 in Candida albicans, arose more recently in evolutionary time than most previously described DNA-binding domains and are restricted to a small number of fungi. The OB-fold is a small structural motif originally named for its oligonucleotide/oligosaccharide binding properties. OB-fold domains range up to about 150 amino acids in length. OB-folds bind single-stranded DNA, and hence are single-stranded binding proteins. OB-fold proteins have been identified as critical for DNA replication, DNA recombination, DNA repair, transcription, translation, cold shock response, and telomere maintenance.
The immunoglobulin domain consis
A leucine zipper is a common three-dimensional structural motif in proteins. Leucine zippers were first described by Landschulz and collaborators in 1988, when they found that an enhancer binding protein had a characteristic 30-amino acid segment. The display of these amino acid sequences on an idealized alpha helix revealed a periodic repetition of leucine residues at every seventh position over a distance covering eight helical turns. The polypeptide segments containing these periodic arrays of leucine residues were proposed to exist in an alpha-helical conformation, with the leucine side chains from one alpha helix interdigitating with those from the alpha helix of a second polypeptide, facilitating dimerization. Leucine zippers are a dimerization domain of the bZIP class of eukaryotic transcription factors. The bZIP domain is 60 to 80 amino acids in length, with a highly conserved DNA-binding basic region and a more diversified leucine zipper dimerization region. The motif takes its name from the fact that leucines occur every seven amino acids in the dimerization domain.
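The "leucine at every seventh position" pattern lends itself to a simple scan. The following is a minimal sketch, not an established prediction method: it looks for maximal runs of leucines spaced exactly seven residues apart in a one-letter amino acid sequence. The function name, the `min_repeats` threshold of 4, and the toy sequence are all assumptions made for illustration; real zipper prediction also considers helix propensity and the residues at the other heptad positions.

```python
from typing import List, Tuple


def leucine_heptad_runs(seq: str, min_repeats: int = 4) -> List[Tuple[int, int]]:
    """Return (start_index, repeat_count) for runs of L spaced 7 residues apart.

    Indices are 0-based. Only maximal runs with at least min_repeats
    leucines are reported.
    """
    seq = seq.upper()
    runs = []
    for start, residue in enumerate(seq):
        if residue != "L":
            continue
        # Skip positions that merely continue a run begun 7 residues earlier.
        if start >= 7 and seq[start - 7] == "L":
            continue
        count = 1
        pos = start + 7
        while pos < len(seq) and seq[pos] == "L":
            count += 1
            pos += 7
        if count >= min_repeats:
            runs.append((start, count))
    return runs


# Toy sequence (hypothetical): a leucine followed by six alanines, five times.
toy_zipper = "LAAAAAA" * 5

if __name__ == "__main__":
    print(leucine_heptad_runs(toy_zipper))
```

The toy input contains leucines at indices 0, 7, 14, 21, and 28, so the scan reports a single run starting at index 0 with five repeats.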
The localization of the leucines is critical for the DNA binding of the proteins. Leucine zippers are present in both eukaryotic and prokaryotic regulatory proteins, but are mainly a feature of eukaryotes. They can also be annotated simply as ZIPs, and ZIP-like motifs have been found in proteins other than transcription factors and are thought to be one of the general protein modules for protein–protein interactions. The mechanism of transcriptional regulation by bZIP proteins has been studied in detail. Most bZIP proteins show high binding affinity for the ACGT motifs, which include CACGTG, GACGTC, TACGTA, AACGTT, and a GCN4 motif, namely TGATCA. A small number of bZIP factors such as OsOBF1 can also recognize palindromic sequences. However, the others, including LIP19, OsZIP-2a, and OsZIP-2b, do not bind to DNA sequences. Instead, these bZIP proteins form heterodimers with other bZIPs to regulate transcriptional activities. A leucine zipper is created by the dimerization of two specific alpha helix monomers bound to DNA. The bZIP interacts with the DNA via the basic amine residues of certain amino acids in the "basic" domain, such as lysines and arginines.
These basic residues interact in the major groove of the DNA, forming sequence-specific interactions. The leucine zipper is formed by amphipathic interaction between two ZIP domains. The ZIP domain is found in the alpha-helix of each monomer and contains leucines, or leucine-like amino acids. These amino acids are spaced out in each region's polypeptide sequence in such a way that when the sequence is coiled in a 3D alpha-helix, the leucine residues line up on the same side of the helix. This region of the alpha-helix, containing the leucines which line up, is called a ZIP domain, and leucines from each ZIP domain can weakly interact with leucines from other ZIP domains, reversibly holding their alpha-helices together. When these alpha helices dimerize, the zipper is formed. The hydrophobic side of the helix forms a dimer with itself or another similar helix, burying the non-polar amino acids away from the solvent. The hydrophilic side of the helix interacts with the water in the solvent. Leucine zipper domains are considered a subtype of coiled coils, which are built by two or more alpha helices that are wound around each other to form a supercoil.
Coiled coils contain 3- and 4-residue repeats whose hydrophobicity pattern and residue composition are compatible with the structure of amphipathic alpha-helices. The alternating three- and four-residue sequence elements constitute heptad repeats in which the amino acids are designated from a to g. Residues in positions a and d are generally hydrophobic and form a zigzag pattern of knobs and holes that interlock with a similar pattern on another strand to form a tight-fitting hydrophobic core. In the case of leucine zippers, leucines are predominant at the d position of the heptad repeat. These residues pack against each other every second turn of the alpha-helices, and the hydrophobic region between two helices is completed by residues at the a positions, which are also frequently hydrophobic. They are referred to as coiled coils. In some cases they are instead annotated in the “domain” subsection, which would be the bZIP domain. Leucine zipper regulatory proteins include c-fos and c-jun, important regulators of normal development, as well as myc family members including myc and mxd1.
If they are overproduced or mutated in a vital area, they may generate cancer.
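The four ACGT-core bZIP target motifs named earlier (CACGTG, GACGTC, TACGTA, AACGTT) are palindromic in the DNA sense: each reads the same 5′ to 3′ on both strands, which fits a dimer whose two basic regions contact the two half-sites. The check below is a small sketch of that property; the helper names are hypothetical, and it handles only the unambiguous bases A, C, G, and T.

```python
# Watson-Crick complements for unambiguous DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_palindromic(seq: str) -> bool:
    """True if seq equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


ACGT_MOTIFS = ["CACGTG", "GACGTC", "TACGTA", "AACGTT"]

if __name__ == "__main__":
    for motif in ACGT_MOTIFS:
        print(motif, reverse_complement(motif), is_palindromic(motif))
```

A sequence such as TGACTC, by contrast, has the reverse complement GAGTCA and is not palindromic under this definition.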
Transcription is the first step of gene expression, in which a particular segment of DNA is copied into RNA by the enzyme RNA polymerase. Both DNA and RNA are nucleic acids. During transcription, a DNA sequence is read by an RNA polymerase, which produces a complementary, antiparallel RNA strand called a primary transcript. Transcription proceeds in the following general steps. RNA polymerase, together with one or more general transcription factors, binds to promoter DNA. RNA polymerase creates a transcription bubble by breaking the hydrogen bonds between complementary DNA nucleotides. RNA polymerase adds RNA nucleotides. The RNA sugar-phosphate backbone forms with assistance from RNA polymerase to form an RNA strand. Hydrogen bonds of the RNA–DNA helix break, freeing the newly synthesized RNA strand. If the cell has a nucleus, the RNA may be further processed; this may include polyadenylation and splicing. The RNA may then exit to the cytoplasm through the nuclear pore complex. The stretch of DNA transcribed into an RNA molecule is called a transcription unit and encodes at least one gene.
If the gene encodes a protein, the transcription produces messenger RNA. Alternatively, the transcribed gene may encode non-coding RNA such as microRNA, ribosomal RNA, transfer RNA, or enzymatic RNA molecules called ribozymes. Overall, RNA helps synthesize and process proteins. In virology, the term may also be used when referring to mRNA synthesis from an RNA molecule. For instance, the genome of a negative-sense single-stranded RNA virus may be the template for a positive-sense single-stranded RNA. This is because the positive-sense strand contains the information needed to translate the viral proteins for viral replication afterwards. This process is catalyzed by a viral RNA replicase. A DNA transcription unit encoding a protein may contain both a coding sequence, which will be translated into the protein, and regulatory sequences, which direct and regulate the synthesis of that protein. The regulatory sequence before the coding sequence is called the five prime untranslated region. As opposed to DNA replication, transcription results in an RNA complement that includes the nucleotide uracil in all instances where thymine would have occurred in a DNA complement.
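The relationship between a DNA template and its complementary, antiparallel RNA transcript can be sketched in a few lines. The function below is an illustration only, with hypothetical names: it takes the template strand written 5′ to 3′, pairs each base with its RNA complement (using uracil where a DNA complement would have thymine), and returns the transcript written 5′ to 3′. The six-base example sequence is made up.

```python
# RNA base that pairs with each DNA template base (A pairs with U, not T).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5to3: str) -> str:
    """Return the RNA transcript (5' to 3') for a DNA template given 5' to 3'.

    The template is traversed from its 3' end, so the string is reversed
    before pairing, which makes the result antiparallel to the template.
    """
    return "".join(DNA_TO_RNA[base] for base in reversed(template_5to3.upper()))


# Hypothetical example: template 5'-GGCCAT-3', opposite strand 5'-ATGGCC-3'.
template = "GGCCAT"
opposite_strand = "ATGGCC"

if __name__ == "__main__":
    print(transcribe(template))
```

For this input the transcript has the same sequence as the DNA strand opposite the template, with every thymine replaced by uracil.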
Only one of the two DNA strands serves as a template for transcription. The antisense strand of DNA is read by RNA polymerase from the 3' end to the 5' end during transcription. The complementary RNA is created in the opposite direction, in the 5' → 3' direction, matching the sequence of the sense strand with the exception of switching uracil for thymine. This directionality is because RNA polymerase can only add nucleotides to the 3' end of the growing mRNA chain. This use of only the 3' → 5' DNA strand eliminates the need for the Okazaki fragments that are seen in DNA replication. It also removes the need for an RNA primer to initiate RNA synthesis, as is the case in DNA replication. The non-template strand of DNA is called the coding strand, because its sequence is the same as the newly created RNA transcript. This is the strand that is used by convention when presenting a DNA sequence. Transcription has some proofreading mechanisms, but they are fewer and less effective than the controls for copying DNA.
As a result, transcription has a lower copying fidelity than DNA replication. Transcription is divided into initiation, promoter escape, elongation, and termination. Transcription begins with the binding of RNA polymerase, together with one or more general transcription factors, to a specific DNA sequence referred to as a "promoter" to form an RNA polymerase-promoter "closed complex". In the "closed complex" the promoter DNA is still fully double-stranded. RNA polymerase, assisted by one or more general transcription factors, then unwinds approximately 14 base pairs of DNA to form an RNA polymerase-promoter "open complex". In the "open complex" the promoter DNA is partly unwound and single-stranded. The exposed, single-stranded DNA is referred to as the "transcription bubble." RNA polymerase, assisted by one or more general transcription factors, then selects a transcription start site in the transcription bubble, binds to an initiating NTP and an extending NTP complementary to the transcription start site sequence, and catalyzes bond formation to yield an initial RNA product.
In bacteria, RNA polymerase core enzyme consists of five subunits: 2 α subunits, 1 β subunit, 1 β' subunit, and 1 ω subunit. In bacteria, there is one general RNA transcription factor: sigma. RNA polymerase core enzyme binds to the bacterial general transcription factor sigma to form RNA polymerase holoenzyme and then binds to a promoter. In archaea and eukaryotes, RNA polymerase contains subunits homologous to each of the five RNA polymerase subunits in bacteria and also contains additional subunits. In archaea and eukaryotes, the functions of the bacterial general transcription factor sigma are performed by multiple general transcription factors that work together. In archaea, there ar
Deoxyribonucleic acid (DNA) is a molecule composed of two chains that coil around each other to form a double helix carrying the genetic instructions used in the growth, development, functioning, and reproduction of all known organisms and many viruses. DNA and ribonucleic acid (RNA) are nucleic acids. The two DNA strands are known as polynucleotides, as they are composed of simpler monomeric units called nucleotides. Each nucleotide is composed of one of four nitrogen-containing nucleobases, a sugar called deoxyribose, and a phosphate group. The nucleotides are joined to one another in a chain by covalent bonds between the sugar of one nucleotide and the phosphate of the next, resulting in an alternating sugar-phosphate backbone. The nitrogenous bases of the two separate polynucleotide strands are bound together, according to base pairing rules, with hydrogen bonds to make double-stranded DNA. The complementary nitrogenous bases are divided into two groups, pyrimidines and purines. In DNA, the pyrimidines are thymine and cytosine; the purines are adenine and guanine. Both strands of double-stranded DNA store the same biological information.
This information is replicated as and when the two strands separate. A large part of DNA is non-coding, meaning that these sections do not serve as patterns for protein sequences. The two strands of DNA run in opposite directions to each other and are thus antiparallel. Attached to each sugar is one of four types of nucleobases. It is the sequence of these four nucleobases along the backbone that encodes genetic information. RNA strands are created using DNA strands as a template in a process called transcription. Under the genetic code, these RNA strands specify the sequence of amino acids within proteins in a process called translation. Within eukaryotic cells, DNA is organized into long structures called chromosomes. Before typical cell division, these chromosomes are duplicated in the process of DNA replication, providing a complete set of chromosomes for each daughter cell. Eukaryotic organisms store most of their DNA inside the cell nucleus as nuclear DNA, and some in the mitochondria as mitochondrial DNA, or in chloroplasts as chloroplast DNA. In contrast, prokaryotes store their DNA only in the cytoplasm, in circular chromosomes.
Within eukaryotic chromosomes, chromatin proteins, such as histones, compact and organize DNA. These compacting structures guide the interactions between DNA and other proteins, helping control which parts of the DNA are transcribed. DNA was first isolated by Friedrich Miescher in 1869. Its molecular structure was first identified by Francis Crick and James Watson at the Cavendish Laboratory within the University of Cambridge in 1953, whose model-building efforts were guided by X-ray diffraction data acquired by Raymond Gosling, a post-graduate student of Rosalind Franklin. DNA is used by researchers as a molecular tool to explore physical laws and theories, such as the ergodic theorem and the theory of elasticity. The unique material properties of DNA have made it an attractive molecule for material scientists and engineers interested in micro- and nano-fabrication. Among notable advances in this field are DNA origami and DNA-based hybrid materials. DNA is a long polymer made from repeating units called nucleotides.
The structure of DNA is dynamic along its length, being capable of coiling into tight loops and other shapes. In all species it is composed of two helical chains, bound to each other by hydrogen bonds. Both chains are coiled around the same axis and have the same pitch of 34 angstroms. The pair of chains has a radius of 10 angstroms. According to another study, when measured in a different solution, the DNA chain measured 22 to 26 angstroms wide, and one nucleotide unit measured 3.3 Å long. Although each individual nucleotide is small, a DNA polymer can be very large and contain hundreds of millions of nucleotides, such as in chromosome 1. Chromosome 1 is the largest human chromosome, with approximately 220 million base pairs, and would be 85 mm long if straightened. DNA does not usually exist as a single strand, but instead as a pair of strands that are held tightly together. These two long strands coil around each other in the shape of a double helix. The nucleotide contains both a segment of the backbone of the molecule and a nucleobase. A nucleobase linked to a sugar is called a nucleoside, and a base linked to a sugar and to one or more phosphate groups is called a nucleotide.
A biopolymer comprising multiple linked nucleotides is called a polynucleotide. The backbone of the DNA strand is made from alternating phosphate and sugar residues. The sugar in DNA is 2-deoxyribose, a pentose (five-carbon) sugar. The sugars are joined together by phosphate groups that form phosphodiester bonds between the third and fifth carbon atoms of adjacent sugar rings. These are known as the 3′-end and 5′-end carbons, the prime symbol being used to distinguish these carbon atoms from those of the base to which the deoxyribose forms a glycosidic bond. When imagining DNA, each phosphoryl is normally considered to "belong" to the nucleotide whose 5′ carbon forms a bond therewith. Any DNA strand therefore normally has one end at which there is a phosphoryl attached to the 5′ carbon of a ribose and another end a
The N-terminus is the start of a protein or polypeptide, referring to the free amine group located at the end of a polypeptide. Within a protein, each amine group is bonded to the carboxylic group of another residue to form a chain, but since the residue at the end of a protein has only 1 of its 2 groups bonded, its free amine group is referred to as the N-terminus. By convention, peptide sequences are written N-terminus to C-terminus, left to right in LTR languages. This correlates the translation direction to the text direction. Each amino acid has an amine group and a carboxylic group. Amino acids link to one another by peptide bonds, which form through a dehydration reaction that joins the carboxyl group of one amino acid to the amine group of the next in a head-to-tail manner to form a polypeptide chain. The chain has two ends: an amine group, the N-terminus, and an unbound carboxyl group, the C-terminus. When a protein is translated from messenger RNA, it is created from N-terminus to C-terminus. During the elongation stage of translation, the amino end of an incoming amino acid attaches to the carboxyl end of the growing chain.
Since the start codon of the genetic code codes for the amino acid methionine, most protein sequences start with a methionine. However, some proteins are modified posttranslationally, for example by cleavage from a protein precursor, and therefore may have different amino acids at their N-terminus. The N-terminus is the first part of the protein to be synthesized. It often contains signal peptide sequences, "intracellular postal codes" that direct delivery of the protein to the proper organelle. The signal peptide is typically removed at the destination by a signal peptidase. The N-terminal amino acid of a protein is an important determinant of its half-life; this is called the N-end rule. The N-terminal signal peptide is recognized by the signal recognition particle and results in the targeting of the protein to the secretory pathway. In eukaryotic cells, these proteins are synthesized at the rough endoplasmic reticulum. In prokaryotic cells, the proteins are exported across the cell membrane. In chloroplasts, signal peptides target proteins to the thylakoids.
The N-terminal mitochondrial targeting peptide allows the protein to be imported into the mitochondrion. The N-terminal chloroplast targeting peptide allows the protein to be imported into the chloroplast. Protein N-termini can be modified co- or posttranslationally. Modifications include the removal of initiator methionine by aminopeptidases, attachment of small chemical groups such as acetyl and methyl, and the addition of membrane anchors, such as palmitoyl and myristoyl groups. N-terminal acetylation is a form of protein modification that can occur in both prokaryotes and eukaryotes. It has been suggested that N-terminal acetylation can prevent a protein from following a secretory pathway. The N-terminus can be modified by the addition of a myristoyl anchor. Proteins that are modified this way contain a consensus motif at their N-terminus as a modification signal. The N-terminus can also be modified by the addition of a fatty acid anchor to form N-acylated proteins. The most common form of such modification is the addition of a palmitoyl group.