1.
Chromosome
–
A chromosome is a DNA molecule with part or all of the genetic material of an organism. Prokaryotes usually have a single circular chromosome, whereas most eukaryotes are diploid. Chromosomes in eukaryotes are composed of chromatin fiber. Chromatin fiber is made of nucleosomes; a nucleosome is a histone octamer with part of a longer DNA strand attached to and wrapped around it. Chromatin fiber, together with associated proteins, is known as chromatin. Chromatin is present in most cells, with a few exceptions, for example, red blood cells. Occurring only in the nucleus of cells, chromatin contains the vast majority of DNA, except for a small amount inherited maternally. Chromosomes are normally visible under a microscope only when the cell is undergoing the metaphase of cell division. Before this happens, every chromosome is copied once, and the copy is joined to the original by a centromere, resulting in an X-shaped structure. The original chromosome and the copy are now called sister chromatids. During metaphase a chromosome is in its most condensed state, and in this highly condensed form chromosomes are easiest to distinguish and study. In prokaryotic cells, chromatin occurs free-floating in the cytoplasm, as these cells lack membrane-bound organelles. The main information-carrying macromolecule is a single piece of coiled double-helix DNA, containing many genes, regulatory elements and other noncoding DNA. The DNA-bound macromolecules are proteins that serve to package the DNA. Chromosomes vary widely between different organisms. Some species, such as certain bacteria, also contain plasmids or other extrachromosomal DNA. These are circular structures in the cytoplasm that contain cellular DNA and play a role in horizontal gene transfer. Chromosomal recombination during meiosis and subsequent sexual reproduction plays a significant role in genetic diversity.
In prokaryotes and viruses, the DNA is often densely packed and organized; in the case of archaea, by homologs to eukaryotic histones. Small circular genomes called plasmids are often found in bacteria and also in mitochondria and chloroplasts, reflecting their bacterial origins. Some use the term chromosome in a wider sense, to refer to the individualized portions of chromatin in cells. However, others use the concept in a narrower sense, to refer to the individualized portions of chromatin during cell division. The word chromosome comes from the Greek χρῶμα (chroma, "colour") and σῶμα (soma, "body"), describing their strong staining by particular dyes. Schleiden, Virchow and Bütschli were among the first scientists who recognized the structures now familiar as chromosomes. The term was coined by von Waldeyer-Hartz, referring to the term chromatin. In a series of experiments beginning in the mid-1880s, Theodor Boveri gave the definitive demonstration that chromosomes are the vectors of heredity. His two principles were the continuity of chromosomes and the individuality of chromosomes; it is the second of these principles that was so original.
2.
Homology (biology)
–
In biology, homology is the existence of shared ancestry between a pair of structures, or genes, in different taxa. Evolutionary biology explains homologous structures adapted to different purposes as the result of descent with modification from a common ancestor. Examples of serial homology, in which repeated structures within one animal share a developmental origin, include the legs of a centipede, the maxillary palp and labial palp of an insect, and the spinous processes of successive vertebrae in a vertebral column. Sequence homology between protein or DNA sequences is defined in terms of shared ancestry. Two segments of DNA can have shared ancestry because of either a speciation event (orthologs) or a duplication event (paralogs). Homology among proteins or DNA is inferred from their sequence similarity. Significant similarity is strong evidence that two sequences are related by divergent evolution from a common ancestor. Alignments of multiple sequences are used to discover the homologous regions. The word homology, coined in about 1656, derives from the Greek ὁμόλογος (homologos), from ὁμός (homos, "same") and λόγος (logos, "relation"). Homology is the relationship between biological structures or sequences that are derived from a common ancestor. For example, many insects possess two pairs of flying wings. In beetles, the first pair of wings has evolved into a pair of hard wing covers. The same major forearm bones found in tetrapod limbs also occur in fossils of lobe-finned fish such as Eusthenopteron. The opposite of homologous organs are analogous organs, which do similar jobs in two taxa but were not present in their last common ancestor and instead evolved separately. For example, the wings of insects and birds evolved independently in widely separated groups. Similarly, the wings of a sycamore maple seed and the wings of a bird are analogous but not homologous, as they develop from quite different structures. A structure can be homologous at one level but only analogous at another. For example, in the pterosaurs, the wing involves both the forelimb and the hindlimb.
Analogy is called homoplasy in cladistics, and convergent or parallel evolution in evolutionary biology. Specialised terms are used in taxonomic research. Primary homology is that initially conjectured by a researcher based on similar structure or anatomical connections. Secondary homology is implied by parsimony analysis, where a character that only occurs once on a tree is taken to be homologous. As implied in this definition, many cladists consider homology to be synonymous with synapomorphy. Homologies provide the fundamental basis for all biological classification, although some may be highly counter-intuitive. Some such homologies have been discovered by comparing genes in evolutionary developmental biology. Among insects, the stinger of the female honey bee is a modified ovipositor, homologous with ovipositors in other insects such as the Orthoptera, Hemiptera, and those Hymenoptera without stingers. The three small bones in the middle ear of mammals, including humans, are the malleus, incus, and stapes. The malleus and incus develop in the embryo from structures that form jaw bones in lizards. This developmental evidence indicates that these bones are homologous, sharing a common ancestor. Among the many homologies in mammal reproductive systems, ovaries and testicles are homologous. In many plants, defensive or storage structures are made by modifications of the development of primary leaves, stems, and roots.
3.
Gene expression
–
Gene expression is the process by which information from a gene is used in the synthesis of a functional gene product. These products are often proteins, but in non-protein coding genes such as transfer RNA or small nuclear RNA genes, the product is a functional RNA. The process of gene expression is used by all known life, eukaryotes and prokaryotes alike. Several steps in the gene expression process may be modulated, including transcription, RNA splicing, translation, and post-translational modification of a protein. Gene regulation gives the cell control over structure and function, and is the basis for differentiation and morphogenesis. In genetics, gene expression is the most fundamental level at which the genotype gives rise to the phenotype. The genetic code stored in DNA is interpreted by gene expression, and the properties of the expression give rise to the organism's phenotype. Such phenotypes are expressed by the synthesis of proteins that control the organism's shape. Regulation of gene expression is critical to an organism's development. A gene is a stretch of DNA that encodes information. Genomic DNA consists of two antiparallel and reverse complementary strands, each having 5′ and 3′ ends. During transcription, an RNA copy is produced from one DNA strand. This RNA is complementary to the template 3′→5′ DNA strand; therefore, the resulting 5′→3′ RNA strand is identical to the coding DNA strand, with the exception that thymines are replaced with uracils in the RNA. A coding DNA strand reading ATG is indirectly transcribed through the non-coding strand as AUG in RNA. In prokaryotes, transcription is carried out by a single type of RNA polymerase, which needs a DNA sequence called a Pribnow box as well as a sigma factor to start transcription. In eukaryotes, RNA polymerase I is responsible for transcription of ribosomal RNA genes. RNA polymerase II transcribes all protein-coding genes but also some non-coding RNAs.
Pol II includes a C-terminal domain (CTD) that is rich in serine residues. When these residues are phosphorylated, the CTD binds to various protein factors that promote transcript maturation and modification. RNA polymerase III transcribes 5S rRNA and transfer RNA genes. Transcription ends when the polymerase encounters a sequence called the terminator. Modifications of eukaryotic pre-mRNA include 5′ capping, which is a set of reactions that add 7-methylguanosine (m7G) to the 5′ end of pre-mRNA. The m7G cap is then bound by the cap binding complex heterodimer. Another modification is 3′ cleavage and polyadenylation. They occur if a polyadenylation signal sequence is present in the pre-mRNA, usually between the protein-coding sequence and the terminator. The pre-mRNA is first cleaved, and then a series of ~200 adenines is added to form the poly(A) tail, which protects the RNA from degradation. The poly(A) tail is bound by multiple poly(A)-binding proteins necessary for mRNA export. A very important modification of eukaryotic pre-mRNA is RNA splicing. The majority of eukaryotic pre-mRNAs consist of alternating segments called exons and introns. In certain cases, some introns or exons can be either removed or retained in mature mRNA.
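The strand relationships described in this entry (template versus coding strand, thymine replaced by uracil) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names are invented for this example, sequences are plain uppercase strings, and promoters, terminators and RNA processing are ignored.

```python
# Illustrative sketch only: function names are hypothetical, not from any library.

# Watson-Crick partners within DNA, and DNA-to-RNA pairing used during transcription.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def template_strand(coding: str) -> str:
    """Return the template strand, written 3'->5' so it lines up under the
    5'->3' coding strand base by base."""
    return "".join(DNA_COMPLEMENT[base] for base in coding)


def transcribe_via_template(coding: str) -> str:
    """Build the RNA (5'->3') by pairing each template base with its RNA partner."""
    return "".join(DNA_TO_RNA_PAIR[base] for base in template_strand(coding))


def transcribe_shortcut(coding: str) -> str:
    """Shortcut: the RNA equals the coding strand with T replaced by U."""
    return coding.replace("T", "U")


if __name__ == "__main__":
    coding = "ATGGCCTTA"
    print(template_strand(coding))
    print(transcribe_via_template(coding))
    print(transcribe_shortcut(coding))
```

Both routes give the same RNA, which is the point made in the text: reading the template strand yields a transcript that matches the coding strand apart from the T-to-U substitution.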
4.
Enzyme
–
Enzymes /ˈɛnzaɪmz/ are macromolecular biological catalysts. Enzymes accelerate, or catalyze, chemical reactions. The molecules at the beginning of the process upon which enzymes may act are called substrates, and the enzyme converts these into different molecules, called products. Almost all metabolic processes in the cell need enzymes in order to occur at rates fast enough to sustain life. The set of enzymes made in a cell determines which metabolic pathways occur in that cell. The study of enzymes is called enzymology. Enzymes are known to catalyze more than 5,000 biochemical reaction types. Most enzymes are proteins, although a few are catalytic RNA molecules. Enzymes' specificity comes from their unique three-dimensional structures. Like all catalysts, enzymes increase the rate of a reaction by lowering its activation energy. Some enzymes can make their conversion of substrate to product occur many millions of times faster. An extreme example is orotidine 5′-phosphate decarboxylase, which allows a reaction that would otherwise take millions of years to occur in milliseconds. Chemically, enzymes are like any catalyst and are not consumed in chemical reactions. Enzymes differ from most other catalysts by being much more specific. Enzyme activity can be affected by other molecules: inhibitors are molecules that decrease enzyme activity, and many drugs and poisons are enzyme inhibitors. An enzyme's activity decreases markedly outside its optimal temperature and pH. Some enzymes are used commercially, for example, in the synthesis of antibiotics. French chemist Anselme Payen was the first to discover an enzyme, diastase. Louis Pasteur later wrote that alcoholic fermentation is an act correlated with the life and organization of the yeast cells, not with the death or putrefaction of the cells.
In 1877, German physiologist Wilhelm Kühne first used the term enzyme. The word enzyme was used later to refer to nonliving substances such as pepsin, and the word ferment was used to refer to chemical activity produced by living organisms. Eduard Buchner submitted his first paper on the study of yeast extracts in 1897. In a series of experiments at the University of Berlin, he found that sugar was fermented by yeast extracts even when there were no living yeast cells in the mixture. He named the enzyme that brought about the fermentation of sucrose zymase. In 1907, he received the Nobel Prize in Chemistry for his discovery of cell-free fermentation. Following Buchner's example, enzymes are usually named according to the reaction they carry out. The biochemical identity of enzymes was still unknown in the early 1900s. Sumner showed that the enzyme urease was a protein and crystallized it. Sumner, together with John Howard Northrop and Wendell Meredith Stanley, was awarded the 1946 Nobel Prize in Chemistry. The discovery that enzymes could be crystallized eventually allowed their structures to be solved by X-ray crystallography. The high-resolution structure of lysozyme marked the beginning of the field of structural biology. An enzyme's name is often derived from its substrate or the chemical reaction it catalyzes, with the word ending in -ase.
5.
Gene
–
A gene is a locus of DNA which is made up of nucleotides and is the molecular unit of heredity. The transmission of genes to an offspring is the basis of the inheritance of phenotypic traits. These genes make up different DNA sequences called genotypes. Genotypes, along with environmental and developmental factors, determine what the phenotypes will be. Most biological traits are under the influence of polygenes as well as gene–environment interactions. Genes can acquire mutations in their sequence, leading to different variants, known as alleles, in the population. These alleles encode slightly different versions of a protein, which cause different phenotypic traits. Usage of the term "having a gene" typically refers to containing a different allele of the same, shared gene. Genes evolve due to natural selection or survival of the fittest of the alleles. The concept of a gene continues to be refined as new phenomena are discovered. For example, regulatory regions of a gene can be far removed from its coding regions, some viruses store their genome in RNA instead of DNA, and some gene products are functional non-coding RNAs. The existence of discrete inheritable units was first suggested by Gregor Mendel. From 1857 to 1864, in Brno, he studied inheritance patterns in 8000 common edible pea plants, tracking distinct traits from parent to offspring. He described these mathematically as 2^n combinations, where n is the number of differing characteristics in the original peas. Although he did not use the term gene, he explained his results in terms of discrete inherited units that give rise to observable physical characteristics. This description prefigured the distinction between genotype and phenotype. Charles Darwin developed a theory of inheritance he termed pangenesis, from Greek pan and genesis / genos.
Darwin used the term gemmule to describe hypothetical particles that would mix during reproduction. Hugo de Vries called the units of inheritance pangenes, after Darwin's 1868 pangenesis theory. In 1909 the Danish botanist Wilhelm Johannsen shortened the name to gene. Advances in understanding genes and inheritance continued throughout the 20th century. Deoxyribonucleic acid (DNA) was shown to be the repository of genetic information by experiments in the 1940s to 1950s. In the early 1950s the prevailing view was that the genes in a chromosome acted like discrete entities, indivisible by recombination. Collectively, the research of this period established the central dogma of molecular biology, which states that proteins are translated from RNA, which is transcribed from DNA. This dogma has since been shown to have exceptions, such as reverse transcription in retroviruses. The modern study of genetics at the level of DNA is known as molecular genetics. In 1972, Walter Fiers and his team at the University of Ghent were the first to determine the sequence of a gene, the gene for bacteriophage MS2 coat protein. The subsequent development of chain-termination DNA sequencing in 1977 by Frederick Sanger improved the efficiency of sequencing. An automated version of the Sanger method was used in early phases of the Human Genome Project. The theories developed in the 1930s and 1940s to integrate molecular genetics with Darwinian evolution are called the evolutionary synthesis.
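The 2^n count attributed to Mendel in this entry can be checked by brute-force enumeration. The sketch below assumes each characteristic has exactly two alternative forms; the function name and the trait table are illustrative choices made for this example, not data from Mendel's records.

```python
from itertools import product
from typing import Dict, List, Tuple


def trait_combinations(traits: Dict[str, Tuple[str, str]]) -> List[Dict[str, str]]:
    """Enumerate every combination of two-form traits.

    With n traits, each having two forms, the result has 2**n entries.
    """
    names = list(traits)
    return [
        dict(zip(names, forms))
        for forms in product(*(traits[name] for name in names))
    ]


if __name__ == "__main__":
    # Illustrative two-form characteristics (assumed for the example).
    peas = {
        "seed shape": ("round", "wrinkled"),
        "seed colour": ("yellow", "green"),
        "plant height": ("tall", "short"),
    }
    combos = trait_combinations(peas)
    print(len(combos), "combinations for", len(peas), "traits")
    for combo in combos:
        print(combo)
```

Adding one more two-form trait doubles the number of combinations, which is where the exponent comes from.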
6.
UCSC Genome Browser
–
The UCSC Genome Browser is an on-line genome browser hosted by the University of California, Santa Cruz. The Genome Browser Database, browsing tools, and downloadable data files are available through the UCSC site. Today the browser is used by geneticists, molecular biologists and physicians, as well as students and teachers of evolution, for access to genomic information. High coverage is necessary to allow overlap to guide the construction of contiguous regions. The species hosted with full-featured genome browsers are shown in the table. The large amount of data about biological systems that is accumulating in the literature makes it necessary to collect and digest information using the tools of bioinformatics. The basic paradigm of display is to show the sequence in the horizontal dimension. Blocks of color along the coordinate axis show the locations of the alignments of the various data types. The ability to show this large variety of data types on a single coordinate axis makes the browser a handy tool for the vertical integration of the data. To find a specific gene or genomic region, the user may type in, for example, the gene name or an accession number for an RNA. Presenting the data in this format allows the browser to link to detailed information about any of the annotations. Designed for the presentation of complex and voluminous data, the UCSC Browser is optimized for speed. By pre-aligning the 55 million RNAs of GenBank to each of the 81 genome assemblies, the browser allows instant access to the alignments of any RNA to any of the hosted species. The juxtaposition of the many types of data allows researchers to display exactly the combination of data that will answer specific questions. A PDF/PostScript output functionality allows export of an image for publication in academic journals. One unique and useful feature that distinguishes the UCSC Browser from other genome browsers is the variable nature of the display.
Sequence of any size can be displayed, from a single DNA base up to an entire chromosome with full annotation tracks. Researchers can display a single gene, an exon, or an entire chromosome band, showing dozens or hundreds of genes. A convenient drag-and-zoom feature allows the user to select any region in the genome image. Researchers may also use the browser to display their own data via the Custom Tracks tool. This feature allows users to upload a file of their own data and view the data in the context of the reference genome assembly. Users may also use the data hosted by UCSC, creating subsets of the data of their choosing with the Table Browser tool. Any browser view created by a user, including those containing Custom Tracks, may be shared with other users via the Saved Sessions tool.
7.
Wikidata
–
Wikidata is a collaboratively edited knowledge base operated by the Wikimedia Foundation. It is intended to provide a common source of data which can be used by Wikimedia projects such as Wikipedia. This is similar to the way Wikimedia Commons provides storage for files and access to those files for all Wikimedia projects. Wikidata is powered by the software Wikibase. Wikidata is a document-oriented database, focused on items. Each item represents a topic and is identified by a number prefixed with the letter Q. This enables the basic information required to identify the topic the item covers to be translated without favouring any language. Information is added to items by creating statements. Statements take the form of key-value pairs, with each statement consisting of a property (the key) and a value linked to the property. The creation of the project was funded by donations from, among others, the Allen Institute for Artificial Intelligence and the Gordon and Betty Moore Foundation. When the project launched, only the first phase was available. Historically, a Wikipedia article would include a list of interlanguage links, being links to articles on the same topic in other editions of Wikipedia. Initially, Wikidata was a repository of interlanguage links. At first, no Wikipedia language editions were able to access Wikidata, so they needed to continue to maintain their own lists of interlanguage links. On 14 January 2013, the Hungarian Wikipedia became the first to enable the provision of interlanguage links via Wikidata. This functionality was extended to the Hebrew and Italian Wikipedias on 30 January, and to the English Wikipedia on 13 February. On 23 September 2013, phase 1 went live on Wikimedia Commons. The first aspects of the second phase were deployed on 4 February 2013. The values were initially limited to two data types, with more data types to follow later.
The first new type, string, was deployed on 6 March. The ability of the various language editions of Wikipedia to access data added to Wikidata as part of phase two was rolled out progressively between 27 March and 25 April 2013. On 16 September 2015, Wikidata began allowing so-called arbitrary access, that is, access from an article to data on items not directly connected to it. For example, in the past the article about Berlin could not access data about Germany, but with arbitrary access it can. On 27 April 2016 arbitrary access was activated on Wikimedia Commons. Phase 3 will involve database querying and the creation of lists based on data stored on Wikidata. As of October 2016 two tools for querying Wikidata were available, AutoList and PetScan, in addition to a public SPARQL endpoint. There is concern that the project is being influenced by lobbying companies, PR professionals and search engine optimizers. As of December 2015, according to Wikimedia statistics, half of the information in Wikidata is unsourced. Another 30% is labeled as having come from Wikipedia, but with no indication as to which article.
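The item-and-statement model described in this entry (an item identified by a Q-prefixed number, carrying statements that pair a property with a value) can be sketched as a small data structure. This is a toy model for illustration, not the Wikibase data model or API: the class names, the `P`-prefixed property identifiers and the placeholder IDs `Q1` and `P1` are all assumptions made for the example.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Statement:
    """One property/value pair attached to an item."""
    property_id: str  # hypothetical identifier such as "P1"
    value: Any


@dataclass
class Item:
    """A topic identified by a language-neutral, Q-prefixed number."""
    qid: str
    labels: Dict[str, str] = field(default_factory=dict)  # language code -> label
    statements: List[Statement] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Enforce the "letter Q followed by a number" shape described in the text.
        if not re.fullmatch(r"Q\d+", self.qid):
            raise ValueError(f"not a Q-prefixed identifier: {self.qid!r}")

    def add_statement(self, property_id: str, value: Any) -> None:
        self.statements.append(Statement(property_id, value))

    def values_of(self, property_id: str) -> List[Any]:
        """Return all values recorded for one property."""
        return [s.value for s in self.statements if s.property_id == property_id]


if __name__ == "__main__":
    # Placeholder identifiers, not real Wikidata IDs.
    item = Item("Q1", labels={"en": "example topic", "de": "Beispielthema"})
    item.add_statement("P1", "example value")
    print(item.qid, item.labels["en"], item.values_of("P1"))
```

Because the identifier is a number rather than a word, labels in any language can be attached to the same item without one language being treated as primary.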
8.
Chromosome 4 (human)
–
Chromosome 4 is one of the 23 pairs of chromosomes in humans. People normally have two copies of this chromosome. Chromosome 4 spans more than 186 million base pairs and represents between 6 and 6.5 percent of the total DNA in cells. Identifying genes on each chromosome is an area of genetic research. Because researchers use different approaches to genome annotation, their predictions of the number of genes on each chromosome vary. In January 2017, two estimates differed by 12%, with one estimate giving 2,441 genes and the other giving 2,164 genes. The chromosome is ~191 megabases in length. In a 2012 paper, 757 protein-encoding genes were identified on this chromosome. Of these coding sequences, 211 did not have any evidence at the protein level. A further 271 appear to be membrane proteins, and 54 have been classified as cancer-associated proteins.
9.
Protein Data Bank
–
The Protein Data Bank (PDB) is a crystallographic database for the three-dimensional structural data of large biological molecules, such as proteins and nucleic acids. The PDB is overseen by an organization called the Worldwide Protein Data Bank (wwPDB). The PDB is a key resource in areas of structural biology. Most major scientific journals, and some funding agencies, now require scientists to submit their structure data to the PDB. Many other databases use protein structures deposited in the PDB. For example, SCOP and CATH classify protein structures, while PDBsum provides a graphic overview of PDB entries using information from other sources, such as Gene Ontology. By 1971, one of Meyer's programs, SEARCH, enabled researchers to access information from the database to study protein structures offline. SEARCH was instrumental in enabling networking, thus marking the beginning of the PDB. Upon Hamilton's death in 1973, Tom Koetzle took over direction of the PDB for the subsequent 20 years. In January 1994, Joel Sussman of Israel's Weizmann Institute of Science was appointed head of the PDB. In October 1998, the PDB was transferred to the Research Collaboratory for Structural Bioinformatics (RCSB); the new director was Helen M. Berman of Rutgers University. In 2003, with the formation of the wwPDB, the PDB became an international organization. The founding members are PDBe, RCSB, and PDBj. Each of the four members of wwPDB can act as a deposition, data processing, and distribution center for PDB data. The data processing refers to the fact that wwPDB staff review and annotate each submitted entry. The data are automatically checked for plausibility. The PDB database is updated weekly; likewise, the PDB holdings list is also updated weekly. As of 14 March 2017, 103,514 structures in the PDB have a structure factor file, and 9,057 structures have an NMR restraint file.
2,826 structures in the PDB have a chemical shifts file. For structures determined by NMR, which yields estimated distances between pairs of atoms, the final conformation of the protein is obtained by solving a distance geometry problem. A few proteins are determined by cryo-electron microscopy. The significance of the structure factor files, mentioned above, is that, for PDB structures determined by X-ray diffraction that have a structure factor file, the electron density map may be viewed. The data of such structures is stored on the electron density server. Since 2007, the rate of accumulation of new protein structures appears to have plateaued. The file format initially used by the PDB was called the PDB file format. This original format was restricted by the width of computer punch cards to 80 characters per line. Around 1996, the macromolecular Crystallographic Information File format, mmCIF, which is an extension of the CIF format, started to be phased in.
10.
GeneCards
–
GeneCards is a database of human genes that provides genomic, proteomic, transcriptomic, genetic and functional information on all known and predicted human genes. It is being developed and maintained by the Crown Human Genome Center at the Weizmann Institute of Science. The database aims to provide a quick overview of the currently available biomedical information about the searched gene, including the human genes, the encoded proteins, and the relevant diseases. The GeneCards database provides access to free Web resources about more than 7000 known human genes, integrated from more than 90 data resources, such as HGNC and Ensembl. The core gene list is based on approved gene symbols published by the HUGO Gene Nomenclature Committee. The information is carefully gathered and selected from these databases. Over time, the GeneCards database has developed a suite of tools with more specialised capabilities. Since 1998, the GeneCards database has been used in bioinformatics and genomics. Since the 1980s, sequence information has become abundant, as many laboratories realized. However, the information provided by the sequence databases focuses on different aspects. To gather these scattered data, the Weizmann Institute of Science Crown Human Genome Centre developed a database called 'GeneCards' in 1997. This database mainly deals with human genome information, human genes, the encoded proteins' functions, and the related diseases. At first, its main features included the delivery of integrated biomedical information about a given gene in 'card' format. Currently, version 3 gathers information from more than 90 database resources based on a consolidated gene list. It has also developed a suite of GeneCards tools that focus on more specific purposes. Development follows a life cycle of nearly three years, after which a new planning phase for the subsequent revision starts, including implementation, development, semi-automated quality assurance, and deployment.
Technologies used include Eclipse, Apache, Perl, XML, PHP, Propel, Java, R and MySQL. GeneDecks offers several functions. Annotation combinatory: using GeneDecks, one can get a set of similar genes for a particular gene with a selected combinatorial annotation. The summary table ranks the genes by their level of similarity to the probe gene. Annotation unification: different data sources often offer annotations with heterogeneous naming systems; annotation unification in GeneDecks is based on the similarity in GeneCards gene-content space detection algorithms. Partner hunting: in GeneDecks's Partner Hunter, users give a query gene. Set distillation: in Set Distiller, users give a set of genes, and the system ranks attributes by their degree of sharing within the given gene set. GeneALaCart is a gene-set-orientated batch-querying engine based on the popular GeneCards database. It allows retrieval of information about multiple genes in a batch query. The GeneLoc suite member presents a human chromosome map, which is very important for designing a custom-made capture chip. GeneLoc includes further links to GeneCards, NCBI's Human Genome Sequencing, and UniGene. To search, first enter the search term into the blank field on the homepage.
11.
Locus (genetics)
–
A locus (plural loci) in genetics is a fixed position on a chromosome, such as the position of a gene. Each chromosome carries many genes; humans are estimated to have 19,000 to 20,000 haploid protein-coding genes. A variant of the similar DNA sequence located at a given locus is called an allele. The ordered list of loci known for a particular genome is called a gene map. Gene mapping is the process of determining the locus for a biological trait. The chromosomal locus of a gene might be written 3p22.1. Here 3 means chromosome 3, and p means the p-arm (short arm). The 22 refers to region 2, band 2; this is read as "two two", not as "twenty-two". The final .1 is sub-band 1. So the entire locus is read as "three P two two point one". The cytogenetic bands are counted from the centromere out toward the telomeres. A range of loci is specified in a similar way. For example, the locus of gene OCA1 may be written 11q1.4-q2.1, meaning it is on the long arm of chromosome 11. The ends of a chromosome are labeled pter and qter. A centisome is defined as 1% of a chromosome length.
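The reading of a locus such as 3p22.1 (chromosome, arm, region, band, optional sub-band) can be sketched as a small parser. This is an illustrative simplification: the function and class names are invented here, and the pattern only handles a single locus with a one-digit region and a one-digit band, not ranges, pter/qter, or other forms of cytogenetic notation.

```python
import re
from typing import NamedTuple, Optional


class Locus(NamedTuple):
    chromosome: str          # "1".."22", "X" or "Y"
    arm: str                 # "p" (short arm) or "q" (long arm)
    region: str              # first digit after the arm
    band: str                # second digit after the arm
    sub_band: Optional[str]  # digits after the decimal point, if any


# Simplified pattern (an assumption for this sketch): one region digit, one band digit.
_LOCUS_RE = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d)(\d)(?:\.(\d+))?$")


def parse_locus(text: str) -> Locus:
    """Split a locus string such as '3p22.1' into its components."""
    match = _LOCUS_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised locus: {text!r}")
    return Locus(*match.groups())


def spoken(locus: Locus) -> str:
    """Render the locus digit by digit, the way it is read aloud."""
    parts = [locus.chromosome, locus.arm.upper(), locus.region, locus.band]
    if locus.sub_band is not None:
        parts += ["point", " ".join(locus.sub_band)]
    return " ".join(parts)


if __name__ == "__main__":
    locus = parse_locus("3p22.1")
    print(locus)
    print(spoken(locus))
```

Keeping region and band as separate fields mirrors the convention in the text that "22" is read as "two two" rather than "twenty-two".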
12.
Base pair
–
A base pair is a unit consisting of two nucleobases bound to each other by hydrogen bonds. They form the building blocks of the DNA double helix. Dictated by specific hydrogen bonding patterns, Watson-Crick base pairs allow the DNA helix to maintain a regular helical structure that is dependent on its nucleotide sequence. The complementary nature of this structure provides a backup copy of all genetic information encoded within double-stranded DNA. Many DNA-binding proteins can recognize specific base pairing patterns that identify particular regulatory regions of genes. Intramolecular base pairs can occur within single-stranded nucleic acids. The size of a gene or an organism's entire genome is often measured in base pairs because DNA is usually double-stranded. Hence, the number of base pairs is equal to the number of nucleotides in one of the strands. The haploid human genome is estimated to be about 3.2 billion bases long and to contain about 20,000 protein-coding genes. A kilobase (kb) is a unit of measurement in molecular biology equal to 1000 base pairs of DNA or RNA. The total amount of related DNA base pairs on Earth is estimated at 5.0 × 10^37. The total mass of the biosphere, by comparison, has been estimated to be as much as 4 TtC (trillion tons of carbon). Hydrogen bonding is the interaction that underlies the base-pairing rules described above. Appropriate geometrical correspondence of hydrogen donors and acceptors allows only the right pairs to form stably. Purine-pyrimidine base pairing of AT or GC or UA results in proper duplex structure. The only other purine-pyrimidine pairings would be AC and GT and UG; these pairings are mismatches because the patterns of hydrogen donors and acceptors do not correspond. The GU pairing, with two hydrogen bonds, does occur fairly often in RNA. Higher GC content results in higher melting temperatures. Conversely, regions of a genome that need to separate frequently, for example the promoter regions for often-transcribed genes, are comparatively GC-poor.
GC content and melting temperature must also be taken into account when designing primers for PCR reactions. By convention, when a double-stranded DNA sequence is written out, the top strand is written from the 5′ end to the 3′ end; thus the bottom strand is written 3′ to 5′. Chemical analogs of nucleotides can take the place of proper nucleotides because of their isosteric chemistry. One common mutagenic base analog is 5-bromouracil, which resembles thymine. Most intercalators are large polyaromatic compounds and are known or suspected carcinogens. Examples include ethidium bromide and acridine. An unnatural base pair is a designed subunit of DNA which is created in a laboratory and does not occur in nature.
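The pairing rules and the link between GC content and melting temperature discussed in this entry can be illustrated with a short Python sketch. The function names are invented for the example. The melting-temperature estimate uses the Wallace rule of thumb (2 °C per A or T, 4 °C per G or C), which is not mentioned in the text above and is brought in here as an assumption; it is only a rough guide for short oligonucleotides, and real primer design uses more detailed models.

```python
# Illustrative sketch only: function names are hypothetical.

DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def bottom_strand(top: str) -> str:
    """Return the partner strand written 3'->5', aligned under a 5'->3' top strand."""
    return "".join(DNA_PAIRS[base] for base in top.upper())


def gc_content(seq: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in degrees Celsius by the Wallace rule:
    2 per A/T plus 4 per G/C (assumed rule of thumb for short oligos)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    top = "ATCGATTGAGCTCTAGCG"
    print("5' " + top + " 3'")
    print("3' " + bottom_strand(top) + " 5'")
    print("GC content:", round(gc_content(top), 2))
    print("Wallace Tm estimate:", wallace_tm(top), "degrees C")
```

Under this rule a GC-rich sequence gets a higher estimate than an AT-rich one of the same length, in line with the statement that higher GC content raises the melting temperature.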