In genetics, a locus is the position of a gene or other marker on a chromosome. Each chromosome carries many genes; the human haploid genome is estimated to contain 19,000-20,000 protein-coding genes. A variant of the DNA sequence located at a given locus is called an allele. The ordered list of loci known for a particular genome is called a gene map, and gene mapping is the process of determining the locus for a biological trait. The chromosomal locus of a gene might be written 3p22.1. Here 3 means chromosome 3 and p means the p-arm (the short arm). The 22 refers to region 2, band 2, and is read as "two two", not as "twenty-two"; the .1 is sub-band 1. The entire locus is therefore read as "three P two two point one". Cytogenetic bands are counted from the centromere out toward the telomeres. A range of loci is specified in a similar way. For example, the locus of gene OCA1 may be written 11q1.4-q2.1, meaning it is on the long arm (q) of chromosome 11. The ends of a chromosome are labeled pter and qter. A centisome is defined as 1% of a chromosome's length.
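The locus notation described above can be taken apart mechanically. The following is a minimal sketch, assuming only the simple form of chromosome, arm, one region digit, one band digit and an optional sub-band. The names `Locus`, `parse_locus` and `describe` are illustrative, and ranges such as 11q1.4-q2.1 or the pter/qter labels are deliberately not handled.

```python
import re
from typing import NamedTuple, Optional


class Locus(NamedTuple):
    chromosome: str
    arm: str
    region: str
    band: str
    sub_band: Optional[str]


# chromosome (1-22, X or Y), arm (p or q), region digit, band digit,
# then an optional ".sub-band" suffix
_LOCUS_RE = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d)(\d)(?:\.(\d+))?$")


def parse_locus(text: str) -> Locus:
    """Split a simple cytogenetic locus such as '3p22.1' into its parts."""
    match = _LOCUS_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a simple cytogenetic locus: {text!r}")
    return Locus(*match.groups())


def describe(locus: Locus) -> str:
    """Spell out the parts; region and band stay separate digits ('two two')."""
    parts = [
        f"chromosome {locus.chromosome}",
        f"{locus.arm}-arm",
        f"region {locus.region}",
        f"band {locus.band}",
    ]
    if locus.sub_band is not None:
        parts.append(f"sub-band {locus.sub_band}")
    return ", ".join(parts)


print(describe(parse_locus("3p22.1")))
```

Keeping region and band as separate fields mirrors the reading convention: 22 is two digits with separate meanings, not the number twenty-two.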
A gene is a locus (region) of DNA that is made up of nucleotides and is the molecular unit of heredity. The transmission of genes to an organism's offspring is the basis of the inheritance of phenotypic traits. Genes make up different DNA sequences called genotypes. Genotypes, along with environmental and developmental factors, determine what the phenotypes will be. Most biological traits are under the influence of polygenes (many different genes) as well as gene–environment interactions. Genes can acquire mutations in their sequence, leading to different variants, known as alleles, in the population. These alleles encode slightly different versions of a protein, which cause different phenotypic traits. In everyday usage, "having a gene" typically refers to carrying a different allele of the same, shared gene. Genes evolve through natural selection, or survival of the fittest, acting on the alleles. The concept of a gene continues to be refined as new phenomena are discovered. For example, regulatory regions of a gene can be far removed from its coding regions, some viruses store their genome in RNA instead of DNA, and some gene products are functional non-coding RNAs.
The existence of discrete inheritable units was first suggested by Gregor Mendel. From 1857 to 1864, in Brno, he studied inheritance patterns in 8000 common edible pea plants, tracking distinct traits from parent to offspring. He described these mathematically as 2^n combinations, where n is the number of differing characteristics in the original peas. Although he did not use the term gene, he explained his results in terms of discrete inherited units that give rise to observable physical characteristics. This description prefigured the distinction between genotype and phenotype. Charles Darwin developed a theory of inheritance he termed pangenesis, from Greek pan and genesis / genos. Darwin used the term gemmule to describe hypothetical particles that would mix during reproduction. Hugo de Vries later called the units of inheritance pangenes, after Darwin's 1868 pangenesis theory. In 1909 the Danish botanist Wilhelm Johannsen shortened the name to gene. Advances in understanding genes and inheritance continued throughout the 20th century.
Deoxyribonucleic acid (DNA) was shown to be the repository of genetic information by experiments in the 1940s to 1950s. In the early 1950s the prevailing view was that the genes in a chromosome acted like discrete entities, indivisible by recombination. Subsequent research established the central dogma of molecular biology, which states that proteins are translated from RNA, which is transcribed from DNA. This dogma has since been shown to have exceptions, such as reverse transcription in retroviruses. The modern study of genetics at the level of DNA is known as molecular genetics. In 1972, Walter Fiers and his team at the University of Ghent were the first to determine the sequence of a gene: the gene for bacteriophage MS2 coat protein. The subsequent development of chain-termination DNA sequencing in 1977 by Frederick Sanger improved the efficiency of sequencing. An automated version of the Sanger method was used in early phases of the Human Genome Project. The theories developed in the 1930s and 1940s to integrate molecular genetics with Darwinian evolution are called the evolutionary synthesis.
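The flow of information in the central dogma (DNA transcribed to RNA, RNA translated to protein) can be illustrated with a short sketch. It makes several simplifying assumptions that are not from the text above: the input is treated as the coding (sense) strand, so the RNA is the same sequence with U in place of T; translation starts at the first base; and the standard genetic code is used, written in the compact T, C, A, G ordering. The function names and the example sequence are illustrative only.

```python
from itertools import product

# Standard genetic code, one letter per codon, with codons ordered
# TTT, TTC, TTA, TTG, TCT, ... ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build an RNA codon table by swapping T for U in each DNA codon.
CODON_TABLE = {
    "".join(codon).replace("T", "U"): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA matching a DNA coding strand (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTAA")
print(mrna, translate(mrna))
```

Reverse transcription, the exception mentioned above, would correspond to running `transcribe` in the opposite direction (U back to T).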
GeneCards is a database of human genes that provides genomic, transcriptomic and functional information on all known and predicted human genes. It is developed and maintained by the Crown Human Genome Center at the Weizmann Institute of Science. The database aims to provide a quick overview of the currently available biomedical information about the searched gene, including the human gene itself, the encoded proteins, and the relevant diseases. GeneCards provides free Web access to information about more than 7000 known human genes, integrated from more than 90 data resources, such as HGNC. The core gene list is based on approved gene symbols published by the HUGO Gene Nomenclature Committee (HGNC). The information is carefully gathered and selected from these databases. Over time, the GeneCards database has developed a suite of tools with more specialised capabilities. Since 1998, the GeneCards database has been used by the bioinformatics and genomics communities. Since the 1980s, sequence information has become abundant, and many laboratories recognized this.
However, the sequence databases each focus on different aspects of the information. To gather these scattered data, the Crown Human Genome Center at the Weizmann Institute of Science developed a database called ‘GeneCards’ in 1997. The database mainly deals with human genome information, human genes, the functions of the encoded proteins, and the related diseases. At first, it had two main features, including the delivery of integrated biomedical information about a given gene in ‘card’ format. Currently, version 3 gathers information from more than 90 database resources based on a consolidated gene list. It has also developed a set of GeneCards suite members, each focused on a more specific purpose. Roughly every three years, a new planning phase begins for the subsequent revision, which then goes through implementation, semi-automated quality assurance, and deployment. Technologies used include Eclipse, Perl, XML, PHP, Java, R and MySQL. GeneDecks, one of the suite members, offers several functions. Annotation combinatory: using GeneDecks, one can get a set of genes similar to a particular gene under a selected combinatorial annotation.
The summary table ranks the genes by their level of similarity to the probe gene. Annotation unification: different data sources often offer annotations with heterogeneous naming systems, and annotation unification in GeneDecks is based on similarity in GeneCards gene-content space detection algorithms. Partner hunting: in GeneDecks's Partner Hunter, users supply a query gene. Set distillation: in Set Distiller, users supply a set of genes, and the system ranks attributes by their degree of sharing within the given gene set. GeneALaCart is a gene-set-oriented batch-querying engine based on the GeneCards database; it allows retrieval of information about multiple genes in a batch query. The GeneLoc suite member presents a human chromosome map, which is very important for designing a custom-made capture chip. GeneLoc includes further links to resources such as GeneCards, NCBI's Human Genome Sequencing, and UniGene. To search, enter a query into the search box on the homepage.
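The Set Distiller idea of ranking attributes by how widely they are shared within a gene set can be illustrated as follows. This is only a sketch and not GeneDecks's actual algorithm, whose scoring is not described here: the function `rank_shared_attributes` is hypothetical, the score is simply the fraction of genes carrying each attribute, and the gene and attribute names are made-up toy data.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def rank_shared_attributes(
    gene_set: Iterable[str],
    annotations: Dict[str, Set[str]],
) -> List[Tuple[str, float]]:
    """Rank attributes by the fraction of genes in the set that carry them."""
    genes = list(gene_set)
    counts: Counter = Counter()
    for gene in genes:
        # set() guards against an attribute being listed twice for one gene
        counts.update(set(annotations.get(gene, ())))
    # Most widely shared first; ties broken alphabetically for stable output
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(attribute, n / len(genes)) for attribute, n in ranked]


# Toy annotations (placeholders, not real GeneCards content)
toy_annotations = {
    "GENE_A": {"attr1", "attr2"},
    "GENE_B": {"attr1"},
    "GENE_C": {"attr1", "attr2", "attr3"},
}

for attribute, share in rank_shared_attributes(toy_annotations, toy_annotations):
    print(f"{attribute}: shared by {share:.0%} of the set")
```

A real system would presumably also weigh how common each attribute is across the whole database, so that attributes shared by almost every gene do not dominate the ranking.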
A chromosome is a DNA molecule with part or all of the genetic material of an organism. Prokaryotes usually have a single circular chromosome, whereas most eukaryotes are diploid. Chromosomes in eukaryotes are composed of chromatin fiber. Chromatin fiber is made of nucleosomes; a nucleosome is a histone octamer with part of a longer DNA strand attached to and wrapped around it. Chromatin fiber, together with associated proteins, is known as chromatin. Chromatin is present in most cells, with a few exceptions, for example, red blood cells. Occurring only in the nucleus of cells, chromatin contains the vast majority of DNA, except for a small amount that is inherited maternally and found in the mitochondria. Chromosomes are normally visible under a microscope only when the cell is undergoing the metaphase of cell division. Before this happens, every chromosome is copied once, and the copy is joined to the original by a centromere, resulting in an X-shaped structure. The original chromosome and the copy are now called sister chromatids.
During metaphase, a chromosome is in its most condensed state, and in this highly condensed form chromosomes are easiest to distinguish and study. In prokaryotic cells, chromatin occurs free-floating in the cytoplasm, as these cells lack membrane-bound organelles. The main information-carrying macromolecule is a single piece of coiled double-helix DNA, containing many genes, regulatory elements and other noncoding DNA. The DNA-bound macromolecules are proteins that serve to package the DNA. Chromosomes vary widely between different organisms. Some species, such as certain bacteria, also contain plasmids or other extrachromosomal DNA. These are circular structures in the cytoplasm that contain cellular DNA and play a role in horizontal gene transfer. Chromosomal recombination during meiosis and subsequent sexual reproduction plays a significant role in genetic diversity. In prokaryotes and viruses, the DNA is often densely packed and organized; in the case of archaea, this is done by homologs of eukaryotic histones. Small circular genomes called plasmids are often found in bacteria, and also in mitochondria and chloroplasts, reflecting the bacterial origins of these organelles.
Some use the term chromosome in a wider sense, to refer to the individualized portions of chromatin in cells. Others use the concept in a narrower sense, to refer to the individualized portions of chromatin during cell division. The word chromosome comes from the Greek χρῶμα (chroma, "colour") and σῶμα (soma, "body"), describing their strong staining by particular dyes. Virchow and Bütschli were among the first scientists who recognized the structures now familiar as chromosomes. The term was coined by von Waldeyer-Hartz, referring to the term chromatin. In a series of experiments beginning in the mid-1880s, Theodor Boveri gave the definitive demonstration that chromosomes are the vectors of heredity. His two principles were the continuity of chromosomes and the individuality of chromosomes; it is the second of these principles that was so original.
In biology, homology is the existence of shared ancestry between a pair of structures, or genes, in different taxa. Evolutionary biology explains homologous structures adapted to different purposes as the result of descent with modification from a common ancestor. Examples of serial homology, in which repeated parts of the same animal share a developmental origin, include the legs of a centipede, the maxillary palp and labial palp of an insect, and the spinous processes of successive vertebrae in a vertebral column. Sequence homology between protein or DNA sequences is likewise defined in terms of shared ancestry. Two segments of DNA can have shared ancestry because of either a speciation event (orthologs) or a duplication event (paralogs). Homology among proteins or DNA is inferred from their sequence similarity. Significant similarity is strong evidence that two sequences are related by divergent evolution from a common ancestor. Alignments of multiple sequences are used to discover the homologous regions. The word homology, coined in about 1656, derives from the Greek ὁμόλογος (homologos), from ὁμός (homos, "same") and λόγος (logos, "relation").
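Sequence similarity is usually measured on an alignment. As a minimal sketch, the function below computes percent identity between two sequences that are assumed to be already aligned (equal length, with "-" marking gaps). The function name and the example sequences are illustrative; note that a raw identity value alone does not establish homology, since judging whether a similarity is significant requires a statistical assessment that is not shown here.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical positions among the columns without a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # columns containing a gap are not compared
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Five gap-free columns, four of them identical
print(percent_identity("ACGT-A", "ACCTGA"))
```

Excluding gap columns from the denominator is one common convention among several; other definitions divide by the full alignment length or by the shorter sequence length.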
Homology is the relationship between biological structures or sequences that are derived from a common ancestor. For example, many insects possess two pairs of flying wings. In beetles, the first pair of wings has evolved into a pair of hard wing covers. Similarly, the major forearm bones of tetrapods (humerus, radius and ulna) are also found in fossils of lobe-finned fish such as Eusthenopteron. The opposite of homologous organs are analogous organs, which do similar jobs in two taxa but were not present in their last common ancestor and instead evolved separately. For example, the wings of insects and birds evolved independently in widely separated groups. The wings of a sycamore maple seed and the wings of a bird are analogous but not homologous, as they develop from quite different structures. A structure can be homologous at one level but only analogous at another. For example, in the pterosaurs, the wing involves both the forelimb and the hindlimb. Analogy is called homoplasy in cladistics, and convergent or parallel evolution in evolutionary biology. Specialised terms are used in taxonomic research.
Primary homology is that initially conjectured by a researcher based on similar structure or anatomical connections. Secondary homology is implied by parsimony analysis, where a character that only occurs once on a tree is taken to be homologous. As implied in this definition, many cladists consider homology to be synonymous with synapomorphy. Homologies provide the fundamental basis for all biological classification, although some may be highly counter-intuitive. Some such homologies have been discovered by comparing genes in evolutionary developmental biology. Among insects, the stinger of the female honey bee is a modified ovipositor, homologous with ovipositors in other insects such as the Orthoptera and those Hymenoptera without stingers. The three small bones in the ear of mammals, including humans, are the malleus, incus and stapes. The malleus and incus develop in the embryo from structures that form jaw bones in lizards. The evidence shows that these bones are homologous, sharing a common ancestor.
Among the many homologies in mammal reproductive systems, ovaries and testicles are homologous. In many plants, defensive or storage structures are made by modifications of the development of primary leaves and roots.
In molecular biology, a transcription factor is a protein that controls the rate of transcription of genetic information from DNA to messenger RNA by binding to a specific DNA sequence. In turn, this helps to regulate the expression of genes near that sequence. Transcription factors work alone or with other proteins in a complex, by promoting or blocking the recruitment of RNA polymerase to specific genes. A defining feature of transcription factors is that they contain at least one DNA-binding domain (DBD); transcription factors are usually classified into different families based on their DBDs. Transcription factors are essential for the regulation of gene expression. The number of transcription factors found within an organism increases with genome size. Approximately 10% of genes in the genome code for transcription factors. The combined use of a subset of the approximately 2000 human transcription factors easily accounts for the unique regulation of each gene in the human genome during development.
Transcription factors bind to either enhancer or promoter regions of DNA adjacent to the genes that they regulate. Depending on the transcription factor, the transcription of the adjacent gene is either up- or down-regulated. Transcription factors use a variety of mechanisms for the regulation of gene expression. These mechanisms include stabilizing or blocking the binding of RNA polymerase to DNA, and catalyzing the acetylation or deacetylation of histone proteins. The transcription factor can either do this directly or recruit other proteins with this catalytic activity. Transcription factors bind to the DNA and help initiate a program of increased or decreased gene transcription. As such, they are vital for many important cellular processes. Many of the general transcription factors (GTFs) do not actually bind DNA, but rather are part of the large transcription preinitiation complex that interacts with RNA polymerase directly. The most common GTFs are TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH. The preinitiation complex binds to regions of DNA upstream of the gene that it regulates.
Other transcription factors regulate the expression of various genes by binding to enhancer regions of DNA adjacent to regulated genes. These transcription factors are critical to making sure that genes are expressed in the cell at the right time and in the right amount. Many transcription factors in multicellular organisms are involved in development. The Hox transcription factor family, for example, is important for proper body pattern formation in organisms as diverse as fruit flies and humans. Another example is the transcription factor encoded by the Sex-determining Region Y (SRY) gene. Cells can communicate with each other by releasing molecules that produce signaling cascades within another receptive cell. If the signal requires upregulation or downregulation of genes in the recipient cell, transcription factors are often involved. For example, the estrogen receptor goes to the cell's nucleus and binds to its DNA-binding sites, changing the transcriptional regulation of the associated genes.
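The idea that a transcription factor recognizes a specific DNA sequence can be illustrated by scanning a sequence for a consensus motif. This is a simplified sketch: the function `find_motif` is hypothetical, only the given strand is scanned, and real binding sites are usually modeled with position weight matrices rather than exact consensus matches. The motif TGASTCA (where S stands for C or G) is the commonly cited AP-1 consensus and is an assumption brought in for the example; the test sequence is made up.

```python
import re
from typing import List

# IUPAC nucleotide codes mapped to regular-expression character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def find_motif(sequence: str, motif: str) -> List[int]:
    """Return 0-based start positions of every (possibly overlapping) match."""
    pattern = "".join(IUPAC[base] for base in motif.upper())
    # A lookahead makes overlapping occurrences visible to finditer
    return [m.start() for m in re.finditer(f"(?=({pattern}))", sequence.upper())]


print(find_motif("GGTGACTCAGGTGAGTCATT", "TGASTCA"))
```

Because this particular motif reads the same on the reverse-complement strand, scanning one strand is enough here; for other motifs a second pass over the reverse complement would be needed.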
Proteins are large biomolecules, or macromolecules, consisting of one or more long chains of amino acid residues. Proteins perform a vast array of functions within organisms, including catalysing metabolic reactions, DNA replication, and responding to stimuli. A linear chain of amino acid residues is called a polypeptide. A protein contains at least one long polypeptide. Short polypeptides, containing fewer than 20–30 residues, are rarely considered to be proteins and are commonly called peptides, or sometimes oligopeptides. The individual amino acid residues are bonded together by peptide bonds. The sequence of amino acid residues in a protein is defined by the sequence of a gene, which is encoded in the genetic code. In general, the genetic code specifies 20 standard amino acids. Sometimes proteins have non-peptide groups attached, which can be called prosthetic groups or cofactors. Proteins can also work together to achieve a particular function, and they often associate to form stable protein complexes.
Once formed, proteins only exist for a certain period of time and are then degraded and recycled by the cell's machinery through the process of protein turnover. A protein's lifespan is measured in terms of its half-life and covers a wide range: proteins can exist for minutes or years, with an average lifespan of 1–2 days in mammalian cells. Abnormal or misfolded proteins are degraded more rapidly, either because they are targeted for destruction or because they are unstable. Like other biological macromolecules such as polysaccharides and nucleic acids, proteins are essential parts of organisms. Many proteins are enzymes that catalyse biochemical reactions and are vital to metabolism. Proteins also have structural or mechanical functions, such as actin and myosin in muscle and the proteins in the cytoskeleton. Other proteins are important in cell signaling, immune responses, cell adhesion, and the cell cycle. In animals, proteins are needed in the diet to provide the essential amino acids that cannot be synthesized. Digestion breaks the proteins down for use in metabolism.
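Because protein lifespan is expressed as a half-life, the amount of a protein remaining after a given time can be estimated with a short calculation. The sketch below assumes simple first-order (exponential) decay with no new synthesis, which is an idealization; the function name and the 24-hour half-life in the example are illustrative values, not measurements.

```python
def fraction_remaining(elapsed_hours: float, half_life_hours: float) -> float:
    """Fraction of the original protein pool left after elapsed_hours.

    Assumes first-order decay: the pool halves once per half-life.
    """
    if half_life_hours <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (elapsed_hours / half_life_hours)


# With an assumed 24-hour half-life, two half-lives pass in 48 hours
print(fraction_remaining(48, 24))
```

Under this model a protein with a half-life of minutes is almost entirely replaced within an hour, while one with a half-life of years is effectively stable over the lifetime of the cell.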
Methods commonly used to study protein structure and function include immunohistochemistry, site-directed mutagenesis, X-ray crystallography, and nuclear magnetic resonance. Most proteins consist of linear polymers built from series of up to 20 different L-α-amino acids. All proteinogenic amino acids possess common structural features, including an α-carbon to which an amino group, a carboxyl group, and a variable side chain are bonded. Only proline differs from this basic structure, as it contains an unusual ring to the N-end amine group. The amino acids in a chain are linked by peptide bonds. Once linked in the chain, an individual amino acid is called a residue, and the linked series of carbon, nitrogen, and oxygen atoms is known as the main chain or protein backbone. The peptide bond has two resonance forms that contribute some double-bond character and inhibit rotation around its axis, so that the alpha carbons are roughly coplanar.
FBJ murine osteosarcoma viral oncogene homolog B, also known as FOSB or FosB, is a protein that, in humans, is encoded by the FOSB gene. The FOS gene family consists of 4 members: FOS, FOSB, FOSL1 and FOSL2. These genes encode leucine zipper proteins that can dimerize with proteins of the JUN family, thereby forming the transcription factor complex AP-1. As such, the FOS proteins have been implicated as regulators of cell proliferation. ΔFosB, or DeltaFosB, is a splice variant of FosB. ΔFosB overexpression triggers the development of addiction-related structural neuroplasticity throughout the reward system, and ΔFosB has been implicated as a factor in the development of virtually all forms of behavioral and drug addiction. In the brain's reward system, it is linked to changes in a number of gene products, such as CREB. In the body, ΔFosB regulates the commitment of mesenchymal cells to the adipocyte or osteoblast lineage. In the nucleus accumbens, ΔFosB functions as a molecular switch. Addiction from chronic drug use involves alterations in gene expression in the mesocorticolimbic projection.
The most important transcription factors that produce these alterations include ΔFosB and cyclic adenosine monophosphate response element binding protein (CREB). ΔFosB overexpression has been implicated in addictions to alcohol, cocaine, nicotine, phencyclidine and substituted amphetamines, among others. ΔJunD, a transcription factor, and G9a, a histone methyltransferase, both oppose the function of ΔFosB. Increases in nucleus accumbens ΔJunD expression using viral vectors can reduce or, with a large increase, even block many of the neural and behavioral alterations seen in chronic drug use. ΔFosB also plays an important role in regulating responses to natural rewards, such as palatable food and sex. Consequently, ΔFosB is a key mechanism involved in addictions to natural rewards as well. ΔFosB inhibitors may be an effective treatment for addiction and addictive disorders. ΔFosB levels have been found to increase upon the use of cocaine. Each subsequent dose of cocaine continues to increase ΔFosB levels with no ceiling of tolerance. This change can be identified rather quickly, and may be sustained for weeks after the last dose of the drug. Transgenic mice exhibiting inducible expression of ΔFosB primarily in the nucleus accumbens and dorsal striatum exhibit sensitized behavioural responses to cocaine.
They self-administer cocaine at lower doses than control mice, but have a greater likelihood of relapse when the drug is withheld. ΔFosB increases the expression of the AMPA receptor subunit GluR2 and decreases expression of dynorphin. Viral overexpression of ΔFosB in the output neurons of the nigrostriatal dopamine pathway induces levodopa-induced dyskinesias in animal models of Parkinson's disease.
In the fields of molecular biology and genetics, c-Fos is a proto-oncogene that is the human homolog of the retroviral oncogene v-fos. It was first discovered in rat fibroblasts as the gene of the FBJ MSV. It is part of a bigger Fos family of transcription factors which includes c-Fos, FosB, Fra-1 and Fra-2. It has been mapped to chromosome region 14q21→q31. It plays an important role in cellular functions and has been found to be overexpressed in a variety of cancers. c-Fos is a 380 amino acid protein with a leucine zipper region for dimerisation and DNA-binding. Unlike Jun proteins, it cannot form homodimers, only heterodimers with c-Jun. In vitro studies have shown that Jun–Fos heterodimers are more stable and have stronger DNA-binding activity than Jun–Jun homodimers. A variety of stimuli, including serum, growth factors and tumor promoters, induce c-fos expression. The c-fos mRNA and protein are generally among the first to be expressed, and c-fos is hence referred to as an immediate early gene. It is rapidly and transiently induced, within 15 minutes of stimulation. It can cause gene repression as well as gene activation, although different domains are believed to be involved in the two processes.
It can induce a loss of polarity and epithelial-mesenchymal transition, leading to invasive growth. The AP-1 complex has been implicated in transformation and progression of cancer. In osteosarcoma and endometrial carcinoma, c-Fos overexpression was associated with high-grade lesions and poor prognosis. Also, in a comparison between precancerous lesions of the cervix uteri and invasive cervical cancer, c-Fos expression was lower in precancerous lesions. c-Fos has also been identified as an independent predictor of decreased survival in breast cancer. Several studies have raised the idea that c-Fos may also have tumor-suppressor activity. Supporting this is the observation that in ovarian carcinomas, loss of c-Fos expression correlates with disease progression. This double action could be enabled by differential protein composition of cells and their environment, for example, dimerisation partners and co-activators. It is possible that the tumor-suppressing activity is due to a proapoptotic function. Fas ligand and the tumour necrosis factor-related apoptosis-inducing ligand (TRAIL) might reflect an additional apoptotic mechanism induced by c-Fos, as observed in a human T-cell leukaemia cell line.
Another possible mechanism of c-Fos involvement in tumour suppression could be the direct regulation of BRCA1. Methamphetamine and other psychoactive drugs have been shown to increase c-fos production in the mesocortical pathway as well as in the mesolimbic reward pathway. Accumbal c-Fos repression by ΔFosB's AP-1 complex acts as a switch for the long-term induction of ΔFosB. An increase in c-Fos production in androgen receptor-containing neurons has been observed in rats after mating.
UniProt is a freely accessible database of protein sequence and functional information, with many entries derived from genome sequencing projects. It contains a large amount of information about the biological function of proteins derived from the research literature. The UniProt consortium comprises the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB), and the Protein Information Resource (PIR). EBI, located at the Wellcome Trust Genome Campus in Hinxton, UK, hosts a large resource of bioinformatics databases and services. SIB, located in Geneva, maintains the ExPASy servers that are a resource for proteomics tools. In 2002, EBI, SIB, and PIR joined forces as the UniProt consortium. Each consortium member is heavily involved in protein database maintenance and annotation. Until recently, EBI and SIB together produced the Swiss-Prot and TrEMBL databases. These databases coexisted with differing protein sequence coverage and annotation priorities. Swiss-Prot aimed to provide reliable protein sequences associated with a high level of annotation.
Recognizing that sequence data were being generated at a pace exceeding Swiss-Prot's ability to keep up, TrEMBL was created to provide automated annotations for those proteins not in Swiss-Prot. Meanwhile, PIR maintained the PIR-PSD and related databases, including iProClass, a database of protein sequences and curated families. The consortium members pooled their resources and expertise, and launched UniProt in December 2003. UniProt provides four core databases, including UniProtKB, UniParc and UniRef. The UniProt Knowledgebase (UniProtKB) is a protein database partially curated by experts, consisting of two sections: UniProtKB/Swiss-Prot and UniProtKB/TrEMBL. As of 19 March 2014, release 2014_03 of UniProtKB/Swiss-Prot contains 542,782 sequence entries. UniProtKB/Swiss-Prot is a manually annotated, non-redundant protein sequence database. It combines information extracted from literature and biocurator-evaluated computational analysis. The aim of UniProtKB/Swiss-Prot is to provide all known relevant information about a particular protein. Annotation is regularly reviewed to keep up with current scientific findings. The manual annotation of an entry involves detailed analysis of the protein sequence and of the scientific literature.
Sequences from the same gene and the same species are merged into the same database entry. Differences between sequences are identified, and their cause documented. A range of sequence analysis tools is used in the annotation of UniProtKB/Swiss-Prot entries. Computer predictions are manually evaluated, and relevant results selected for inclusion in the entry. These predictions include post-translational modifications, transmembrane domains and topology, signal peptides, domain identification, and protein family classification. Relevant publications are identified by searching databases such as PubMed. The full text of each paper is read, and information is extracted and added to the entry.
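The split of UniProtKB into a manually annotated section and an automatically annotated one is visible in the FASTA header of each entry, where the prefix sp marks UniProtKB/Swiss-Prot and tr marks UniProtKB/TrEMBL. The sketch below parses such a header. It assumes the usual layout of database prefix, accession, entry name, description and key=value fields such as OS and GN; the function name is hypothetical, and the accession, entry name and gene in the example are made-up placeholders rather than a real record.

```python
import re
from typing import Dict

_HEADER_RE = re.compile(r"^>(sp|tr)\|([^|]+)\|(\S+)\s+(.*)$")
_SECTIONS = {"sp": "UniProtKB/Swiss-Prot", "tr": "UniProtKB/TrEMBL"}


def parse_uniprot_header(line: str) -> Dict[str, str]:
    """Parse one UniProtKB-style FASTA header line into a flat dictionary."""
    match = _HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized header: {line!r}")
    db, accession, entry_name, rest = match.groups()
    # Two-letter KEY=value fields (OS, OX, GN, PE, SV) follow the description
    fields = dict(re.findall(r"\b([A-Z]{2})=(.*?)(?=\s[A-Z]{2}=|$)", rest))
    description = re.split(r"\s[A-Z]{2}=", rest, maxsplit=1)[0]
    return {
        "section": _SECTIONS[db],
        "accession": accession,
        "entry_name": entry_name,
        "description": description,
        **fields,
    }


# Placeholder header for illustration only (not a real UniProt entry)
header = ">sp|P00000|DEMO_HUMAN Example protein OS=Homo sapiens OX=9606 GN=DEMO PE=1 SV=1"
print(parse_uniprot_header(header))
```

Checking the `section` field is a quick way to tell whether a sequence in a mixed download came from the reviewed or the unreviewed part of the knowledgebase.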
Ensembl genome database project
Ensembl is one of several well-known genome browsers for the retrieval of genomic information. Similar databases and browsers are found at NCBI and the University of California, Santa Cruz (UCSC). The human genome consists of three billion base pairs, which code for approximately 20,000–25,000 genes. However, the genome alone is of little use unless the locations of individual genes can be identified. One option is manual annotation, whereby a team of scientists tries to locate genes using experimental data from scientific journals. However, this is a slow, painstaking task. The alternative, known as automated annotation, is to use the power of computers to do the complex pattern-matching of protein to DNA. In the Ensembl project, sequence data are fed into the gene annotation system, which creates a set of predicted gene locations and saves them in a MySQL database for subsequent analysis. Ensembl makes these data freely accessible to the world research community. All the data and code produced by the Ensembl project are available to download. In addition, the Ensembl website provides computer-generated visual displays of much of the data.
Over time the project has expanded to include additional species as well as a wider range of genomic data, including genetic variations. Central to the Ensembl concept is the ability to automatically generate graphical views of the alignment of genes. These are shown as data tracks, and individual tracks can be turned on and off, allowing users to customise the display to suit their research interests. The interface also enables the user to zoom in to a region or move along the genome in either direction. The graphics are complemented by tabular displays, and in many cases data can be exported directly from the page in a variety of standard file formats such as FASTA. Externally produced data can also be added to the display, either via a DAS server on the internet or by uploading a file in one of the supported formats, such as BAM or BED. Graphics are generated using a suite of custom Perl modules based on GD. In addition to its website, Ensembl provides a Perl API that models biological objects such as genes and proteins, allowing simple scripts to be written to retrieve data of interest.
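Of the upload formats mentioned above, BED is a plain-text format that is easy to inspect before uploading. The following is a minimal sketch of a reader for the first four BED columns (chromosome, start, end, optional name). It relies on the BED convention that start is 0-based and end is exclusive; the class and function names are illustrative, the example coordinates are made up, and the remaining optional BED columns are ignored.

```python
from typing import List, NamedTuple


class BedInterval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str

    @property
    def length(self) -> int:
        return self.end - self.start


def parse_bed(text: str) -> List[BedInterval]:
    """Read BED records from text, skipping comments and header lines."""
    intervals = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        name = fields[3] if len(fields) > 3 else ""
        intervals.append(BedInterval(fields[0], int(fields[1]), int(fields[2]), name))
    return intervals


# Made-up example track with two features
example = "track name=demo\nchr19\t100\t250\tfeatureA\nchr19\t300\t310\n"
for interval in parse_bed(example):
    print(interval.chrom, interval.start, interval.end, interval.length)
```

Because the end coordinate is exclusive, the length of a feature is simply end minus start, with no off-by-one correction.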
The same API is used internally by the web interface to display the data. It is divided into sections such as the core API, the compara API, the variation API, and the functional genomics API. The Ensembl website provides information on how to install and use the API. This software can be used to access the public MySQL database. Users can even choose to retrieve data from the MySQL server with direct SQL queries, but this requires extensive knowledge of the current database schema. Large datasets can be retrieved using the BioMart data-mining tool, which provides a web interface for downloading datasets using complex queries. Last, there is an FTP server which can be used to download entire MySQL databases as well as some selected data sets in other formats. The annotated genomes include most fully sequenced vertebrates and selected model organisms. All of them are eukaryotes; there are no prokaryotes.