1.
Image histogram
–
An image histogram is a type of histogram that acts as a graphical representation of the tonal distribution in a digital image. It plots the number of pixels for each tonal value. By looking at the histogram for a specific image, a viewer can judge the entire tonal distribution at a glance. Image histograms are present on modern digital cameras. Photographers can use them as an aid to show the distribution of tones captured. This is less useful when using a raw image format, as the dynamic range of the displayed image may only be an approximation to that in the raw file. The horizontal axis of the graph represents the tonal variations, while the vertical axis represents the number of pixels in that particular tone. The left side of the horizontal axis represents the black and dark areas, the middle represents medium grey, and the right side represents the light and white areas. The vertical axis represents the size of the area that is captured in each one of these zones. Thus, the histogram for a dark image will have the majority of its data points on the left side. Conversely, the histogram for a bright image with few dark areas or shadows will have most of its data points on the right side. Image editors typically have provisions to create a histogram of the image being edited. The histogram plots the number of pixels in the image with a particular brightness value. Algorithms in the editor allow the user to visually adjust the brightness value of each pixel. Improvements in picture brightness and contrast can thus be obtained. In the field of computer vision, image histograms can be useful tools for thresholding. Because the information contained in the graph is a representation of pixel distribution as a function of tonal variation, a threshold value can be chosen from it. This threshold value can then be used for edge detection, image segmentation, and co-occurrence matrices.
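As a rough illustration of the pixel-count histogram and of histogram-based thresholding, the following sketch counts pixels per tonal value and then picks a threshold. It assumes an 8-bit greyscale image stored as a list of rows. The function names and the tiny test image are made up for the example, and Otsu's method is used only as one possible way to choose a threshold; the text above does not prescribe a particular method.

```python
def image_histogram(pixels, levels=256):
    """Count how many pixels take each tonal value 0..levels-1."""
    counts = [0] * levels
    for row in pixels:
        for value in row:
            counts[value] += 1
    return counts


def otsu_threshold(counts):
    """Return the level t that maximises the between-class variance
    when pixels <= t form one class and pixels > t the other."""
    total = sum(counts)
    sum_all = sum(level * c for level, c in enumerate(counts))
    best_t, best_score = 0, -1.0
    weight_low, sum_low = 0, 0
    for t, c in enumerate(counts):
        weight_low += c
        sum_low += t * c
        weight_high = total - weight_low
        if weight_low == 0 or weight_high == 0:
            continue  # one class is empty, no valid split here
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        score = weight_low * weight_high * (mean_low - mean_high) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t


# Hypothetical 2x3 image with three dark and three bright pixels.
tiny = [[10, 10, 200], [10, 200, 200]]
hist = image_histogram(tiny)
t = otsu_threshold(hist)
```

For a dark image most of the counts in `hist` sit at low indices (the left side of the graph); for a bright image they sit at high indices.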
2.
Color histogram
–
In image processing and photography, a color histogram is a representation of the distribution of colors in an image. For digital images, a color histogram represents the number of pixels that have colors in each of a fixed list of color ranges that span the image's color space. The color histogram can be built for any kind of color space. For monochromatic images, the term intensity histogram may be used instead. For multi-spectral images, each pixel is represented by a number of measurements, and each measurement has its own range of the light spectrum. If the set of possible color values is sufficiently small, each of those colors may be placed on a range by itself. Most often, the space is divided into a number of ranges, often arranged as a regular grid. The color histogram may also be represented and displayed as a smooth function defined over the color space that approximates the pixel counts. Like other kinds of histograms, the color histogram is a statistic that can be viewed as an approximation of an underlying continuous distribution of color values. Color histograms are flexible constructs that can be built from images in various color spaces, RGB among them. A histogram of an image is produced first by discretization of the colors in the image into a number of bins, and then by counting the number of image pixels in each bin. For example, a two-dimensional histogram of red-blue chromaticity divided into four bins can be laid out as a table of pixel counts. A histogram can be N-dimensional. The histogram provides a summarization of the distribution of data in an image. The color histogram of an image is relatively invariant with translation and rotation about the viewing axis. Importantly, translation of an RGB image into the illumination-invariant rg-chromaticity space allows the histogram to operate well in varying light levels. A histogram is a representation of the number of pixels in an image.
Put more simply, a histogram is a bar graph whose X-axis represents the tonal scale and whose Y-axis represents the number of pixels in the image in a certain area of the tonal scale. For example, the graph of a luminance histogram shows the number of pixels for each luminance level; the more pixels there are at a given level, the higher the peak at that level. A color histogram of an image represents the distribution of the composition of colors in the image. It shows the different types of colors that appear and the number of pixels of each type. Note that a color histogram focuses only on the proportion of the number of different types of colors. The values of a color histogram are statistics.
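The discretize-then-count procedure can be sketched as follows. The sketch assumes pixels are given as a flat list of 8-bit (R, G, B) tuples and that each channel is cut into equal-width ranges on a regular grid; the function name, the bin count of four per channel, and the sample pixels are illustrative choices, not taken from the text.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4, levels=256):
    """Count pixels per color bin on a regular grid over RGB space.

    Each bin is identified by a tuple of per-channel bin indices.
    """
    width = levels / bins_per_channel

    def bin_index(value):
        # Clamp so the top value always falls in the last bin.
        return min(int(value // width), bins_per_channel - 1)

    hist = Counter()
    for r, g, b in pixels:
        hist[(bin_index(r), bin_index(g), bin_index(b))] += 1
    return hist


# Hypothetical image: two reddish pixels and one blue pixel.
pixels = [(255, 0, 0), (250, 10, 5), (0, 0, 255)]
hist = color_histogram(pixels)

# Moving pixels around does not change the counts, which is why the
# histogram ignores where in the image each color occurs.
shuffled = color_histogram(list(reversed(pixels)))
```

Because only counts are stored, `hist` and `shuffled` are equal even though the pixel order differs.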
3.
Karl Pearson
–
Karl Pearson FRS was an influential English mathematician and biostatistician. He has been credited with establishing the discipline of mathematical statistics. Pearson was also a protégé and biographer of Sir Francis Galton. Pearson was born in Islington, London, to William and Fanny. He later travelled to Germany to study physics at the University of Heidelberg under G. H. Quincke and metaphysics under Kuno Fischer. He next visited the University of Berlin, where he attended the lectures of the famous physiologist Emil du Bois-Reymond on Darwinism. Pearson also studied Roman Law, taught by Bruns and Mommsen, medieval and 16th century German literature, and socialism. He became an historian and Germanist and spent much of the 1880s in Berlin, Heidelberg, Vienna, and Saig bei Lenzkirch. He wrote on Passion plays, religion, Goethe, and Werther, as well as sex-related themes. Pearson was offered a Germanics post at King's College, Cambridge. Comparing Cambridge students to those he knew from Germany, Karl found German students inathletic. He wrote to his mother, "I used to think athletics and sport was overestimated at Cambridge, but now I think it cannot be too highly valued." Have you ever attempted to conceive all there is in the world worth knowing, that not one subject in the universe is unworthy of study? Mankind seems on the verge of a new and glorious discovery. What Newton did to simplify the planetary motions must now be done to unite in one whole the various isolated theories of mathematical physics. Pearson then returned to London to study law, emulating his father. His next career move was to the Inner Temple, where he read law until 1881. After this, he returned to mathematics, deputising for the mathematics professor at King's College, London in 1881 and for the professor at University College, London in 1883.
In 1884, he was appointed to the Goldsmid Chair of Applied Mathematics and Mechanics at University College, London. Pearson became the editor of Common Sense of the Exact Sciences when William Kingdon Clifford died. His collaboration with Weldon, in biometry and evolutionary theory, was a fruitful one. Weldon introduced Pearson to Charles Darwin's cousin Francis Galton, who was interested in aspects of evolution such as heredity and eugenics. Pearson became Galton's protégé (his statistical heir, as some have put it), at times to the verge of hero worship. In 1890 Pearson married Maria Sharpe. Maria died in 1928, and in 1929 Karl married Margaret Victoria Child. He and his family lived at 7 Well Road in Hampstead, now marked with a blue plaque. He predicted that Galton, rather than Charles Darwin, would be remembered as the most prodigious grandson of Erasmus Darwin. When Galton died, he left the residue of his estate to the University of London for a Chair in Eugenics. Pearson was the first holder of this chair, the Galton Chair of Eugenics. He formed the Department of Applied Statistics, into which he incorporated the Biometric and Galton laboratories. He remained with the department until his retirement in 1933. Pearson was a zealous atheist and a freethinker. This book covered several themes that were later to become part of the theories of Einstein. Pearson asserted that the laws of nature are relative to the perceptive ability of the observer.
4.
Probability distribution
–
A probability distribution assigns a probability to each possible outcome of a random phenomenon. For instance, if the random variable X is used to denote the outcome of a fair coin toss, then the probability distribution of X would take the value 0.5 for X = heads, and 0.5 for X = tails. In more technical terms, the probability distribution is a description of a random phenomenon in terms of the probabilities of events. Examples of random phenomena can include the results of an experiment or survey. A probability distribution is defined in terms of an underlying sample space, which is the set of all possible outcomes of the random phenomenon being observed. The sample space may be the set of real numbers or a higher-dimensional vector space, or it may be a list of non-numerical values, for example the set {heads, tails} in the coin toss above. Probability distributions are generally divided into two classes, discrete and continuous. A discrete probability distribution can be encoded by a discrete list of the probabilities of the outcomes. On the other hand, a continuous probability distribution is typically described by a probability density function. The normal distribution is a commonly encountered continuous probability distribution. More complex experiments, such as those involving stochastic processes defined in continuous time, may demand the use of more general probability measures. A probability distribution whose sample space is the set of real numbers is called univariate, while a distribution whose sample space is a vector space is called multivariate. Important and commonly encountered univariate probability distributions include the hypergeometric distribution. The multivariate normal distribution is a commonly encountered multivariate distribution. To define probability distributions for the simplest cases, one needs to distinguish between discrete and continuous random variables. For a continuous random variable, the probability of any single exact value is zero: for example, the probability that an object weighs exactly 500 g is zero. Continuous probability distributions can be described in several ways. The cumulative distribution function is the antiderivative of the probability density function, provided that the latter function exists.
As probability theory is used in diverse applications, terminology is not uniform. The following terms are used for probability distribution functions. Probability distribution: a table that displays the probabilities of outcomes in a sample. It could be called a frequency distribution table in which all occurrences of outcomes sum to 1. Distribution function: a form of frequency distribution table. Probability distribution function: a form of probability distribution table.
5.
Bar graph
–
A bar chart or bar graph is a chart or graph that presents grouped data with rectangular bars with lengths proportional to the values that they represent. The bars can be plotted vertically or horizontally. A vertical bar chart is sometimes called a column bar chart. A bar graph is a chart that uses either horizontal or vertical bars to show comparisons among categories. One axis of the chart shows the specific categories being compared, and the other axis represents a discrete value. Some bar graphs present bars clustered in groups of more than one. Diagrams of the velocity of a constantly accelerating object against time published in The Latitude of Forms about 300 years before can be interpreted as proto bar charts. Bar charts have a discrete range. Bar charts are usually scaled so that all the data can fit on the chart. Bars on the chart may be arranged in any order. Bar charts arranged from highest to lowest incidence are called Pareto charts. Normally, bars showing frequency will be arranged in chronological sequence. Bar graphs provide a visual presentation of categorical data. Categorical data is a grouping of data into discrete groups, such as months of the year, age groups, or shoe sizes. In a column bar chart, the categories appear along the horizontal axis. Bar graphs can also be used for more complex comparisons of data with grouped bar charts. In a grouped bar chart, for each categorical group there are two or more bars. These bars are color-coded to represent a particular grouping. Alternatively, a stacked bar chart could be used. The stacked bar chart stacks bars that represent different groups on top of each other. The height of the resulting bar shows the combined result of the groups. However, stacked bar charts are not suited to datasets where some groups have negative values. In such cases, grouped bar charts are preferable. Grouped bar graphs present the information in the same order in each grouping.
Stacked bar graphs present the information in the same sequence on each bar. See also: Histogram, which has a similar appearance but is used for continuous data.
6.
Frequency (statistics)
–
In statistics, the frequency (or absolute frequency) of an event $i$ is the number $n_i$ of times the event occurred in an experiment or study. These frequencies are often represented in histograms. The cumulative frequency is the total of the frequencies of all events at or below a certain point in an ordered list of events. The relative frequency of an event is the absolute frequency normalized by the total number of events, $f_i = n_i / N$ with $N = \sum_j n_j$. The values of $f_i$ for all events $i$ can be plotted to produce a frequency distribution. In the case when $n_i = 0$ for certain $i$, pseudocounts can be added. In a histogram, the height of a rectangle is equal to the frequency density of the interval. The total area of the histogram is equal to the number of data. A histogram may also be normalized to display relative frequencies. It then shows the proportion of cases that fall into each of several categories. The categories are usually specified as consecutive, non-overlapping intervals of a variable. The categories must be adjacent, and often are chosen to be of the same size. The rectangles of a histogram are drawn so that they touch each other to indicate that the variable is continuous. A bar chart or bar graph is a chart with rectangular bars with lengths proportional to the values that they represent. The bars can be plotted vertically or horizontally. A vertical bar chart is called a column bar chart. A frequency distribution table is an arrangement of the values that one or more variables take in a sample. The frequentist interpretation of probability is often contrasted with Bayesian probability. In fact, the term frequentist was first used by M. G. Kendall in 1949, to contrast with Bayesians, whom he called non-frequentists. He observed: "3. ... we may distinguish two main attitudes." "It might be thought that the differences between the frequentists and the non-frequentists are largely due to the differences of the domains which they purport to cover." "I assert that this is not so."
See also: Aperiodic frequency, Cumulative frequency analysis, Law of large numbers, Probability density function, Statistical regularity, Word frequency.
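The definitions of absolute, relative and cumulative frequency, and the effect of adding pseudocounts, can be sketched in a few lines. The function name, the category list and the observations below are invented for the example; the pseudocount is simply added to every category's count before normalizing, which is one common convention.

```python
from collections import Counter
from itertools import accumulate


def frequency_table(observations, categories, pseudocount=0):
    """Return (absolute, relative, cumulative) frequencies.

    `categories` fixes the order used for the cumulative totals.
    `pseudocount` is added to every category, so categories that
    were never observed still get a non-zero relative frequency.
    """
    counts = Counter(observations)
    absolute = [counts[c] + pseudocount for c in categories]
    total = sum(absolute)
    relative = [n / total for n in absolute]
    cumulative = list(accumulate(absolute))
    return absolute, relative, cumulative


# Hypothetical experiment with outcome 4 never observed.
obs = [1, 1, 2, 3, 3, 3]
absolute, relative, cumulative = frequency_table(obs, [1, 2, 3, 4])
smoothed = frequency_table(obs, [1, 2, 3, 4], pseudocount=1)
```

Without pseudocounts the relative frequency of outcome 4 is zero; with a pseudocount of 1 it becomes 1/10.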
7.
Density estimation
–
In probability and statistics, density estimation is the construction of an estimate, based on observed data, of an unobservable underlying probability density function. The unobservable density function is thought of as the density according to which a large population is distributed. A variety of approaches to density estimation are used, including Parzen windows. The most basic form of density estimation is a rescaled histogram. As an example, we will consider records of the incidence of diabetes. The data were collected by the US National Institute of Diabetes and Digestive and Kidney Diseases. We used the 532 complete records. The conditional density estimates are then used to construct the probability of diabetes conditional on glu. The glu data were obtained from the MASS package of the R programming language. Within R, ?Pima.tr and ?Pima.te give a fuller account of the data. The mean of glu in the diabetes cases is 143.1. The mean of glu in the non-diabetes cases is 110.0. From this we see that, in this data set, diabetes cases are associated with greater levels of glu. This will be made clearer by plots of the estimated density functions. The first figure shows density estimates of $p(\text{glu} \mid \text{diabetes}=1)$, $p(\text{glu} \mid \text{diabetes}=0)$, and $p(\text{glu})$. The density estimates are kernel density estimates using a Gaussian kernel. That is, a Gaussian density function is placed at each data point. From the density of glu conditional on diabetes, we can obtain the probability of diabetes conditional on glu via Bayes' rule. For brevity, diabetes is abbreviated db. in this formula: $$p(\text{db.}=1 \mid \text{glu}) = \frac{p(\text{glu} \mid \text{db.}=1)\,p(\text{db.}=1)}{p(\text{glu} \mid \text{db.}=1)\,p(\text{db.}=1) + p(\text{glu} \mid \text{db.}=0)\,p(\text{db.}=0)}$$ The second figure shows the posterior probability $p(\text{diabetes}=1 \mid \text{glu})$. From these data, it appears that an increased level of glu is associated with diabetes. The following R commands will create the figures shown above. These commands can be entered at the command prompt by using cut and paste. Note that the conditional density estimator uses bandwidths that are optimal for unconditional densities. The following R commands use the function to deliver optimal smoothing
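The step from class-conditional density estimates to a posterior probability via Bayes' rule can be sketched as follows. This is a Python illustration, not the R commands the text refers to. The glu values are invented (they are not the Pima records), the bandwidth of 10 is an arbitrary assumption, and the prior is taken to be the share of cases in the made-up sample.

```python
import math


def gaussian_kde(sample, h):
    """Return a density estimate that places a Gaussian of standard
    deviation h at each data point and averages them."""
    norm = 1.0 / (len(sample) * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)

    return density


def posterior(x, dens_pos, dens_neg, prior_pos):
    """Bayes' rule: p(class=1 | x) from the two conditional densities."""
    num = dens_pos(x) * prior_pos
    den = num + dens_neg(x) * (1.0 - prior_pos)
    return num / den


# Invented glu readings, for illustration only.
glu_cases = [135.0, 140.0, 145.0, 150.0, 160.0]
glu_controls = [95.0, 100.0, 105.0, 110.0, 115.0, 120.0]

p_glu_given_db1 = gaussian_kde(glu_cases, h=10.0)
p_glu_given_db0 = gaussian_kde(glu_controls, h=10.0)
prior_db1 = len(glu_cases) / (len(glu_cases) + len(glu_controls))

p_high = posterior(150.0, p_glu_given_db1, p_glu_given_db0, prior_db1)
p_low = posterior(100.0, p_glu_given_db1, p_glu_given_db0, prior_db1)
```

With these made-up numbers the posterior is higher at glu = 150 than at glu = 100, mirroring the qualitative conclusion in the text.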
8.
Probability density function
–
A probability density function (PDF) of a continuous random variable is a function whose value at a given point indicates how likely the variable is to take a value near that point. In a more precise sense, the PDF is used to specify the probability of the random variable falling within a particular range of values, as opposed to taking on any one value. The probability density function is nonnegative everywhere, and its integral over the entire space is equal to one. The terms probability distribution function and probability function have also sometimes been used to denote the probability density function. However, this use is not standard among probabilists and statisticians. Further confusion of terminology exists because density function has also been used for what is here called the probability mass function (PMF). In general though, the PMF is used in the context of discrete random variables, while the PDF is used in the context of continuous random variables. Suppose a species of bacteria typically lives 4 to 6 hours. What is the probability that a bacterium lives exactly 5 hours? The answer is 0. A lot of bacteria live for approximately 5 hours, but there is no chance that any given bacterium dies at exactly 5.0000000000... hours. Instead we might ask: what is the probability that the bacterium dies between 5 hours and 5.01 hours? Let's say the answer is 0.02. Next: what is the probability that the bacterium dies between 5 hours and 5.001 hours? The answer is probably around 0.002, since this is 1/10th of the previous interval. The probability that the bacterium dies between 5 hours and 5.0001 hours is probably about 0.0002, and so on. In these three examples, the ratio (probability of dying during an interval) / (duration of the interval) is approximately constant, and equal to 2 per hour. For example, there is 0.02 probability of dying in the 0.01-hour interval between 5 and 5.01 hours, and (0.02 probability / 0.01 hours) = 2 hour⁻¹. This quantity 2 hour⁻¹ is called the probability density for dying at around 5 hours. Therefore, in response to the question "What is the probability that the bacterium dies at 5 hours?", a literally correct but unhelpful answer is 0, but a better answer can be written as (2 hour⁻¹) dt. This is the probability that the bacterium dies within a small window of time of duration dt around 5 hours.
For example, the probability that it lives longer than 5 hours but shorter than 5 hours plus a small duration dt is approximately (2 hour⁻¹) dt. There is a probability density function f with f(5 hours) = 2 hour⁻¹. The integral of f over any window of time is the probability that the bacterium dies in that window. A probability density function is most commonly associated with absolutely continuous univariate distributions. A random variable $X$ has density $f_X$, where $f_X$ is a non-negative Lebesgue-integrable function, if $\Pr[a \le X \le b] = \int_a^b f_X(x)\,dx$. More generally, a density of $X$ with respect to a reference measure $\mu$ is any function $f$ with the property that $\Pr[X \in A] = \int_A f\,d\mu$ for every measurable set $A$. In the continuous univariate case above, the reference measure is the Lebesgue measure.
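The shrinking-interval argument can be checked numerically. The sketch below assumes, purely for illustration, that the lifetime follows a normal distribution centred on 5 hours whose standard deviation is chosen so that the density at 5 hours equals 2 per hour. The text does not specify the lifetime distribution; the names `MU`, `SIGMA`, `cdf` and `pdf` are local to this example.

```python
import math

# Assumed lifetime model (hypothetical): normal, centred on 5 hours,
# with sigma picked so that pdf(5) = 1 / (sigma * sqrt(2*pi)) = 2 per hour.
MU = 5.0
SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.pi))


def cdf(t):
    """Probability that the lifetime is at most t hours."""
    return 0.5 * (1.0 + math.erf((t - MU) / (SIGMA * math.sqrt(2.0))))


def pdf(t):
    """Probability density (per hour) at lifetime t."""
    z = (t - MU) / SIGMA
    return math.exp(-0.5 * z * z) / (SIGMA * math.sqrt(2.0 * math.pi))


# Probability of dying in [5, 5 + dt], divided by dt, for shrinking dt.
ratios = [(cdf(5.0 + dt) - cdf(5.0)) / dt for dt in (0.01, 0.001, 0.0001)]
```

Each entry of `ratios` is the interval probability divided by the interval length; under this assumed model they all stay close to `pdf(5.0)`, which is the sense in which the density is a probability per unit time.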
10.
Kernel density estimation
–
In statistics, kernel density estimation is a non-parametric way to estimate the probability density function of a random variable. Kernel density estimation is a fundamental data smoothing problem where inferences about the population are made based on a finite data sample. Let $(x_1, x_2, \ldots, x_n)$ be an independent and identically distributed sample drawn from some distribution with an unknown density ƒ. We are interested in estimating the shape of this function ƒ. Its kernel density estimator is $\hat{f}_h(x) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)$, where $K$ is the kernel, a non-negative function that integrates to one, and $h > 0$ is a smoothing parameter called the bandwidth. The choice of bandwidth is discussed in more detail below. A range of kernel functions are commonly used: uniform, triangular, biweight, triweight, Epanechnikov, and normal. The construction of a kernel density estimate finds interpretations in fields outside of density estimation. For example, in thermodynamics, this is equivalent to the amount of heat generated when heat kernels are placed at each data point location $x_i$. Similar methods are used to construct discrete Laplace operators on point clouds for manifold learning. Kernel density estimates are closely related to histograms, but can be endowed with properties such as smoothness or continuity by using a suitable kernel. To see this, we compare the construction of histogram and kernel density estimators, using these 6 data points: x1 = −2.1, x2 = −1.3, x3 = −0.4, x4 = 1.9, x5 = 5.1, x6 = 6.2. For the histogram, first the horizontal axis is divided into sub-intervals or bins which cover the range of the data. In this case, we have 6 bins each of width 2. Whenever a data point falls inside an interval, we place a box of height 1/12 there. If more than one data point falls inside the same bin, we stack the boxes on top of each other. For the kernel density estimate, we place a normal kernel with variance 2.25 on each of the data points $x_i$. The kernels are summed to make the kernel density estimate. The bandwidth of the kernel is a free parameter which exhibits a strong influence on the resulting estimate. To illustrate its effect, we take a random sample from the standard normal distribution.
The grey curve is the true density. In comparison, the red curve is undersmoothed since it contains too many spurious data artifacts arising from using a bandwidth h = 0.05, which is too small. The green curve is oversmoothed since using the bandwidth h = 2 obscures much of the underlying structure. The black curve with a bandwidth of h = 0.337 is considered to be optimally smoothed since its density estimate is close to the true density. The most common optimality criterion used to select this parameter is the expected L2 risk function, also termed the mean integrated squared error, $\operatorname{MISE}(h) = \operatorname{E}\!\left[\int \left(\hat{f}_h(x) - f(x)\right)^2 dx\right]$. Under weak assumptions on ƒ and K, $\operatorname{MISE}(h) = \operatorname{AMISE}(h) + o\!\left(\frac{1}{nh} + h^4\right)$, where $o$ is the little o notation and AMISE is the asymptotic MISE. Substituting any bandwidth h which has the same asymptotic order $n^{-1/5}$ as $h_{\mathrm{AMISE}}$ into the AMISE gives that $\operatorname{AMISE}(h) = O(n^{-4/5})$. It can be shown that, under weak assumptions, there cannot exist a non-parametric estimator that converges at a faster rate than the kernel estimator. Note that the $n^{-4/5}$ rate is slower than the typical $n^{-1}$ convergence rate of parametric methods.
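The histogram and kernel constructions for the six data points can be reproduced with a short sketch. Two details are assumptions made for the example: the bins are taken to start at −4 (so the six bins of width 2 cover −4 to 8), and the kernel density estimate uses the standard form with a normal kernel of standard deviation 1.5, the square root of the variance 2.25 given in the text.

```python
import math

data = [-2.1, -1.3, -0.4, 1.9, 5.1, 6.2]


def histogram_density(points, origin=-4.0, width=2.0, n_bins=6):
    """Stack a box of height 1/(n*width) in the bin of each point."""
    box = 1.0 / (len(points) * width)  # 1/12 for six points, width 2
    heights = [0.0] * n_bins
    for x in points:
        k = int((x - origin) // width)
        if 0 <= k < n_bins:
            heights[k] += box
    return heights


def kde(x, points, h=1.5):
    """Average of normal kernels with standard deviation h centred
    on each data point, evaluated at x."""
    norm = 1.0 / (len(points) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in points)


heights = histogram_density(data)

# Riemann sum of the kernel estimate over a range wide enough
# to contain essentially all of its mass.
step = 0.01
kde_area = step * sum(kde(-15.0 + i * step, data) for i in range(3500))
hist_area = 2.0 * sum(heights)
```

Both `hist_area` and `kde_area` come out at approximately one, so each construction is a density estimate; the kernel version is simply smooth where the histogram is piecewise constant.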
11.
Kernel (statistics)
–
The term kernel has several distinct meanings in statistics. Note that such factors may well be functions of the parameters of the pdf or pmf. These factors form part of the normalization factor of the probability distribution, and are unnecessary in many situations. For example, in pseudo-random number sampling, most sampling algorithms ignore the normalization factor. In addition, in Bayesian analysis of conjugate prior distributions, the normalization factors are generally ignored during the calculations, and only the kernel is considered. At the end, the form of the kernel is examined, and if it matches a known distribution, the normalization factor can be reinstated. For many distributions, the kernel can be written in closed form, but not the normalization constant. An example is the normal distribution. This usage is particularly common in machine learning. In non-parametric statistics, a kernel is a weighting function used in non-parametric estimation techniques. Kernels are used in kernel density estimation to estimate random variables' density functions. Kernels are also used in time-series, in the use of the periodogram to estimate the spectral density, where they are known as window functions. An additional use is in the estimation of a time-varying intensity for a point process, where window functions are convolved with time-series data. Commonly, kernel widths must also be specified when running a non-parametric estimation. A kernel is a non-negative real-valued integrable function $K$ satisfying the following two requirements: $\int_{-\infty}^{+\infty} K(u)\,du = 1$, and $K(-u) = K(u)$ for all values of $u$. The first requirement ensures that the method of kernel density estimation results in a probability density function. The second requirement ensures that the average of the corresponding distribution is equal to that of the sample used. If $K$ is a kernel, then so is the function $K^*$ defined by $K^*(u) = \lambda K(\lambda u)$, where $\lambda > 0$. This can be used to select a scale that is appropriate for the data.
Several types of kernel functions are commonly used: uniform, triangle, Epanechnikov, quartic, tricube, triweight, Gaussian, and quadratic. See also: Kernel density estimation, Kernel smoother, Stochastic kernel, Density estimation, Multivariate kernel density estimation.
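The two requirements on a kernel, and the fact that rescaling preserves them, can be checked numerically. The sketch uses the Epanechnikov kernel in its usual form, K(u) = 3/4 (1 − u²) on [−1, 1] and zero elsewhere, and takes the rescaling to be K*(u) = λK(λu) with λ > 0. The helper names and the midpoint-rule integrator are choices made for this example.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 3/4 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def rescale(kernel, lam):
    """Return the rescaled kernel u -> lam * kernel(lam * u), lam > 0."""
    return lambda u: lam * kernel(lam * u)


def integrate(f, a, b, steps=100000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))


narrow = rescale(epanechnikov, 2.0)  # support shrinks to [-0.5, 0.5]

area = integrate(epanechnikov, -2.0, 2.0)
area_narrow = integrate(narrow, -2.0, 2.0)
symmetric = all(
    epanechnikov(u) == epanechnikov(-u) and narrow(u) == narrow(-u)
    for u in (0.0, 0.1, 0.3, 0.7, 1.0, 1.5)
)
```

Both areas are approximately one and both functions are symmetric, so the rescaled function is again a kernel; choosing λ amounts to choosing the kernel width.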
12.
Smooth function
–
In mathematical analysis, the smoothness of a function is a property measured by the number of derivatives it has which are continuous. A smooth function is a function that has derivatives of all orders everywhere in its domain. Differentiability class is a classification of functions according to the properties of their derivatives. Higher order differentiability classes correspond to the existence of more derivatives. Consider an open set on the real line and a function $f$ defined on that set with real values. Let $k$ be a non-negative integer. The function $f$ is said to be of class $C^k$ if the derivatives $f', f'', \ldots, f^{(k)}$ exist and are continuous. The function $f$ is said to be of class $C^\infty$, or smooth, if it has derivatives of all orders. The function $f$ is said to be of class $C^\omega$, or analytic, if $f$ is smooth and equals its Taylor series expansion in a neighborhood of every point of its domain. $C^\omega$ is thus strictly contained in $C^\infty$. Bump functions are examples of functions in $C^\infty$ but not in $C^\omega$. To put it differently, the class $C^0$ consists of all continuous functions. The class $C^1$ consists of all differentiable functions whose derivative is continuous. Thus, a $C^1$ function is exactly a function whose derivative exists and is of class $C^0$. In particular, $C^k$ is contained in $C^{k-1}$ for every $k$. $C^\infty$, the class of infinitely differentiable functions, is the intersection of the sets $C^k$ as $k$ varies over the non-negative integers. The function $f(x) = x$ if $x \ge 0$, $f(x) = 0$ if $x < 0$ is continuous, but not differentiable at $x = 0$, so it is of class $C^0$ but not of class $C^1$. The function $f(x) = x^2 \sin(1/x)$ for $x \ne 0$, with $f(0) = 0$, is differentiable, with derivative $f'(x) = 2x \sin(1/x) - \cos(1/x)$ for $x \ne 0$ and $f'(0) = 0$. Because $\cos(1/x)$ oscillates as $x \to 0$, $f'$ is not continuous at zero. Therefore, this function is differentiable but not of class $C^1$. The functions $f(x) = |x|^{k+1}$, where $k$ is even, are continuous and $k$ times differentiable at all $x$. But at $x = 0$ they are not $(k+1)$ times differentiable, so they are of class $C^k$ but not of class $C^j$ for $j > k$. The exponential function is analytic, so it is of class $C^\omega$. The trigonometric functions are also analytic wherever they are defined. The function $f(x) = e^{-1/(1-x^2)}$ for $|x| < 1$, with $f(x) = 0$ otherwise, is an example of a smooth function with compact support. Let $n$ and $m$ be some positive integers. If $f$ is a function from an open subset of $\mathbb{R}^n$ with values in $\mathbb{R}^m$, then $f$ has component functions $f_1, \ldots, f_m$.
Each of these may or may not have partial derivatives. The function $f$ is of class $C^k$ if all partial derivatives of its components up to order $k$ exist and are continuous. The classes $C^\infty$ and $C^\omega$ are defined as before. These criteria of differentiability can be applied to the transition functions of a differential structure. The resulting space is called a $C^k$ manifold. If one wishes to start with a coordinate-independent definition of the class $C^k$, one may start by considering maps between Banach spaces. A map from one Banach space to another is differentiable at a point if there is an affine map which approximates it at that point.
13.
Continuous data
–
In mathematics, the terms continuity, continuous, and continuum are used in a variety of related ways. Absolute continuity of a measure with respect to another measure. Continuous probability distribution: sometimes this term is used to mean a probability distribution whose cumulative distribution function (c.d.f.) is continuous. Sometimes it has a less inclusive meaning: a distribution whose c.d.f. is absolutely continuous with respect to Lebesgue measure. This less inclusive sense is equivalent to the condition that every set whose Lebesgue measure is 0 has probability 0. Continuum: the real line or the corresponding cardinal number. Linear continuum: any ordered set that shares certain properties of the real line. Continuum: a nonempty compact connected metric space. Continuum hypothesis: a conjecture of Georg Cantor that there is no cardinal number between that of countably infinite sets and the cardinality of the set of all real numbers. The latter cardinality is equal to the cardinality of the set of all subsets of a countably infinite set. Cardinality of the continuum: a cardinal number that represents the size of the set of real numbers.
14.
Bar chart
–
A bar chart or bar graph is a chart or graph that presents grouped data with rectangular bars with lengths proportional to the values that they represent. The bars can be plotted vertically or horizontally, a vertical bar chart is sometimes called a Line graph. A bar graph is a chart that uses either horizontal or vertical bars to show comparisons among categories, one axis of the chart shows the specific categories being compared, and the other axis represents a discrete value. Some bar graphs present bars clustered in groups of more than one, diagrams of the velocity of a constantly accelerating object against time published in The Latitude of Forms about 300 years before can be interpreted as proto bar charts. Bar charts have a discrete range, bar charts are usually scaled so that all the data can fit on the chart. Bars on the chart may be arranged in any order, bar charts arranged from highest to lowest incidence are called Pareto charts. Normally, bars showing frequency will be arranged in chronological sequence, bar graphs/charts provide a visual presentation of categorical data. Categorical data is a grouping of data into groups, such as months of the year, age group, shoe sizes. In a column bar chart, the appear along the horizontal axis. Bar graphs can also be used for more complex comparisons of data with grouped bar charts, in a grouped bar chart, for each categorical group there are two or more bars. These bars are color-coded to represent a particular grouping, alternatively, a stacked bar chart could be used. The stacked bar chart stacks bars that represent different groups on top of each other, the height of the resulting bar shows the combined result of the groups. However, stacked bar charts are not suited to datasets where some groups have negative values, in such cases, grouped bar chart are preferable. Grouped bar graphs present the information in the same order in each grouping. 
Stacked bar graphs present the information in the same sequence on each bar.
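The difference between grouped and stacked bars can be sketched in plain Python without a plotting library. The category names, group names, and values below are invented for illustration, and the helpers `stacked_heights` and `suits_stacking` are hypothetical names, not part of any charting package.

```python
from typing import Dict, List

# Hypothetical data: one list of values per group, one value per category.
categories: List[str] = ["Jan", "Feb", "Mar"]
groups: Dict[str, List[float]] = {
    "north": [3.0, 5.0, 2.0],
    "south": [4.0, 1.0, 6.0],
}


def stacked_heights(groups: Dict[str, List[float]]) -> List[float]:
    """Height of each stacked bar: the sum of all group values per category."""
    return [sum(values) for values in zip(*groups.values())]


def suits_stacking(groups: Dict[str, List[float]]) -> bool:
    """Stacking is only sensible when no group contains negative values."""
    return all(v >= 0 for values in groups.values() for v in values)


heights = stacked_heights(groups)
if suits_stacking(groups):
    # Crude text rendering: one row per category, bar length = combined value.
    for name, height in zip(categories, heights):
        print(f"{name}: {'#' * int(height)} ({height})")
else:
    print("Negative values present; a grouped bar chart is preferable.")
```

A grouped chart would instead draw each group's value as its own bar within a category, which is why it still works when some values are negative.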
15.
Prunus serotina
–
Prunus serotina, commonly called black cherry, wild black cherry, rum cherry, or mountain black cherry, is a deciduous woody plant species belonging to the genus Prunus. The species is widespread and common in North America and South America. Black cherry is closely related to the chokecherry; chokecherry, however, is classified as a shrub or small tree and has smaller, less glossy leaves. Subspecies and varieties: Prunus serotina var. alabamensis Little (southeastern United States); Prunus serotina subsp. capuli McVaugh (central and southern Mexico, Central America, South America as far south as Argentina); Prunus serotina subsp. eximia McVaugh (Texas); Prunus serotina subsp. hirsuta McVaugh (Georgia); Prunus serotina var. rufula McVaugh (southwestern United States, northern and central Mexico); Prunus serotina subsp. serotina (Canada, United States, Mexico, Guatemala); Prunus serotina var. serotina (Canada, United States, Mexico); virens McVaugh. Black cherry is a medium-sized, fast-growing forest tree growing to a height of 50-80 feet. Leaves are 2 to 5 in length and ovate-lanceolate in shape. Fall leaf color is yellow to red. Flowers are small, white and 5-petalled, in racemes 4 to 6 long which contain several dozen flowers. The flowers give rise to edible reddish-black berries. A mature black cherry tree can easily be identified in a forest by its broken, dark grey to black bark. However, for about the first decade or so of its life, the bark is thin, smooth, and banded, resembling that of a birch. It can also quickly be identified by its long, shiny leaves resembling those of a sourwood. Prunus serotina is a pioneer species. In the Midwest, it grows mostly in old fields with other sunlight-loving species, such as black walnut and black locust. Gleason and Cronquist describe P. serotina as "formerly a forest tree, now abundant as a weed-tree of roadsides, waste land, and forest-margins."
It is a moderately long-lived tree, with ages of up to 258 years known, though it is prone to damage, with branches breaking easily. Seed production begins around 10 years of age, but does not become heavy until 30 years. Germination rates are high, and the seeds are widely dispersed by birds, which eat the fruit and then excrete them. Some seeds, however, may remain in the soil bank and not germinate for as long as three years. All Prunus species have seeds that benefit from scarification to germinate. P. serotina is also a host of caterpillars of various Lepidoptera; the eastern tent caterpillar defoliates entire groves some springs. Prunus serotina was widely introduced into Western and Central Europe as a tree in the mid 20th century.
16.
Ancient Greek
–
Ancient Greek includes the forms of Greek used in ancient Greece and the ancient world from around the 9th century BC to the 6th century AD. It is often divided into periods, including the Archaic period and the Classical period. It is antedated in the second millennium BC by Mycenaean Greek. The language of the Hellenistic phase is known as Koine. Koine is regarded as a historical stage of its own, although in its earliest form it closely resembled Attic Greek. Prior to the Koine period, Greek of the classic and earlier periods included several regional dialects. Ancient Greek was the language of Homer and of fifth-century Athenian historians, playwrights, and philosophers. It has contributed many words to English vocabulary and has been a subject of study in educational institutions of the Western world since the Renaissance. This article primarily contains information about the Epic and Classical phases of the language. Ancient Greek was a pluricentric language, divided into many dialects. The main dialect groups include Attic and Ionic, Aeolic, and Arcadocypriot. Some dialects are found in standardized literary forms used in literature, while others are attested only in inscriptions. There are also several historical forms. Homeric Greek is a literary form of Archaic Greek used in the epic poems, the Iliad and Odyssey, and in later poems by other authors. Homeric Greek had significant differences in grammar and pronunciation from Classical Attic. The origins, early form and development of the Hellenic language family are not well understood because of a lack of contemporaneous evidence. Several theories exist about what Hellenic dialect groups may have existed between the divergence of early Greek-like speech from the common Proto-Indo-European language and the Classical period. They have the same general outline, but differ in some of the detail. The invasion would not be Dorian unless the invaders had some relationship to the historical Dorians.
The invasion is known to have displaced population to the later Attic-Ionic regions. The Greeks of this period believed there were three major divisions of all Greek people (Dorians, Aeolians, and Ionians), each with their own defining and distinctive dialects. Often non-west is called East Greek. Arcadocypriot apparently descended more closely from the Mycenaean Greek of the Bronze Age. Boeotian had come under a strong Northwest Greek influence, and can in some respects be considered a transitional dialect. Thessalian likewise had come under Northwest Greek influence, though to a lesser degree. Most of the dialect sub-groups listed above had further subdivisions, generally equivalent to a city-state and its surrounding territory. Doric notably had several intermediate divisions as well, into Island Doric, Southern Peloponnesus Doric, and Northern Peloponnesus Doric. The Lesbian dialect was Aeolic Greek. Koine slowly replaced most of the older dialects, although the Doric dialect has survived in the Tsakonian language, which is spoken in the region of modern Sparta. Doric has also passed down its aorist terminations into most verbs of Demotic Greek. By about the 6th century AD, the Koine had slowly metamorphosed into Medieval Greek.
17.
Skewness
–
In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive or negative, or even undefined. The qualitative interpretation of the skew is complicated and unintuitive. Skew must not be thought to refer to the direction the curve appears to be leaning; in fact, the opposite is true. Negative skew indicates that the tail on the left side is longer or fatter than the right side; conversely, positive skew indicates that the tail on the right side is longer or fatter than the left side. In cases where one tail is long but the other tail is fat, skewness does not obey a simple rule. Further, in multimodal distributions and discrete distributions, skewness is also difficult to interpret. Importantly, the skewness does not determine the relationship of mean and median. In cases where it is necessary, data might be transformed to have a normal distribution. Consider two distributions in which the values on the right side of the distribution taper differently from the values on the left side. Negative skew: the left tail is longer; the mass of the distribution is concentrated on the right. A left-skewed distribution usually appears as a right-leaning curve. Positive skew: the right tail is longer; the mass of the distribution is concentrated on the left. A right-skewed distribution usually appears as a left-leaning curve. Skewness in a data series may sometimes be observed not only graphically but by simple inspection of the values. For instance, consider a sequence whose values are evenly distributed around a central value of 50. If the distribution is symmetric, then the mean is equal to the median, and the distribution has zero skewness. If, in addition, the distribution is unimodal, then the mean = median = mode. This is the case of a coin toss or the series 1, 2, 3, 4, ... Note, however, that the converse is not true in general, i.e. zero skewness does not imply that the mean is equal to the median. Paul T. von Hippel points out that many textbooks teach a rule of thumb stating that the mean is right of the median under right skew, and that this rule fails with surprising frequency.
It can fail in multimodal distributions, or in distributions where one tail is long but the other is heavy. Most commonly, though, the rule fails in discrete distributions where the areas to the left and right of the median are not equal. Such distributions not only contradict the textbook relationship between mean, median, and skew, they contradict the textbook interpretation of the median. The moment coefficient of skewness, sometimes referred to as Pearson's moment coefficient of skewness, is the third standardized moment. It can be expressed as the ratio of the third cumulant κ3 to the 1.5th power of the second cumulant κ2. This is analogous to the definition of kurtosis as the fourth cumulant normalized by the square of the second cumulant. The skewness is also sometimes denoted Skew[X]. Starting from a standard cumulant expansion around a distribution, one can show that skewness =6 /standard deviation + O
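The moment coefficient described above can be sketched in a few lines of Python. This is a minimal illustration that treats the data as a complete population, so the second and third central moments coincide with the cumulants κ2 and κ3; the function name `moment_skewness` and the sample sequences are assumptions made for the example.

```python
from statistics import mean, median


def moment_skewness(values):
    """Population moment coefficient of skewness: m3 / m2**1.5."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mu) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


# Hypothetical sequences: one symmetric about 50, one with a long right tail.
symmetric = [49, 50, 51]
right_tailed = [1, 2, 2, 3, 3, 3, 20]

for name, data in [("symmetric", symmetric), ("right_tailed", right_tailed)]:
    print(name, "mean:", mean(data), "median:", median(data),
          "skewness:", round(moment_skewness(data), 3))
```

Mirroring a sequence (negating every value) flips the sign of the result, which matches the left-tail and right-tail reading of negative and positive skew.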
18.
United States Census Bureau
–
The United States Census Bureau is a principal agency of the U.S. Federal Statistical System, responsible for producing data about the American people and economy. The Census Bureau's primary mission is conducting the U.S. Census every ten years. In addition to the decennial census, the Census Bureau continually conducts dozens of other censuses and surveys, including the American Community Survey, the U.S. Economic Census, and the Current Population Survey. Furthermore, economic and foreign trade indicators released by the federal government typically contain data produced by the Census Bureau. The Bureau's various censuses and surveys help allocate over $400 billion in federal funds every year and help states and local communities. The Census Bureau is part of the U.S. Department of Commerce and its director is appointed by the President of the United States. The Census Bureau now conducts a population count every 10 years in years ending with a 0. Between censuses, the Census Bureau makes population estimates and projections. The Census Bureau is mandated with fulfilling these obligations: the collecting of statistics about the nation, its people, and economy. The Census Bureau's legal authority is codified in Title 13 of the United States Code. The Census Bureau also conducts surveys on behalf of various federal government and local government agencies on topics such as employment, crime, health, consumer expenditures, and housing. Within the bureau, these are known as surveys and are conducted perpetually between and during decennial population counts. The Census Bureau also conducts surveys of the manufacturing, retail, and service sectors. Between 1790 and 1840, the census was taken by marshals of the judicial districts. The Census Act of 1840 established a central office which became known as the Census Office.
Several acts followed that revised and authorized new censuses, typically at the 10-year intervals. In 1902, the temporary Census Office was moved under the Department of Interior, and in 1903 it was renamed the Census Bureau under the new Department of Commerce and Labor. The department was intended to consolidate overlapping statistical agencies, but Census Bureau officials were hindered by their role in the department. An act in 1920 changed the date and authorized manufacturing censuses every 2 years. In 1929, a bill was passed mandating the House of Representatives be reapportioned based on the results of the 1930 Census. In 1954, various acts were codified into Title 13 of the US Code. By law, the Census Bureau must count everyone and submit state population totals to the U.S. President by December 31 of any year ending in a zero. States within the Union receive the results in the spring of the following year. The United States Census Bureau defines four statistical regions, with nine divisions. The Census Bureau regions are widely used for data collection, and the Census Bureau definition is pervasive. Title 13 of the U.S. Code establishes penalties for the disclosure of this information. All Census employees must sign an affidavit of non-disclosure prior to employment. The Bureau cannot share responses, addresses or personal information with anyone, including United States or foreign governments. Only after 72 years does the information collected become available to other agencies or the general public.
19.
Unit interval
–
In mathematics, the unit interval is the closed interval [0,1], that is, the set of all real numbers that are greater than or equal to 0 and less than or equal to 1. It is often denoted I. In addition to its role in analysis, the unit interval is used to study homotopy theory in the field of topology. In the literature, the term unit interval is sometimes applied to the other shapes that an interval from 0 to 1 could take: (0,1], [0,1), and (0,1). However, the notation I is most commonly reserved for the closed interval [0,1]. The unit interval is a complete metric space, homeomorphic to the extended real number line. As a topological space it is compact, contractible, and path connected. The Hilbert cube is obtained by taking a topological product of countably many copies of the unit interval. In mathematical analysis, the unit interval is a one-dimensional analytical manifold whose boundary consists of the two points 0 and 1. Its standard orientation goes from 0 to 1. The unit interval is a totally ordered set and a complete lattice. The size or cardinality of a set is the number of elements it contains. The unit interval is a subset of the real numbers R. However, it has the same size as the whole set: the cardinality of the continuum. Moreover, it has the same number of points as a square of area 1 and as a cube of volume 1. The number of elements in all these sets is uncountable. The interval [−1,1], with length two, demarcated by the positive and negative units, occurs frequently, such as in the range of the trigonometric functions sine and cosine. This interval may be used for the domain of inverse functions. For instance, when θ is restricted to [−π/2, π/2], sin(θ) is in this interval and arcsine is defined there. Sometimes, the term unit interval is used to refer to objects that play a role in various branches of mathematics analogous to the role that [0,1] plays in homotopy theory. For example, in the theory of quivers, the unit interval is the graph whose vertex set is {0,1}.
One can then define a notion of homotopy between quiver homomorphisms analogous to the notion of homotopy between continuous maps. In logic, the unit interval [0,1] can be interpreted as a generalization of the Boolean domain {0,1}, in which case, rather than only taking values 0 or 1, any value between and including 0 and 1 can be assumed. Algebraically, negation is replaced with 1 − x and conjunction is replaced with multiplication. Interpreting these values as logical truth values yields a multi-valued logic, which forms the basis for fuzzy logic and probabilistic logic. In these interpretations, a value is interpreted as the degree of truth (to what extent a proposition is true) or the probability that the proposition is true.
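The algebraic reading of logic on the unit interval can be tried out directly. The sketch below implements negation as 1 − x and conjunction as multiplication, as stated above; the disjunction is an assumption added for illustration, derived from those two through De Morgan's law, and all function names are hypothetical.

```python
def check_unit(x: float) -> float:
    """Return x unchanged if it lies in the unit interval [0, 1]."""
    if not 0.0 <= x <= 1.0:
        raise ValueError(f"{x} is outside the unit interval [0, 1]")
    return x


def f_not(x: float) -> float:
    """Negation: 1 - x."""
    return 1.0 - check_unit(x)


def f_and(x: float, y: float) -> float:
    """Conjunction: multiplication."""
    return check_unit(x) * check_unit(y)


def f_or(x: float, y: float) -> float:
    """Disjunction via De Morgan (an assumption, not from the text): not(not x and not y)."""
    return f_not(f_and(f_not(x), f_not(y)))


# At the endpoints 0 and 1 the operations reduce to ordinary Boolean logic.
print(f_and(1.0, 0.0), f_or(1.0, 0.0), f_not(0.0))
# In between they yield degrees of truth (or probabilities of independent events).
print(f_and(0.5, 0.5), f_or(0.5, 0.5))
```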
20.
John Graunt
–
John Graunt was one of the first demographers, though by profession he was a haberdasher. He was born in London, the eldest of seven or eight children of Henry and Mary Graunt. His father was a draper who had moved to London from Hampshire. In February 1641, Graunt married Mary Scott, with whom he had children, including one son. He worked in his father's shop until his father died in 1662, and became influential in the City. He was able to secure the post of professor of music for his friend William Petty in 1650. He served in various ward offices in Cornhill ward, becoming a common councilman about 1669–71, warden of the Drapers' Company in 1671 and a major in the trained band. Graunt, along with William Petty, developed early human statistical methods. He is credited with producing the first life table, giving probabilities of survival to each age. Graunt is also considered one of the first experts in epidemiology. Though the system was never truly created, Graunt's work in studying the rolls resulted in the first statistically based estimation of the population of London. His work ran to five editions by 1676. The erudition of the Observations led Graunt to the Royal Society, where he presented his work and was subsequently elected a fellow in 1662 with the endorsement of the King. He was chosen as a member of the council in November 1664. His house was destroyed in the Great Fire of London and he encountered other financial problems, leading eventually to bankruptcy. He died of jaundice and liver disease at the age of 53. John Aubrey reported that he was "a pleasant facetious companion and very hospitable" and noted that his death was "lamented by all good men that had the happinesse to knowe him". Tribute to Graunt's pioneering work was paid by Sir Liam Donaldson on the anniversary of the Public Health Observatories.
21.
Floor function
–
In mathematics and computer science, the floor and ceiling functions map a real number to the greatest preceding or the least succeeding integer, respectively. More precisely, floor(x) = ⌊x⌋ is the greatest integer less than or equal to x, and ceiling(x) = ⌈x⌉ is the least integer greater than or equal to x. Carl Friedrich Gauss introduced the square bracket notation [x] for the floor function in his third proof of quadratic reciprocity. This remained the standard in mathematics until Kenneth E. Iverson introduced the names floor and ceiling and the corresponding notations ⌊x⌋ and ⌈x⌉. Both notations are now used in mathematics; this article follows Iverson. The integer part of x is sometimes taken to mean the value of x rounded to an integer towards 0. The language APL uses ⌊x; other computer languages commonly use notations like entier(x), INT(X), or floor(x). In mathematics, it can also be written with boldface or double brackets [[x]]. The ceiling function is usually denoted by ceil(x) or ceiling(x) in non-APL computer languages that have a notation for this function. The J Programming Language, a follow-on to APL that is designed to use standard symbols, uses >. for ceiling. In mathematics, there is another notation with reversed boldface or double brackets ]]x[[, or just using normal reversed brackets ]x[. The fractional part is the sawtooth function, denoted by {x} for real x and defined by the formula {x} = x − ⌊x⌋. HTML 4.0 uses the names &lfloor;, &rfloor;, &lceil;, and &rceil;. Unicode contains codepoints for these symbols at U+2308–U+230B: ⌈x⌉, ⌊x⌋. In the following formulas, x and y are real numbers, k, m, and n are integers, and Z is the set of integers. Floor and ceiling may be defined by the set equations ⌊x⌋ = max{m ∈ Z | m ≤ x}, ⌈x⌉ = min{n ∈ Z | n ≥ x}. Since there is exactly one integer in a half-open interval of length one, for any real x there are unique integers m and n satisfying x − 1 < m ≤ x ≤ n < x + 1. Then ⌊x⌋ = m and ⌈x⌉ = n may also be taken as the definition of floor and ceiling. These formulas can be used to simplify expressions involving floors and ceilings: ⌊x⌋ = m if and only if m ≤ x < m + 1, and ⌈x⌉ = n if and only if n − 1 < x ≤ n. In the language of order theory, the floor function is a residuated mapping. These formulas show how adding integers to the arguments affects the functions: ⌊x + n⌋ = ⌊x⌋ + n, ⌈x + n⌉ = ⌈x⌉ + n, {x + n} = {x}. Negating the argument complements the fractional part: {x} + {−x} = 0 if x ∈ Z, and 1 if x ∉ Z.
The floor, ceiling, and fractional part functions are idempotent: ⌊⌊x⌋⌋ = ⌊x⌋, ⌈⌈x⌉⌉ = ⌈x⌉, {{x}} = {x}. The result of nested floor or ceiling functions is the innermost function: ⌊⌈x⌉⌋ = ⌈x⌉, ⌈⌊x⌋⌉ = ⌊x⌋. If m and n are integers and n ≠ 0, then 0 ≤ {m/n} ≤ 1 − 1/|n|. If n is a positive integer, ⌊(x + m)/n⌋ = ⌊(⌊x⌋ + m)/n⌋ and ⌈(x + m)/n⌉ = ⌈(⌈x⌉ + m)/n⌉. For m = 2 these imply n = ⌊n/2⌋ + ⌈n/2⌉.
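Several of the identities above can be spot-checked with exact rational arithmetic. The sketch below uses Python's `math.floor` and `math.ceil` on `fractions.Fraction` values, so no floating-point rounding is involved; the helper names `frac` and `identities_hold` and the sample values are assumptions chosen for the example, and a finite check like this illustrates the identities rather than proving them.

```python
import math
from fractions import Fraction


def frac(x: Fraction) -> Fraction:
    """Fractional part {x} = x - floor(x)."""
    return x - math.floor(x)


def identities_hold(x: Fraction, m: int, n: int) -> bool:
    """Check floor/ceiling identities for rational x, integer m, positive integer n."""
    is_int = x.denominator == 1
    return all([
        math.floor(x + m) == math.floor(x) + m,        # adding an integer
        math.ceil(x + m) == math.ceil(x) + m,
        math.floor(x) + math.ceil(-x) == 0,            # negation swaps floor and ceiling
        frac(x) + frac(-x) == (0 if is_int else 1),    # complementary fractional parts
        math.floor(math.ceil(x)) == math.ceil(x),      # nested functions
        math.floor((x + m) / n) == math.floor(Fraction(math.floor(x) + m, n)),
        math.ceil((x + m) / n) == math.ceil(Fraction(math.ceil(x) + m, n)),
        n == math.floor(Fraction(n, 2)) + math.ceil(Fraction(n, 2)),
    ])


samples = [Fraction(-7, 2), Fraction(-2), Fraction(0), Fraction(5, 3), Fraction(4)]
all_ok = all(
    identities_hold(x, m, n)
    for x in samples
    for m in range(-3, 4)
    for n in range(1, 5)
)
print("all identities hold on the samples:", all_ok)

# Rounding towards 0 differs from floor for negative non-integers.
print(math.floor(Fraction(-7, 2)), math.trunc(Fraction(-7, 2)))
```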
22.
Standard deviation
–
In statistics, the standard deviation is a measure that is used to quantify the amount of variation or dispersion of a set of data values. The standard deviation of a random variable, statistical population, or data set is the square root of its variance. It is algebraically simpler, though in practice less robust, than the average absolute deviation. A useful property of the standard deviation is that, unlike the variance, it is expressed in the same units as the data. There are also other measures of deviation from the norm, including mean absolute deviation. In addition to expressing the variability of a population, the standard deviation is commonly used to measure confidence in statistical conclusions. For example, the margin of error in polling data is determined by calculating the standard deviation in the results if the same poll were to be conducted multiple times. This derivation of a standard deviation is often called the standard error of the estimate, or standard error of the mean when referring to a mean. It is computed as the standard deviation of all the means that would be computed from that population if an infinite number of samples were drawn. It is important to note that the standard deviation of a population and the standard error of a statistic derived from that population are different, though related. The reported margin of error of a poll is computed from the standard error of the mean and is typically about twice the standard deviation, the half-width of a 95 percent confidence interval. The standard deviation is also important in finance, where the standard deviation on the rate of return on an investment is a measure of the volatility of the investment. For a finite set of numbers, the standard deviation is found by taking the square root of the average of the squared deviations of the values from their average value. For example, the marks of a class of eight students are the eight values 2, 4, 4, 4, 5, 5, 7, 9. These eight data points have a mean of 5: (2 + 4 + 4 + 4 + 5 + 5 + 7 + 9) / 8 = 5. The squared deviations from the mean are 9, 1, 1, 1, 0, 0, 4, 16; their average is 32 / 8 = 4, so the standard deviation is √4 = 2. This formula is valid only if the eight values with which we began form the complete population. If the values instead were a sample drawn from some large parent population, one often divides by n − 1 = 7 instead of n = 8.
In that case the result would be called the sample standard deviation. Dividing by n − 1 rather than by n gives an unbiased estimate of the variance of the larger parent population. This is known as Bessel's correction. As a slightly more complicated real-life example, the average height for adult men in the United States is about 70 inches, with a standard deviation of around 3 inches.
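The eight-marks example can be reproduced with Python's standard `statistics` module, which offers both conventions: `pstdev` divides by n (complete population) and `stdev` divides by n − 1 (Bessel's correction). The variable names are illustrative only.

```python
import math
import statistics

marks = [2, 4, 4, 4, 5, 5, 7, 9]

# Treating the eight marks as the complete population: divide by n.
population_sd = statistics.pstdev(marks)

# Treating them as a sample from a larger population: divide by n - 1.
sample_sd = statistics.stdev(marks)

# The same population figure computed step by step from the definition.
mean = sum(marks) / len(marks)
squared_deviations = [(m - mean) ** 2 for m in marks]
by_hand = math.sqrt(sum(squared_deviations) / len(marks))

print("mean:", mean)
print("population standard deviation:", population_sd, by_hand)
print("sample standard deviation:", round(sample_sd, 4))
```

The sample figure is slightly larger than the population figure because the sum of squared deviations is divided by 7 rather than 8.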
23.
Interquartile range
–
In other words, the IQR is the 1st quartile subtracted from the 3rd quartile, IQR = Q3 − Q1; these quartiles can be clearly seen on a box plot of the data. It is a trimmed estimator, defined as the 25% trimmed range. The interquartile range is a measure of variability, based on dividing a data set into quartiles. Quartiles divide a rank-ordered data set into four equal parts. The values that separate the parts are called the first, second, and third quartiles, and they are denoted by Q1, Q2, and Q3, respectively. Unlike total range, the interquartile range has a breakdown point of 25%. The IQR is used to build box plots, simple graphical representations of a probability distribution. For a symmetric distribution, half the IQR equals the median absolute deviation. The median is the corresponding measure of central tendency. The IQR can be used to identify outliers. The quartile deviation or semi-interquartile range is defined as half the IQR. If P is normally distributed, then the standard score of the first quartile, z1, is −0.67, and the standard score of the third quartile, z3, is +0.67. However, a normal distribution can be trivially perturbed to maintain its Q1 and Q3 standard scores at −0.67 and +0.67 while no longer being normally distributed. A better test of normality, such as a Q-Q plot, would be indicated here. The interquartile range is often used to find outliers in data. Outliers here are defined as observations that fall below Q1 − 1.5 IQR or above Q3 + 1.5 IQR. In a boxplot, the highest and lowest occurring values within this limit are indicated by the whiskers of the box, and any outliers as individual points.
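The outlier rule above can be sketched with `statistics.quantiles` from the Python standard library. Quartile conventions differ between tools; the `method="inclusive"` setting used here is one choice among several and is an assumption of this example, as are the function name `iqr_outliers` and the sample data.

```python
import statistics


def iqr_outliers(data, k=1.5):
    """Return (IQR, (lower fence, upper fence), outliers) for the given data."""
    # n=4 yields the three cut points Q1, Q2, Q3.
    q1, _q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in data if v < low or v > high]
    return iqr, (low, high), outliers


# Hypothetical data set with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
iqr, fences, outliers = iqr_outliers(data)
print("IQR:", iqr)
print("fences:", fences)
print("outliers:", outliers)
```

Because the fences depend only on Q1 and Q3, the extreme value 100 is flagged without having distorted the limits themselves, which reflects the robustness of the IQR compared with the total range.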
24.
Pareto chart
–
The left vertical axis is the frequency of occurrence, but it can alternatively represent cost or another important unit of measure. The right vertical axis is the cumulative percentage of the total number of occurrences, total cost, or total of the particular unit of measure. Because the reasons are in decreasing order, the cumulative function is a concave function. The purpose of the Pareto chart is to highlight the most important among a set of factors. In quality control, it represents the most common sources of defects, the highest occurring type of defect, or the most frequent reasons for customer complaints. Wilkinson devised an algorithm for producing statistically based acceptance limits for each bar in the Pareto chart. The Pareto chart is one of the seven basic tools of quality control.
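The two axes of a Pareto chart correspond to a simple computation: sort the factors by decreasing frequency, then accumulate percentages. The sketch below shows this with invented counts; the reasons, the numbers, and the function name `pareto_table` are hypothetical.

```python
from itertools import accumulate


def pareto_table(counts):
    """Return rows of (label, count, cumulative percent), sorted by decreasing count."""
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    total = sum(count for _, count in ordered)
    if total == 0:
        raise ValueError("cannot build a Pareto table from all-zero counts")
    running = accumulate(count for _, count in ordered)
    return [
        (label, count, 100.0 * cumulative / total)
        for (label, count), cumulative in zip(ordered, running)
    ]


# Hypothetical reasons for late arrivals and how often each occurred.
reasons = {"traffic": 40, "child care": 25, "weather": 5, "overslept": 20, "other": 10}

table = pareto_table(reasons)
for label, count, cumulative_pct in table:
    print(f"{label:<12}{count:>4}{cumulative_pct:>8.1f}%")
```

The bars would be drawn from the count column and the line from the cumulative column. Since each successive increment is no larger than the previous one, the cumulative line is concave.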
25.
International Standard Book Number
–
The International Standard Book Number (ISBN) is a unique numeric commercial book identifier. An ISBN is assigned to each edition and variation of a book. For example, an e-book, a paperback and a hardcover edition of the same book would each have a different ISBN. The ISBN is 13 digits long if assigned on or after 1 January 2007, and 10 digits long if assigned before 2007. The method of assigning an ISBN is nation-based and varies from country to country, often depending on how large the publishing industry is within a country. The initial ISBN configuration of recognition was generated in 1967, based upon the 9-digit Standard Book Numbering (SBN) created in 1966. The 10-digit ISBN format was developed by the International Organization for Standardization (ISO) and was published in 1970 as international standard ISO 2108. Occasionally, a book may appear without a printed ISBN if it is printed privately or the author does not follow the usual ISBN procedure; however, this can be rectified later. Another identifier, the International Standard Serial Number (ISSN), identifies periodical publications such as magazines. The ISBN configuration of recognition was generated in 1967 in the United Kingdom by David Whitaker and in 1968 in the US by Emery Koltay. The United Kingdom continued to use the 9-digit SBN code until 1974. The ISO on-line facility only refers back to 1978. An SBN may be converted to an ISBN by prefixing the digit 0. For example, the edition of Mr. J. G. Reeder Returns, published by Hodder in 1965, has SBN 340 01381 8: 340 indicating the publisher, 01381 their serial number, and 8 being the check digit. This can be converted to ISBN 0-340-01381-8; the check digit does not need to be re-calculated. Since 1 January 2007, ISBNs have contained 13 digits, a format that is compatible with Bookland European Article Number EAN-13s.
A 13-digit ISBN can be separated into its parts, and when this is done it is customary to separate the parts with hyphens or spaces. Separating the parts of a 10-digit ISBN is also done with either hyphens or spaces. Figuring out how to correctly separate a given ISBN is complicated, because most of the parts do not use a fixed number of digits. ISBN issuance is country-specific, in that ISBNs are issued by the ISBN registration agency that is responsible for that country or territory regardless of the publication language. Some ISBN registration agencies are based in national libraries or within ministries of culture. In other cases, the ISBN registration service is provided by organisations such as bibliographic data providers that are not government funded. In Canada, ISBNs are issued at no cost with the purpose of encouraging Canadian culture. In the United Kingdom, United States, and some other countries, where the service is provided by non-government-funded organisations, the issuing of ISBNs requires payment of a fee. In Australia, ISBNs are issued by the library services agency Thorpe-Bowker.
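The SBN example above (340 01381 8) can be traced in code. The sketch below prefixes a 0 to obtain the 10-digit ISBN and confirms that the check digit is unchanged, using the standard ISBN-10 rule (weights 10 down to 2, modulo 11). The conversion to 13 digits assumes the usual Bookland prefix 978 and the EAN-13 check-digit rule (alternating weights 1 and 3, modulo 10); neither detail is spelled out in the text above, and the function names are hypothetical.

```python
def isbn10_check_digit(first9: str) -> str:
    """Check digit for the first nine digits of an ISBN-10 (weights 10..2, mod 11)."""
    total = sum((10 - i) * int(d) for i, d in enumerate(first9))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def sbn_to_isbn10(sbn: str) -> str:
    """Convert a 9-digit SBN to ISBN-10 by prefixing 0; the check digit carries over."""
    digits = sbn.replace(" ", "").replace("-", "")
    if len(digits) != 9:
        raise ValueError("an SBN has exactly 9 characters")
    isbn10 = "0" + digits
    # The leading 0 adds nothing to the weighted sum, so the old check digit stays valid.
    if isbn10_check_digit(isbn10[:9]) != isbn10[9]:
        raise ValueError("check digit mismatch")
    return isbn10


def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert ISBN-10 to ISBN-13, assuming the 978 prefix and the EAN-13 check rule."""
    core = "978" + isbn10[:9]
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)


isbn10 = sbn_to_isbn10("340 01381 8")
isbn13 = isbn10_to_isbn13(isbn10)
print(isbn10, isbn13)
```

Unlike the SBN step, the move to 13 digits does require a new check digit, because the EAN-13 rule uses different weights and a different modulus.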
26.
Milwaukee
–
Milwaukee is the largest city in the state of Wisconsin and the fifth-largest city in the Midwestern United States. The county seat of Milwaukee County, it is on Lake Michigan's western shore. Milwaukee's estimated population in 2015 was 600,155. Milwaukee is the cultural and economic center of the Milwaukee–Racine–Waukesha Metropolitan Area, with an estimated population of 2,046,692 as of 2015. Ranked by estimated 2014 population, Milwaukee is the 31st largest city in the United States. The first Europeans to pass through the area were French Catholic missionaries and fur traders. In 1818, the French Canadian explorer Solomon Juneau settled in the area. Large numbers of German immigrants helped increase the city's population during the 1840s, with Poles and other immigrants arriving in the following decades. Known for its traditions, Milwaukee is currently experiencing its largest construction boom since the 1960s. In addition, many new skyscrapers, condos, lofts and apartments have been built in neighborhoods on and near the lakefront. The word Milwaukee may come from the Potawatomi language minwaking, or Ojibwe language ominowakiing, "gathering place". The first recorded inhabitants of the Milwaukee area include the Menominee, Fox, Mascouten, Sauk, Potawatomi, and Ojibwe. Many of these people had lived around Green Bay before migrating to the Milwaukee area around the time of European contact. In the second half of the 18th century, the Indians at Milwaukee played a role in all the wars on the American continent. In the American Revolutionary War, the Indians around Milwaukee were some of the few Indians who remained loyal to the American cause throughout the Revolution.
After American independence, the Indians fought the United States in the Northwest Indian War as part of the Council of Three Fires. During the War of 1812, Indians held a council in Milwaukee in June 1812, which resulted in their decision to attack Chicago. This resulted in the Battle of Fort Dearborn on August 15, 1812. The War of 1812 did not end well for the Indians, and after the Black Hawk War in 1832, the Indians in Milwaukee signed their final treaty with the United States in Chicago in 1833. This paved the way for American settlement. Europeans had arrived in the Milwaukee area prior to the 1833 Treaty of Chicago. French missionaries and traders first passed through the area in the late 17th and 18th centuries. Alexis Laframboise, coming from Michilimackinac, settled a trading post in 1785; he is therefore the first resident of European descent in the Milwaukee region. Early explorers called the Milwaukee River and surrounding lands various names: Melleorki, Milwacky, Mahn-a-waukie, Milwarck. For many years, printed records gave the name as Milwaukie. One story of Milwaukee's name says, "One day during the thirties of the last century a newspaper calmly changed the name to Milwaukee". The spelling Milwaukie lives on in Milwaukie, Oregon, named after the Wisconsin city in 1847, before the current spelling was universally accepted. Milwaukee has three founding fathers: Solomon Juneau, Byron Kilbourn, and George H. Walker. Solomon Juneau was the first of the three to come to the area, in 1818. He was not the first European settler but founded a town called Juneau's Side, or Juneautown. In competition with Juneau, Byron Kilbourn established Kilbourntown west of the Milwaukee River and made sure the streets running toward the river did not join with those on the east side.
27.
OCLC
–
The Online Computer Library Center (OCLC) is a US-based nonprofit cooperative organization dedicated to the public purposes of furthering access to the world's information and reducing information costs. It was founded in 1967 as the Ohio College Library Center. OCLC and its member libraries cooperatively produce and maintain WorldCat, the largest online public access catalog in the world. OCLC is funded mainly by the fees that libraries have to pay for its services. The group first met on July 5, 1967 on the campus of the Ohio State University to sign the articles of incorporation for the nonprofit organization. The group hired Frederick G. Kilgour, a former Yale University medical school librarian. Kilgour wished to merge the latest information storage and retrieval system of the time, the computer, with the oldest, the library. The goal of the network and database was to bring libraries together to cooperatively keep track of the world's information in order to best serve researchers and scholars. The first library to do online cataloging through OCLC was the Alden Library at Ohio University on August 26, 1971. This was the first occurrence of online cataloging by any library worldwide. Membership in OCLC is based on use of services and contribution of data. Between 1967 and 1977, OCLC membership was limited to institutions in Ohio, but in 1978, a new governance structure was established that allowed institutions from other states to join. In 2002, the structure was again modified to accommodate participation from outside the United States. As OCLC expanded services in the United States outside of Ohio, it relied on establishing strategic partnerships with networks, organizations that provided training and support. By 2008, there were 15 independent United States regional service providers.
OCLC networks played a key role in OCLC governance, with networks electing delegates to serve on the OCLC Members Council. In early 2009, OCLC negotiated new contracts with the former networks and opened a centralized support center. OCLC provides bibliographic, abstract and full-text information to anyone. OCLC and its member libraries cooperatively produce and maintain WorldCat (the OCLC Online Union Catalog), the largest online public access catalog in the world. WorldCat has holding records from public and private libraries worldwide. In October 2005, the OCLC technical staff began a wiki project, WikiD, allowing readers to add commentary and structured-field information associated with any WorldCat record. The Online Computer Library Center acquired the trademark and copyrights associated with the Dewey Decimal Classification System when it bought Forest Press in 1988. A browser for books with their Dewey Decimal Classifications was available until July 2013; it was replaced by the Classify Service. The reference management service QuestionPoint provides libraries with tools to communicate with users. This around-the-clock reference service is provided by a cooperative of participating global libraries. OCLC has produced cards for members since 1971 with its shared online catalog. OCLC commercially sells software, e.g. CONTENTdm for managing digital collections. OCLC has been conducting research for the library community for more than 30 years. In accordance with its mission, OCLC makes its research outcomes known through various publications. These publications, including journal articles, reports, newsletters, and presentations, are available through the organization's website. The most recent publications are displayed first, and all archived resources, membership Reports – A number of significant reports on topics ranging from virtual reference in libraries to perceptions about library funding
28.
JSTOR
–
JSTOR is a digital library founded in 1995. Originally containing digitized back issues of journals, it now also includes books and primary sources. It provides full-text searches of almost 2,000 journals. More than 8,000 institutions in more than 160 countries have access to JSTOR. Most access is by subscription, but some older public domain content is freely available to anyone. William G. Bowen, president of Princeton University from 1972 to 1988, founded JSTOR. It was originally conceived as a solution to one of the problems faced by libraries, especially research and university libraries, due to the increasing number of academic journals in existence. Most libraries found it prohibitively expensive in terms of cost and space to maintain a collection of journals. By digitizing many journal titles, JSTOR allowed libraries to outsource the storage of journals with the confidence that they would remain available long-term. Online access and full-text search ability improved access dramatically. Bowen initially considered using CD-ROMs for distribution. JSTOR was initiated in 1995 at seven different library sites, and originally encompassed ten economics and history journals. JSTOR access improved based on feedback from its sites. Special software was put in place to make pictures and graphs clear. With the success of this limited project, Bowen and Kevin Guthrie, then-president of JSTOR, wanted to expand the number of participating journals. They met with representatives of the Royal Society of London, and an agreement was made to digitize the Philosophical Transactions of the Royal Society dating from its beginning in 1665. The work of adding these volumes to JSTOR was completed by December 2000. The Andrew W. Mellon Foundation funded JSTOR initially. Until January 2009, JSTOR operated as an independent, self-sustaining nonprofit organization with offices in New York City and in Ann Arbor, Michigan.
JSTOR content is provided by more than 900 publishers. The database contains more than 1,900 journal titles in more than 50 disciplines. Each object is identified by an integer value, starting at 1. In addition to the main site, the JSTOR labs group operates an open service, Data for Research, that allows access to the contents of the archives for the purposes of corpus analysis. This site offers a facility with graphical indication of the article coverage. Users may create focused sets of articles and then request a dataset containing word and n-gram frequencies. They are notified when the dataset is ready and may download it in either XML or CSV format. The service does not offer full text, although academics may request that from JSTOR. JSTOR Plant Science is available in addition to the main site. The materials on JSTOR Plant Science are contributed through the Global Plants Initiative and are only to JSTOR
29.
PubMed Identifier
–
PubMed is a free search engine accessing primarily the MEDLINE database of references and abstracts on life sciences and biomedical topics. The United States National Library of Medicine (NLM) at the National Institutes of Health maintains the database as part of the Entrez system of information retrieval. From 1971 to 1997, MEDLINE online access to the MEDLARS Online computerized database had been primarily through institutional facilities, such as university libraries. PubMed, first released in January 1996, ushered in the era of private, free, home-. The PubMed system was offered free to the public in June 1997, when MEDLINE searches via the Web were demonstrated, in a ceremony, by Vice President Al Gore. Information about the journals indexed in MEDLINE, and available through PubMed, is found in the NLM Catalog. As of 5 January 2017, PubMed has more than 26.8 million records going back to 1966, selectively to the year 1865, and very selectively to 1809. About 500,000 new records are added each year. As of that date, 13.1 million of PubMed's records are listed with their abstracts. In 2016, NLM changed the system so that publishers will be able to directly correct typos. Simple searches on PubMed can be carried out by entering key aspects of a subject into PubMed's search window. When a journal article is indexed, numerous article parameters are extracted and stored as structured information. Such parameters include Article Type, Secondary Identifiers, and Language. The publication type parameter enables many special features. These clinical queries can generate small sets of robust studies with considerable precision. Since July 2005, the MEDLINE article indexing process extracts important identifiers from the article abstract and puts those in a field called Secondary Identifier. The secondary identifier field stores numbers for various databases of molecular sequence data, gene expression or chemical compounds.
For clinical trials, PubMed extracts trial IDs for the two largest trial registries, ClinicalTrials.gov and the International Standard Randomized Controlled Trial Number Register. A reference which is judged particularly relevant can be marked and related articles can be identified. If relevant, several studies can be selected and related articles to all of them can be generated using the Find related data option. The related articles are then listed in order of relatedness. To create these lists of related articles, PubMed compares words from the title and abstract of each citation, as well as the MeSH headings assigned, using a powerful word-weighted algorithm. The related articles function has been judged to be so precise that some researchers suggest it can be used instead of a full search. A strong feature of PubMed is its ability to automatically link to MeSH terms and subheadings. Examples would be: bad breath links to halitosis, heart attack to myocardial infarction. Where appropriate, these MeSH terms are automatically expanded, that is, they include more specific terms. Terms like nursing are automatically linked to Nursing as a MeSH term or Nursing as a subheading. This important feature makes PubMed searches automatically more sensitive and avoids false-negative hits by compensating for the diversity of medical terminology. The My NCBI area can be accessed from any computer with web access. An earlier version of My NCBI was called PubMed Cubby.
30.
Ishikawa diagram
–
Ishikawa diagrams are causal diagrams created by Kaoru Ishikawa that show the causes of a specific event. Common uses of the Ishikawa diagram are product design and quality defect prevention, to identify potential factors causing an overall effect. Each cause or reason for imperfection is a source of variation. Causes are usually grouped into categories to identify these sources of variation. Required to accomplish the job. Materials: raw materials, parts, pens, paper. The basic concept was first used in the 1920s, and is considered one of the seven basic tools of quality control. It is known as a fishbone diagram because of its shape. Mazda Motors famously used an Ishikawa diagram in the development of the Miata sports car, where the required result was Jinba Ittai. The main causes included such aspects as touch and braking, with the lesser causes including highly granular factors such as 50/50 weight distribution. Every factor identified in the diagram was included in the final design. Causes in the diagram are often categorized, such as to the 5 Ms, described below. Cause-and-effect diagrams can reveal key relationships among variables, and the possible causes provide additional insight into process behavior. Causes can be derived from brainstorming sessions. These groups can then be labeled as categories of the fishbone. They will typically be one of the categories mentioned above. Causes can be traced back to root causes with the 5 Whys technique. However, this is not globally recognized. It has been suggested to return to the roots of the tools and to keep the teaching simple while recognizing the original intent. Milieu/Mother Nature, Management/Money Power, Maintenance. Milieu is also used as the 6th M by industries for investigations taking the environment into account.
Product/Service, Price, Place, Promotion, People/Personnel, Process, Physical Evidence, Packaging. The 8 Ps are primarily used in product marketing. Managing Quality, 5th ed., ISBN 978-1-4051-4279-3, OCLC 288977828.
31.
Check sheet
–
The check sheet is a form used to collect data in real time at the location where the data is generated. The data it captures can be quantitative or qualitative. When the information is quantitative, the check sheet is sometimes called a tally sheet. The check sheet is one of the so-called Seven Basic Tools of Quality Control. The defining characteristic of a check sheet is that data are recorded by making marks on it. A typical check sheet is divided into regions, and marks made in different regions have different significance. Data are read by observing the location and number of marks on the sheet. However, a sheet can be used to construct the frequency distribution as the process is being observed. When the process distribution is ready to be assessed, the assessor fills out the check sheet's heading. Each time the process generates an output, he or she measures the output, determines the bin in which the measurement falls, and adds to that bin's check marks. When the observation period has concluded, the assessor should examine it as follows: Is there more than one peak? Do the check marks fall completely within the limits with room to spare? Or are there a significant number of marks that fall outside the specification limits? When a process has been identified as a candidate for improvement, it's important to know what types of defects occur in its outputs. This information serves as a guide for investigating and removing the sources of defects. Additionally, rules for recording the presence of defects of different types when observed for the same process output must be set down. When the process distribution is ready to be assessed, the assessor fills out the check sheet's heading. If no defects are found for a process output, no check mark is made.
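The frequency-distribution use of a check sheet described above (measure an output, find its bin, add a mark) can be sketched in a few lines of Python. This is only an illustration: the function names, the equal-width bin layout and the sample measurements are assumptions made for the example, not part of any standard check sheet form.

```python
def tally_measurements(measurements, lower, bin_width, n_bins):
    """Count measurements per bin, like adding check marks to a tally sheet.

    Bins are [lower, lower + bin_width), [lower + bin_width, ...), and so on.
    Returns (counts per bin, number of measurements outside all bins).
    """
    counts = [0] * n_bins
    outside = 0
    for x in measurements:
        idx = int((x - lower) // bin_width)  # which bin the measurement falls in
        if 0 <= idx < n_bins:
            counts[idx] += 1
        else:
            outside += 1  # mark falls outside the sheet's range
    return counts, outside


def render_sheet(counts, lower, bin_width):
    """Draw one row of 'X' marks per bin."""
    rows = []
    for i, c in enumerate(counts):
        lo = lower + i * bin_width
        rows.append(f"{lo:>5}-{lo + bin_width:<5} | " + "X" * c)
    return "\n".join(rows)


if __name__ == "__main__":
    # Hypothetical measurements, invented for this example.
    data = [1, 4, 5, 7, 12, 14, 14, 21]
    counts, outside = tally_measurements(data, lower=0, bin_width=5, n_bins=4)
    print(render_sheet(counts, lower=0, bin_width=5))
    print("outside range:", outside)
```

Reading the printed rows answers the same questions the assessor asks of a paper sheet: where the peak is, and how many marks fall outside the chosen range.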
When the observation period has concluded, the assessor should generate a Pareto chart from the resulting data. This chart then determines the order in which the process is to be investigated and sources of variation that lead to defects removed. Each time the process generates an output, he or she assesses the output for defects. If no defects are found for a process output, no check mark is made. When the observation period has concluded, the assessor should reexamine each check sheet. Using his or her knowledge of the process in conjunction with the locations should reveal the source or sources of variation that produce the defects. When a process has been identified as a candidate for improvement. Note that the categories and how process outputs are to be placed into these categories must be agreed to. Additionally, rules for recording the presence of defects of different types when observed for the process output must be set down.
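Turning defect tallies into the ordering a Pareto chart shows is a small computation: sort the defect types by count and accumulate their share of the total. The sketch below assumes each check mark is recorded as a string naming the defect type; the function name and the sample defect labels are hypothetical.

```python
from collections import Counter


def pareto_table(defects):
    """Return rows of (defect type, count, cumulative percent), largest first."""
    counts = Counter(defects)
    total = sum(counts.values())
    rows = []
    cumulative = 0
    for defect, n in counts.most_common():  # sorted by descending count
        cumulative += n
        rows.append((defect, n, 100.0 * cumulative / total))
    return rows


if __name__ == "__main__":
    # Hypothetical check marks, invented for this example.
    marks = ["scratch", "dent", "scratch", "crack", "scratch", "dent"]
    for defect, n, cum_pct in pareto_table(marks):
        print(f"{defect:<8} {n:>3} {cum_pct:6.1f}%")
```

The first rows of the table are the defect types to investigate first, which is the ordering role the text assigns to the Pareto chart.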
32.
Control chart
–
Control charts, also known as Shewhart charts or process-behavior charts, are a statistical process control tool used to determine if a manufacturing or business process is in a state of control. If analysis of the chart indicates that the process is currently under control. In addition, data from the process can be used to predict the performance of the process. If the chart indicates that the process is not in control, analysis of the chart can help determine the sources of variation. The control chart is one of the seven basic tools of quality control. Typically control charts are used for time-series data, though they can be used for data that have logical comparability; however, the type of chart used to do this requires consideration. The control chart was invented by Walter A. Shewhart while working for Bell Labs in the 1920s. The company's engineers had been seeking to improve the reliability of their telephony transmission systems. Because amplifiers and other equipment had to be buried underground, there was a business need to reduce the frequency of failures. By 1920, the engineers had already realized the importance of reducing variation in a manufacturing process. Moreover, they had realized that continual process adjustment in reaction to non-conformance actually increased variation and degraded quality. Shewhart's boss, George Edwards, recalled: Dr. Shewhart prepared a little memorandum only about a page in length. About a third of that page was given over to a simple diagram which we would all recognize today as a schematic control chart. That diagram, and the text which preceded and followed it, set forth all of the essential principles. Shewhart created the basis for the chart and the concept of a state of statistical control by carefully designed experiments.
While Shewhart drew from pure mathematical statistical theories, he understood that data from physical processes typically produce a normal distribution curve. He discovered that observed variation in manufacturing data did not always behave the same way as data in nature. In 1924 or 1925, Shewhart's innovation came to the attention of W. Edwards Deming. Deming later worked at the United States Department of Agriculture and became the mathematical advisor to the United States Census Bureau. Over the next half a century, Deming became the foremost champion of Shewhart's work. After the defeat of Japan at the close of World War II, Deming served as statistical consultant to the Supreme Commander for the Allied Powers. The chart signals a problem in the following cases: (1) any point outside of the control limits; (2) a run of 7 points all above or all below the central line, in which case stop the production, quarantine and 100% check, adjust the process, check 5 consecutive samples, then continue the process; (3) a run of 7 points up or down, with the same instruction as above. If the process is in control, 99.7300% of all the points will fall between the control limits. Any observations outside the limits, or systematic patterns within, suggest the introduction of a new source of variation. Since increased variation means increased quality costs, a control chart signaling the presence of a special cause requires immediate investigation. This makes the control limits very important decision aids. The control limits provide information about the process behavior and have no intrinsic relationship to any specification targets or engineering tolerance.
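The 99.73% figure corresponds to limits placed three standard deviations either side of the central line for normally distributed data. The sketch below computes such limits from a baseline sample and checks two of the signals listed above: points outside the limits, and a run of 7 points on one side of the central line. It is a simplification under stated assumptions: sigma is estimated with the plain sample standard deviation (practical charts often estimate it from subgroup or moving ranges), and the function names and data are invented for the example.

```python
from statistics import mean, stdev


def control_limits(baseline):
    """Return (lower limit, central line, upper limit) as mean -/+ 3 sigma."""
    center = mean(baseline)
    sigma = stdev(baseline)  # assumption: plain sample standard deviation
    return center - 3 * sigma, center, center + 3 * sigma


def points_outside(values, lcl, ucl):
    """Indices of points beyond either control limit."""
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]


def run_signals(values, center, length=7):
    """Indices at which a run of `length` points on one side is reached."""
    signals = []
    run_side, run_len = 0, 0
    for i, v in enumerate(values):
        side = (v > center) - (v < center)  # +1 above, -1 below, 0 on the line
        if side != 0 and side == run_side:
            run_len += 1
        else:
            run_side, run_len = side, (1 if side != 0 else 0)
        if run_len >= length:
            signals.append(i)
    return signals


if __name__ == "__main__":
    # Hypothetical baseline and new observations, invented for this example.
    baseline = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10]
    lcl, center, ucl = control_limits(baseline)
    new_points = [10, 14, 9, 11, 11, 12, 11, 11, 12, 11]
    print("limits:", round(lcl, 3), center, round(ucl, 3))
    print("outside limits at:", points_outside(new_points, lcl, ucl))
    print("run of 7 reached at:", run_signals(new_points, center))
```

Note that the limits are derived from the process data alone, which matches the point in the text that control limits have no intrinsic relationship to specification targets.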
33.
Scatter diagram
–
A scatter plot is a type of plot or mathematical diagram using Cartesian coordinates to display values for typically two variables for a set of data. If the points are color-coded, one additional variable can be displayed. A scatter plot can be used when one continuous variable is under the control of the experimenter. The measured or dependent variable is plotted along the vertical axis. If no dependent variable exists, either type of variable can be plotted on either axis. A scatter plot can suggest various kinds of correlations between variables with a certain confidence interval. For example, for weight and height, weight would be on the y axis. Correlations may be positive, negative, or null. If the pattern of dots slopes from lower left to upper right, it indicates a positive correlation; if the pattern of dots slopes from upper left to lower right, it indicates a negative correlation. A line of best fit can be drawn in order to study the relationship between the variables. An equation for the correlation between the variables can be determined by established best-fit procedures. For a linear correlation, the procedure is known as linear regression and is guaranteed to generate a correct solution in a finite time. No universal best-fit procedure is guaranteed to generate a solution for arbitrary relationships. A scatter plot is very useful when we wish to see how two comparable data sets agree with each other. In this case, an identity line, i.e. a y=x line, can serve as a reference. One of the most powerful aspects of a scatter plot, however, is its ability to show nonlinear relationships between variables. The ability to do this can be enhanced by adding a line such as LOESS. Furthermore, if the data are represented by a model of simple relationships. The scatter diagram is one of the seven basic tools of quality control. Scatter charts can be built in the form of bubble, marker, and/or line charts. The researcher would then plot the data in a scatter plot, assigning lung capacity to the horizontal axis.
A person with a capacity of 400 cl who held his/her breath for 21.7 seconds would be represented by a single dot on the scatter plot at the point in the Cartesian coordinates. For a set of data variables X1, X2, ..., Xk, the scatter plot matrix shows all the pairwise scatter plots of the variables on a single view, with multiple scatter plots in a matrix format.
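The line of best fit mentioned for linear correlations can be computed in closed form by ordinary least squares, and the Pearson coefficient gives the sign and strength of the correlation. The sketch below uses only the standard library; the function name and the sample points are assumptions for illustration and are not data from the lung-capacity example.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept, plus Pearson r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5  # positive, negative, or near zero
    return slope, intercept, r


if __name__ == "__main__":
    # Hypothetical paired observations, invented for this example.
    xs = [1, 2, 3, 4]
    ys = [3, 5, 7, 9]
    slope, intercept, r = linear_fit(xs, ys)
    print(f"y = {slope:.2f} * x + {intercept:.2f}, r = {r:.2f}")
```

A positive r corresponds to dots sloping from lower left to upper right, a negative r to the opposite slope. The function assumes both variables actually vary; constant data would make the divisions undefined.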