Econometrics
Econometrics is the application of statistical methods to economic data in order to give empirical content to economic relationships. More precisely, it is "the quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation, related by appropriate methods of inference". An introductory economics textbook describes econometrics as allowing economists "to sift through mountains of data to extract simple relationships". The first known use of the term "econometrics" was by Polish economist Paweł Ciompa in 1910; Ragnar Frisch is credited with coining the term in the sense in which it is used today, and Jan Tinbergen is considered by many to be one of the founding fathers of econometrics. A basic tool for econometrics is the multiple linear regression model. Econometric theory uses statistical theory and mathematical statistics to evaluate and develop econometric methods. Econometricians try to find estimators that have desirable statistical properties including unbiasedness and consistency.
Applied econometrics uses theoretical econometrics and real-world data for assessing economic theories, developing econometric models, analysing economic history, and forecasting. A basic tool for econometrics is the multiple linear regression model. In modern econometrics, other statistical tools are used as well, but linear regression is still the most common starting point for an analysis. Estimating a linear regression on two variables can be visualised as fitting a line through data points representing paired values of the independent and dependent variables. For example, consider Okun's law, which relates GDP growth to the unemployment rate. This relationship is represented in a linear regression where the change in the unemployment rate is a function of an intercept β0, a given value of GDP growth multiplied by a slope coefficient β1, and an error term ε: ΔUnemployment = β0 + β1·Growth + ε. The unknown parameters β0 and β1 can be estimated; here β1 is estimated to be −1.77 and β0 is estimated to be 0.83.
This means that if GDP growth increased by one percentage point, the unemployment rate would be predicted to drop by 1.77 points. The model could be tested for statistical significance as to whether an increase in growth is associated with a decrease in the unemployment rate, as hypothesized. If the estimate of β1 were not significantly different from 0, the test would fail to find evidence that changes in the growth rate and unemployment rate were related. The variance in a prediction of the dependent variable as a function of the independent variable is given in the theory of polynomial least squares. Econometric theory uses statistical theory and mathematical statistics to evaluate and develop econometric methods. Econometricians try to find estimators that have desirable statistical properties including unbiasedness and consistency. An estimator is unbiased if its expected value is the true value of the parameter, and consistent if it converges to the true value as the sample size grows. Ordinary least squares is often used for estimation since it provides the BLUE or "best linear unbiased estimator" given the Gauss–Markov assumptions; when these assumptions are violated or other statistical properties are desired, other estimation techniques such as maximum likelihood estimation, generalized method of moments, or generalized least squares are used.
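The estimation step can be sketched with ordinary least squares in a few lines. The data below are synthetic, generated to roughly match the coefficients reported in the text (β0 ≈ 0.83, β1 ≈ −1.77); they are not actual GDP or unemployment figures.

```python
import numpy as np

# Hypothetical (growth, change-in-unemployment) observations, generated
# to loosely follow Okun's law: unemployment falls when growth is high.
growth = np.array([0.5, 1.0, 2.0, 3.0, 3.5, 4.0, 4.5, 5.0])
rng = np.random.default_rng(0)
d_unemp = 0.83 - 1.77 * growth + rng.normal(0, 0.1, growth.size)

# Ordinary least squares: stack a column of ones to estimate the
# intercept b0 alongside the slope b1.
X = np.column_stack([np.ones_like(growth), growth])
beta, *_ = np.linalg.lstsq(X, d_unemp, rcond=None)
b0, b1 = beta
print(f"intercept b0 = {b0:.2f}, slope b1 = {b1:.2f}")
```

With little noise in the synthetic data, the recovered estimates land close to the coefficients used to generate it.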
Estimators that incorporate prior beliefs are advocated by those who favour Bayesian statistics over traditional, classical or "frequentist" approaches. Applied econometrics uses theoretical econometrics and real-world data for assessing economic theories, developing econometric models, analysing economic history, and forecasting. Econometrics may use standard statistical models to study economic questions, but most often these are applied to observational data rather than data from controlled experiments. In this, the design of observational studies in econometrics is similar to the design of studies in other observational disciplines, such as astronomy, epidemiology and political science. Analysis of data from an observational study is guided by the study protocol, although exploratory data analysis may be useful for generating new hypotheses. Economics analyses systems of equations and inequalities, such as supply and demand hypothesized to be in equilibrium; the field of econometrics has developed methods for identification and estimation of simultaneous-equation models.
These methods are analogous to methods used in other areas of science, such as the field of system identification in systems analysis and control theory. Such methods may allow researchers to estimate models and investigate their empirical consequences without directly manipulating the system. One of the fundamental statistical methods used by econometricians is regression analysis. Regression methods are important in econometrics because economists typically cannot use controlled experiments.
Time series
A time series is a series of data points indexed in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time; thus it is a sequence of discrete-time data. Examples of time series are heights of ocean tides, counts of sunspots, and the daily closing value of the Dow Jones Industrial Average. Time series are frequently plotted via line charts. Time series are used in statistics, signal processing, pattern recognition, mathematical finance, weather forecasting, earthquake prediction, electroencephalography, control engineering, communications engineering, and in any domain of applied science and engineering which involves temporal measurements. Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. Time series forecasting is the use of a model to predict future values based on observed values. While regression analysis is often employed in such a way as to test theories that the current values of one or more independent time series affect the current value of another time series, this type of analysis of time series is not called "time series analysis", which focuses on comparing values of a single time series or multiple dependent time series at different points in time.
Interrupted time series analysis is the analysis of interventions on a single time series. Time series data have a natural temporal ordering; this makes time series analysis distinct from cross-sectional studies, in which there is no natural ordering of the observations. Time series analysis is also distinct from spatial data analysis, where the observations typically relate to geographical locations. A stochastic model for a time series will generally reflect the fact that observations close together in time will be more closely related than observations further apart. In addition, time series models will often make use of the natural one-way ordering of time, so that values for a given period will be expressed as deriving in some way from past values, rather than from future values. Time series analysis can be applied to real-valued, continuous data, discrete numeric data, or discrete symbolic data. Methods for time series analysis may be divided into two classes: frequency-domain methods and time-domain methods; the former include spectral analysis and wavelet analysis.
In the time domain, correlation and analysis can be made in a filter-like manner using scaled correlation, thereby mitigating the need to operate in the frequency domain. Additionally, time series analysis techniques may be divided into parametric and non-parametric methods. The parametric approaches assume that the underlying stationary stochastic process has a certain structure which can be described using a small number of parameters. In these approaches, the task is to estimate the parameters of the model that describes the stochastic process. By contrast, non-parametric approaches explicitly estimate the covariance or the spectrum of the process without assuming that the process has any particular structure. Methods of time series analysis may also be divided into linear and non-linear, and univariate and multivariate. A time series is one type of panel data. Panel data is the general class, a multidimensional data set, whereas a time series data set is a one-dimensional panel. A data set may exhibit characteristics of both panel data and time series data.
One way to tell is to ask what makes one data record unique from the other records. If the answer is the time data field, then this is a time series data set candidate. If determining a unique record requires a time data field and an additional identifier unrelated to time, then it is a panel data candidate. If the differentiation lies on the non-time identifier, then the data set is a cross-sectional data set candidate. There are several types of motivation and data analysis available for time series which are appropriate for different purposes. In the context of statistics, quantitative finance, seismology and geophysics, the primary goal of time series analysis is forecasting. In the context of signal processing, control engineering and communication engineering, it is used for signal detection and estimation, while in the context of data mining, pattern recognition and machine learning, time series analysis can be used for clustering, query by content and anomaly detection as well as forecasting. The clearest way to examine a regular time series manually is with a line chart, such as the one shown for tuberculosis in the United States, made with a spreadsheet program.
The number of cases was standardized to a rate per 100,000 and the percent change per year in this rate was calculated. The nearly steadily dropping line shows that the TB incidence was decreasing in most years, but the percent change in this rate varied by as much as +/- 10%, with 'surges' in 1975 and around the early 1990s; the use of both vertical axes allows the comparison of two time series in one graphic. Other techniques include: autocorrelation analysis to examine serial dependence, and spectral analysis to examine cyclic behavior which need not be related to seasonality. For example, sun spot activity varies over 11-year cycles.
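Autocorrelation analysis, mentioned above as a way to examine serial dependence, can be sketched as follows. The series here is a synthetic noisy 20-point cycle, not real sunspot or TB data.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of the series x at the given positive lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

# A noisy periodic series with a cycle length of 20 samples.
t = np.arange(200)
x = np.sin(2 * np.pi * t / 20) + np.random.default_rng(1).normal(0, 0.2, t.size)

print(autocorr(x, 1))    # high: neighbouring points are similar
print(autocorr(x, 10))   # strongly negative: half a cycle apart
print(autocorr(x, 20))   # high again: one full cycle apart
```

The pattern of autocorrelations across lags (the correlogram) reveals the cycle length without any model being fitted.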
Statistics
Statistics is a branch of mathematics dealing with data collection, analysis and presentation. In applying statistics to, for example, a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied. Populations can be diverse topics such as "all people living in a country" or "every atom composing a crystal". Statistics deals with every aspect of data, including the planning of data collection in terms of the design of surveys and experiments. See the glossary of probability and statistics. When census data cannot be collected, statisticians collect data by developing specific experiment designs and survey samples. Representative sampling assures that inferences and conclusions can reasonably extend from the sample to the population as a whole. An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine if the manipulation has modified the values of the measurements.
In contrast, an observational study does not involve experimental manipulation. Two main statistical methods are used in data analysis: descriptive statistics, which summarize data from a sample using indexes such as the mean or standard deviation, and inferential statistics, which draw conclusions from data that are subject to random variation. Descriptive statistics are most often concerned with two sets of properties of a distribution: central tendency seeks to characterize the distribution's central or typical value, while dispersion characterizes the extent to which members of the distribution depart from its center and each other. Inferences on mathematical statistics are made under the framework of probability theory, which deals with the analysis of random phenomena. A standard statistical procedure involves the test of the relationship between two statistical data sets, or a data set and synthetic data drawn from an idealized model. A hypothesis is proposed for the statistical relationship between the two data sets, and this is compared as an alternative to an idealized null hypothesis of no relationship between the two data sets.
Rejecting or disproving the null hypothesis is done using statistical tests that quantify the sense in which the null can be proven false, given the data that are used in the test. Working from a null hypothesis, two basic forms of error are recognized: Type I errors (the null hypothesis is falsely rejected) and Type II errors (the null hypothesis fails to be rejected when it is in fact false). Multiple problems have come to be associated with this framework, ranging from obtaining a sufficient sample size to specifying an adequate null hypothesis. Measurement processes that generate statistical data are also subject to error. Many of these errors are classified as random or systematic, but other types of errors can be important as well. The presence of missing data or censoring may result in biased estimates, and specific techniques have been developed to address these problems. Statistics can be said to have begun in ancient civilization, going back at least to the 5th century BC, but it was not until the 18th century that it started to draw more heavily from calculus and probability theory. In more recent years statistics has relied more heavily on statistical software.
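The Type I error rate in this framework can be illustrated with a small simulation: when the null hypothesis is actually true, a two-sided test at the 5% level should incorrectly reject in roughly 5% of repeated samples. The sketch below uses synthetic data from a normal population.

```python
import numpy as np

rng = np.random.default_rng(0)
trials = 2000
n = 50
alpha_hits = 0

# Under a true null (population mean 0), a test at the 5% level should
# reject in roughly 5% of samples: that rejection rate is the Type I error.
for _ in range(trials):
    sample = rng.normal(loc=0.0, scale=1.0, size=n)
    z = sample.mean() / (sample.std(ddof=1) / np.sqrt(n))
    if abs(z) > 1.96:
        alpha_hits += 1

print(alpha_hits / trials)  # close to 0.05
```

Raising the rejection threshold lowers the Type I error rate at the cost of a higher Type II error rate, which is the trade-off the framework forces the analyst to manage.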
Some definitions are: the Merriam-Webster dictionary defines statistics as "a branch of mathematics dealing with the collection, analysis and presentation of masses of numerical data", while statistician Arthur Lyon Bowley defines statistics as "numerical statements of facts in any department of inquiry placed in relation to each other". Statistics is a mathematical body of science that pertains to the collection, interpretation or explanation, and presentation of data, or as a branch of mathematics. Some consider statistics to be a distinct mathematical science rather than a branch of mathematics. While many scientific investigations make use of data, statistics is concerned with the use of data in the context of uncertainty and decision making in the face of uncertainty. Mathematical statistics is the application of mathematics to statistics. Mathematical techniques used for this include mathematical analysis, linear algebra, stochastic analysis, differential equations, and measure-theoretic probability theory.
In applying statistics to a problem, it is common practice to start with a population or process to be studied. Populations can be diverse topics such as "all people living in a country" or "every atom composing a crystal". Ideally, statisticians compile data about the entire population; this may be organized by governmental statistical institutes. Descriptive statistics can be used to summarize the population data. Numerical descriptors include mean and standard deviation for continuous data types, while frequency and percentage are more useful in terms of describing categorical data. When a census is not feasible, a chosen subset of the population, called a sample, is studied. Once a sample that is representative of the population is determined, data is collected for the sample members in an observational or experimental setting. Again, descriptive statistics can be used to summarize the sample data. However, the drawing of the sample has been subject to an element of randomness, hence the numerical descriptors established from the sample are subject to uncertainty.
To still draw meaningful conclusions about the entire population, inferential statistics is needed.
Convolution
In mathematics, convolution is a mathematical operation on two functions that produces a third function expressing how the shape of one is modified by the other. The term convolution refers both to the result function and to the process of computing it. Some features of convolution are similar to cross-correlation: for real-valued functions of a continuous or discrete variable, it differs from cross-correlation only in that either f or g is reflected about the y-axis. For continuous functions, the cross-correlation operator is the adjoint of the convolution operator. Convolution has applications that include probability, computer vision, natural language processing, signal processing, and differential equations. The convolution can be defined for functions on Euclidean space and other groups. For example, periodic functions, such as the discrete-time Fourier transform, can be defined on a circle and convolved by periodic convolution. A discrete convolution can be defined for functions on the set of integers. Generalizations of convolution have applications in the field of numerical analysis and numerical linear algebra, and in the design and implementation of finite impulse response filters in signal processing.
Computing the inverse of the convolution operation is known as deconvolution. The convolution of f and g is written f ∗ g, using an asterisk; it is defined as the integral of the product of the two functions after one is reversed and shifted. As such, it is a particular kind of integral transform: (f ∗ g)(t) ≜ ∫−∞^∞ f(τ) g(t − τ) dτ. An equivalent definition is: (f ∗ g)(t) ≜ ∫−∞^∞ f(t − τ) g(τ) dτ. While the symbol t is used above, it need not represent the time domain, but in that context the convolution formula can be described as a weighted average of the function f(τ) at the moment t, where the weighting is given by g(−τ) shifted by amount t. As t changes, the weighting function emphasizes different parts of the input function. For functions f, g supported on only [0, ∞), the integration limits can be truncated, resulting in: (f ∗ g)(t) = ∫0^t f(τ) g(t − τ) dτ for f, g : [0, ∞) → ℝ. For the multi-dimensional formulation of convolution, see domain of definition. A common engineering convention is: f(t) ∗ g(t) ≜ ∫−∞^∞ f(τ) g(t − τ) dτ, which has to be interpreted carefully to avoid confusion. For instance, f(t) ∗ g(t − t0) is equivalent to (f ∗ g)(t − t0), but f(t − t0) ∗ g(t − t0) is in fact equivalent to (f ∗ g)(t − 2t0).
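The integral definition has a direct discrete analogue, (f ∗ g)[n] = Σₘ f[m] g[n − m]. A minimal sketch, with NumPy used only to cross-check against a library implementation:

```python
import numpy as np

def conv(f, g):
    """Discrete convolution: (f * g)[n] = sum over m of f[m] * g[n - m]."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fv in enumerate(f):
        for j, gv in enumerate(g):
            out[i + j] += fv * gv  # f[m] contributes to output index m + k
    return out

f = [1, 2, 3]
g = [0, 1, 0.5]
print(conv(f, g))                  # [0.0, 1.0, 2.5, 4.0, 1.5]
print(np.convolve(f, g).tolist())  # same result from the library routine
```

Note that the output is longer than either input: the full convolution of sequences of lengths M and N has length M + N − 1.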
Convolution describes the output of an important class of operations known as linear time-invariant (LTI) systems. See LTI system theory for a derivation of convolution as the result of LTI constraints. In terms of the Fourier transforms of the input and output of an LTI operation, no new frequency components are created; the existing ones are only modified. In other words, the output transform is the pointwise product of the input transform with a third transform. See the convolution theorem for a derivation of that property of convolution. Conversely, convolution can be derived as the inverse Fourier transform of the pointwise product of two Fourier transforms. One of the earliest uses of the convolution integral appeared in D'Alembert's derivation of Taylor's theorem in Recherches sur différents points importants du système du monde, published in 1754. An expression of the type ∫ f ⋅ g du is also used by Sylvestre François Lacroix on page 505 of his book entitled Treatise on differences and series, the last of 3 volumes of the encyclopedic series Traité du calcul différentiel et du calcul intégral, Chez Courcier, Paris, 1797–1800.
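The convolution theorem mentioned above can be checked numerically: for periodic (circular) convolution, the Fourier transform of the convolution equals the pointwise product of the transforms. A small sketch using NumPy's FFT on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.normal(size=8)
g = rng.normal(size=8)

# Circular convolution computed directly from the definition...
direct = np.array(
    [sum(f[m] * g[(n - m) % 8] for m in range(8)) for n in range(8)]
)

# ...and via the convolution theorem: transform, multiply pointwise,
# then invert the transform.
via_fft = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)).real

print(np.allclose(direct, via_fft))  # True
```

This identity is also why FFT-based convolution is preferred for long signals: the transform route costs O(n log n) rather than the O(n²) of the direct sum.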
Soon thereafter, convolution operations appear in the works of Pierre Simon Laplace, Jean-Baptiste Joseph Fourier, Siméon Denis Poisson, and others. The term itself did not come into wide use until the 1950s or 60s. Prior to that it was sometimes known as Faltung, composition product, superposition integral, or Carson's integral, yet it appears as early as 1903. The operation ∫0^t φ(s)ψ(t − s) ds is a particular case of the composition products considered by the Italian mathematician Vito Volterra in 1913.
Single particle analysis
Single particle analysis is a group of related computerized image processing techniques used to analyze images from transmission electron microscopy (TEM). These methods were developed to improve and extend the information obtainable from TEM images of particulate samples, such as proteins or other large biological entities such as viruses. Individual images of stained or unstained particles are very noisy, and so hard to interpret. Combining several digitized images of similar particles together gives an image with stronger and more easily interpretable features. An extension of this technique uses single particle methods to build up a three-dimensional reconstruction of the particle. Using cryo-electron microscopy, it has become possible to generate reconstructions with sub-nanometer resolution and near-atomic resolution, first in the case of highly symmetric viruses, and now in smaller, asymmetric proteins as well. Single particle analysis can be done on both negatively stained and vitreous ice-embedded cryo-TEM samples. Single particle analysis methods are, in general, reliant on the sample being homogeneous, although techniques for dealing with conformational heterogeneity are being developed.
Images collected on film are digitized using high-quality scanners, although modern electron microscopes often have built-in CCD detectors coupled to a phosphorescent layer. The image processing is carried out using specialized software programs, often run on multi-processor computer clusters. Depending on the sample or the desired results, various steps of two- or three-dimensional processing can be done. Biological samples, especially samples embedded in thin vitreous ice, are highly radiation sensitive, thus only low electron doses can be used to image the sample. This low dose, as well as variations in the metal stain used, means images have high noise relative to the signal given by the particle being observed. By aligning several similar images to each other so they are in register and then averaging them, an image with higher signal-to-noise ratio can be obtained. As the noise is randomly distributed and the underlying image features constant, by averaging the intensity of each pixel over several images only the constant features are reinforced.
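The signal-to-noise argument can be demonstrated numerically: averaging N noisy copies of the same image leaves the underlying features intact while shrinking the noise standard deviation by roughly √N. The toy "particle" below is a synthetic disc, not real micrograph data.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy "particle": a bright disc on a dark background.
y, x = np.mgrid[:32, :32]
truth = ((x - 16) ** 2 + (y - 16) ** 2 < 64).astype(float)

# Each simulated micrograph is the same particle plus heavy random noise.
images = [truth + rng.normal(0, 2.0, truth.shape) for _ in range(400)]
average = np.mean(images, axis=0)

# Averaging N images shrinks the noise by about sqrt(N) while the
# constant signal is reinforced.
err_single = np.abs(images[0] - truth).mean()
err_avg = np.abs(average - truth).mean()
print(err_single / err_avg)  # roughly sqrt(400) = 20
```

This is the idealized case: in practice the gain is smaller, because real particle images must first be aligned and are never perfectly identical.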
The optimal alignment to map one image onto another is calculated by cross-correlation. However, a micrograph contains particles in multiple different orientations and/or conformations, so to get more representative image averages, a method is required to group similar particle images together into multiple sets. This is normally carried out using one of several data analysis and image classification algorithms, such as multi-variate statistical analysis with hierarchical ascendant classification, or k-means clustering. Data sets of tens of thousands of particle images are typically used, and to reach an optimal solution an iterative procedure of alignment and classification is employed, whereby strong image averages produced by classification are used as reference images for a subsequent alignment of the whole data set. Image filtering is often used to reduce the influence of high and/or low spatial frequency information in the images, which can affect the results of the alignment and classification procedures; this is particularly useful in negative stain images.
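The cross-correlation alignment step can be illustrated in one dimension: the peak of the circular cross-correlation between a reference signal and a shifted noisy copy recovers the shift. A sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)

# A reference signal (a Gaussian bump) and a shifted, noisy copy of it.
reference = np.exp(-0.5 * ((np.arange(128) - 40) / 4.0) ** 2)
shifted = np.roll(reference, 17) + rng.normal(0, 0.05, 128)

# Circular cross-correlation computed via FFT: the location of the peak
# gives the relative shift that best maps one signal onto the other.
xcorr = np.fft.ifft(
    np.fft.fft(shifted) * np.conj(np.fft.fft(reference))
).real
estimated_shift = int(np.argmax(xcorr))
print(estimated_shift)  # 17
```

In single particle analysis the same idea is applied in two dimensions, over translations and rotations, which is why the computation is usually done with FFTs on clusters.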
The algorithms make use of fast Fourier transforms, often employing Gaussian-shaped soft-edged masks in reciprocal space to suppress certain frequency ranges. High-pass filters remove low spatial frequencies, while low-pass filters remove high spatial frequencies and so have a blurring effect on fine details. Due to the nature of image formation in the electron microscope, bright-field TEM images are obtained using significant underfocus. This, along with features inherent in the microscope's lens system, creates blurring of the collected images, visible as a point spread function. The combined effects of the imaging conditions are known as the contrast transfer function (CTF), and can be approximated mathematically as a function in reciprocal space. Specialized image processing techniques such as phase flipping and amplitude correction/Wiener filtering can correct for the CTF and allow high-resolution reconstructions. Transmission electron microscopy images are projections of the object showing the distribution of density through the object, similar to medical X-rays.
By making use of the projection-slice theorem, a three-dimensional reconstruction of the object can be generated by combining many images of the object taken from a range of viewing angles. Proteins in vitreous ice ideally adopt a random distribution of orientations, allowing an isotropic reconstruction if a large number of particle images are used; this contrasts with electron tomography, where the viewing angles are limited due to the geometry of the sample/imaging set up, giving an anisotropic reconstruction. Filtered back projection is a commonly used method of generating 3D reconstructions in single particle analysis, although many alternative algorithms exist. Before a reconstruction can be made, the orientation of the object in each image needs to be estimated. Several methods have been developed to work out the relative Euler angles of each image; some are based on common lines, while others use iterative projection matching algorithms. The latter works by beginning with a simple, low-resolution 3D starting model, comparing the experimental images to projections of the model, and creating a new 3D model to bootstrap towards a solution.
Methods are available for making 3D reconstructions of helical samples (such as tobacco mosaic virus).
Stochastic process
In probability theory and related fields, a stochastic or random process is a mathematical object usually defined as a collection of random variables. Historically, the random variables were associated with or indexed by a set of numbers, usually viewed as points in time, giving the interpretation of a stochastic process representing numerical values of some system randomly changing over time, such as the growth of a bacterial population, an electrical current fluctuating due to thermal noise, or the movement of a gas molecule. Stochastic processes are widely used as mathematical models of systems and phenomena that appear to vary in a random manner; they have applications in many disciplines including sciences such as biology, ecology and physics as well as technology and engineering fields such as image processing, signal processing, information theory, computer science and telecommunications. Furthermore, seemingly random changes in financial markets have motivated the extensive use of stochastic processes in finance. Applications and the study of phenomena have in turn inspired the proposal of new stochastic processes.
Examples of such stochastic processes include the Wiener process or Brownian motion process, used by Louis Bachelier to study price changes on the Paris Bourse, and the Poisson process, used by A. K. Erlang to study the number of phone calls occurring in a certain period of time. These two stochastic processes are considered the most important and central in the theory of stochastic processes, and were discovered repeatedly and independently, both before and after Bachelier and Erlang, in different settings and countries. The term random function is also used to refer to a stochastic or random process, because a stochastic process can be interpreted as a random element in a function space. The terms stochastic process and random process are used interchangeably, often with no specific mathematical space for the set that indexes the random variables, but these two terms are often used when the random variables are indexed by the integers or an interval of the real line. If the random variables are indexed by the Cartesian plane or some higher-dimensional Euclidean space, the collection of random variables is usually called a random field instead.
The values of a stochastic process are not always numbers and can be vectors or other mathematical objects. Based on their mathematical properties, stochastic processes can be divided into various categories, which include random walks, Markov processes, Lévy processes, Gaussian processes, random fields, renewal processes, and branching processes. The study of stochastic processes uses mathematical knowledge and techniques from probability, linear algebra, set theory and topology, as well as branches of mathematical analysis such as real analysis, measure theory, Fourier analysis, and functional analysis. The theory of stochastic processes is considered to be an important contribution to mathematics and it continues to be an active topic of research for both theoretical reasons and applications. A stochastic or random process can be defined as a collection of random variables indexed by some mathematical set, meaning that each random variable of the stochastic process is uniquely associated with an element in the set.
The set used to index the random variables is called the index set. Historically, the index set was some subset of the real line, such as the natural numbers, giving the index set the interpretation of time. Each random variable in the collection takes values from the same mathematical space, known as the state space. This state space can be, for example, the integers, the real line, or n-dimensional Euclidean space. An increment is the amount that a stochastic process changes between two index values, often interpreted as two points in time. A stochastic process can have many outcomes, due to its randomness, and a single outcome of a stochastic process is called, among other names, a sample function or realization. A stochastic process can be classified in different ways, for example, by its state space, its index set, or the dependence among the random variables. One common way of classification is by the cardinality of the index set and the state space. When interpreted as time, if the index set of a stochastic process has a finite or countable number of elements, such as a finite set of numbers, the set of integers, or the natural numbers, the stochastic process is said to be in discrete time.
If the index set is some interval of the real line, then time is said to be continuous. The two types of stochastic processes are respectively referred to as discrete-time and continuous-time stochastic processes. Discrete-time stochastic processes are considered easier to study because continuous-time processes require more advanced mathematical techniques and knowledge, due to the index set being uncountable. If the index set is the integers, or some subset of them, the stochastic process can also be called a random sequence. If the state space is the integers or natural numbers, the stochastic process is called a discrete or integer-valued stochastic process. If the state space is the real line, the stochastic process is referred to as a real-valued stochastic process or a process with continuous state space. If the state space is n-dimensional Euclidean space, the stochastic process is called an n-dimensional vector process or n-vector process. The word stochastic in English was originally used as an adjective with the definition "pertaining to conjecturing", stemming from a Greek word meaning "to aim at a mark, guess", and the Oxford English Dictionary gives the year 1662 as its earliest occurrence.
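A minimal concrete example of a discrete-time, integer-valued stochastic process is the simple symmetric random walk; the sketch below simulates one sample function (realization) of it.

```python
import random

def random_walk(steps, seed=0):
    """Simple symmetric random walk: a discrete-time, integer-valued
    stochastic process in which each increment is +1 or -1 with
    probability 1/2, independently of the past."""
    rng = random.Random(seed)
    position = 0
    path = [position]
    for _ in range(steps):
        position += rng.choice([-1, 1])
        path.append(position)
    return path

# One realization of the process; re-running with a different seed
# produces a different sample function of the same process.
walk = random_walk(10)
print(walk)
```

Here the index set is {0, 1, ..., 10} (discrete time), the state space is the integers, and each increment is an independent random variable.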
Cryptanalysis
Cryptanalysis is the study of analyzing information systems in order to discover the hidden aspects of the systems. Cryptanalysis is used to breach cryptographic security systems and gain access to the contents of encrypted messages, even if the cryptographic key is unknown. In addition to mathematical analysis of cryptographic algorithms, cryptanalysis includes the study of side-channel attacks that do not target weaknesses in the cryptographic algorithms themselves, but instead exploit weaknesses in their implementation. Though the goal has been the same, the methods and techniques of cryptanalysis have changed drastically through the history of cryptography, adapting to increasing cryptographic complexity, ranging from the pen-and-paper methods of the past, through machines like the British Bombes and Colossus computers at Bletchley Park in World War II, to the mathematically advanced computerized schemes of the present. Methods for breaking modern cryptosystems often involve solving carefully constructed problems in pure mathematics, the best-known being integer factorization.
Given some encrypted data, the goal of the cryptanalyst is to gain as much information as possible about the original, unencrypted data. It is useful to consider two aspects of achieving this: the first is breaking the system, that is, discovering how the encipherment process works; the second is solving the key that is unique for a particular encrypted message or group of messages. Attacks can be classified based on what type of information the attacker has available. As a basic starting point it is normally assumed that, for the purposes of analysis, the general algorithm is known. This is a reasonable assumption in practice — throughout history, there are countless examples of secret algorithms falling into wider knowledge, variously through espionage and reverse engineering. Ciphertext-only: the cryptanalyst has access only to a collection of ciphertexts or codetexts. Known-plaintext: the attacker has a set of ciphertexts to which he knows the corresponding plaintext. Chosen-plaintext: the attacker can obtain the ciphertexts corresponding to an arbitrary set of plaintexts of his own choosing.
Adaptive chosen-plaintext: like a chosen-plaintext attack, except the attacker can choose subsequent plaintexts based on information learned from previous encryptions; the adaptive chosen-ciphertext attack is defined analogously. Related-key attack: like a chosen-plaintext attack, except the attacker can obtain ciphertexts encrypted under two different keys; the keys are unknown, but the relationship between them is known. Attacks can also be characterised by the resources they require; those resources include: Time — the number of computation steps which must be performed. Memory — the amount of storage required to perform the attack. Data — the quantity and type of plaintexts and ciphertexts required for a particular approach. It is sometimes difficult to predict these quantities, especially when the attack is not practical to implement for testing, but academic cryptanalysts tend to provide at least the estimated order of magnitude of their attacks' difficulty, for example, "SHA-1 collisions now 2^52". Bruce Schneier notes that even computationally impractical attacks can be considered breaks: "Breaking a cipher means finding a weakness in the cipher that can be exploited with a complexity less than brute force.
Never mind that brute-force might require 2^128 encryptions. The results of cryptanalysis can also vary in usefulness. For example, cryptographer Lars Knudsen classified various types of attack on block ciphers according to the amount and quality of secret information that was discovered: Total break — the attacker deduces the secret key. Global deduction — the attacker discovers a functionally equivalent algorithm for encryption and decryption, but without learning the key. Instance deduction — the attacker discovers additional plaintexts (or ciphertexts) not previously known. Information deduction — the attacker gains some Shannon information about plaintexts (or ciphertexts) not previously known. Distinguishing algorithm — the attacker can distinguish the cipher from a random permutation. Academic attacks are often against weakened versions of a cryptosystem, such as a block cipher or hash function with some rounds removed. Many, but not all, attacks become exponentially more difficult to execute as rounds are added to a cryptosystem, so it is possible for the full cryptosystem to be strong even though reduced-round variants are weak.
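As a toy illustration of the ciphertext-only attack model described earlier (vastly simpler than any attack on a modern cipher), a Caesar cipher can be broken by letter-frequency analysis alone; the message below is made up for the example.

```python
from collections import Counter

def caesar(text, shift):
    # Shift each letter by `shift` positions within A-Z.
    return "".join(chr((ord(c) - 65 + shift) % 26 + 65) for c in text)

plaintext = "ATTACKATDAWNTHEENEMYISSLEEPINGSENDMOREOARSMEN"
ciphertext = caesar(plaintext, 7)

# Ciphertext-only attack: guess that the most frequent ciphertext
# letter corresponds to 'E', the most frequent letter in English text.
top = Counter(ciphertext).most_common(1)[0][0]
guessed_shift = (ord(top) - ord("E")) % 26
print(caesar(ciphertext, -guessed_shift) == plaintext)  # True
```

The attack works only because the cipher preserves letter frequencies; modern ciphers are designed so that ciphertext statistics reveal nothing about the plaintext or key.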
Nonetheless, partial breaks that come close to breaking the original cryptosystem may mean that a full break will follow. In academic cryptography, a weakness or a break in a scheme is usually defined quite conservatively: it might require impractical amounts of time, memory, or known plaintexts, or it might require the attacker to be able to do things many real-world attackers cannot; for example, the attacker may need to choose particular plaintexts to be encrypted or even to ask for plaintexts to be encrypted using several keys related to the secret key. Furthermore, it might only reveal a small amount of information, enough to prove the cryptosystem imperfect but too little to be useful to real-world attackers.