1.
Kd-tree
–
In computer science, a k-d tree (short for k-dimensional tree) is a space-partitioning data structure for organizing points in a k-dimensional space. K-d trees are a useful data structure for several applications, such as searches involving a multidimensional search key. K-d trees are a special case of binary space partitioning trees.
The k-d tree is a binary tree in which every node is a k-dimensional point. Every non-leaf node can be thought of as implicitly generating a splitting hyperplane that divides the space into two parts, known as half-spaces. Points to the left of this hyperplane are represented by the left subtree of that node, and points to the right of the hyperplane are represented by the right subtree. The hyperplane direction is chosen in the following way: every node in the tree is associated with one of the k dimensions, with the hyperplane perpendicular to that dimension's axis. For example, if the x-axis is chosen for a particular split, all points in the subtree with a smaller x-value than the node appear in the left subtree, and all points with a larger x-value appear in the right subtree. In such a case, the hyperplane would be set by the x-value of the point.
Since there are many possible ways to choose axis-aligned splitting planes, there are many different ways to construct k-d trees. The canonical method of k-d tree construction has the following constraints: as one moves down the tree, one cycles through the axes used to select the splitting planes, and points are inserted by selecting the median of the points being put into the subtree, with respect to their coordinates in the axis being used to create the splitting plane. This method leads to a balanced k-d tree, in which each leaf node is approximately the same distance from the root. However, balanced trees are not necessarily optimal for all applications.
Note that it is not required to select the median point. In the case where median points are not selected, there is no guarantee that the tree will be balanced. In practice, a common shortcut, taking the median of a small random sample of the points as the splitting point, often results in nicely balanced trees.
Given a list of n points, a recursive algorithm based on median finding constructs a balanced k-d tree containing those points. It is common that the points "after" the median include only the ones that are strictly greater than the median. For points that lie on the median, it is possible to define a function that compares the points in all dimensions. In some cases, it is acceptable to let points equal to the median lie on one side of the median, for example, by splitting the points into a "lesser than" subset and a "greater than or equal to" subset.
This algorithm creates the invariant that for any node, all the nodes in the left subtree are on one side of a splitting plane, and all the nodes in the right subtree are on the other side. Points that lie on the splitting plane may appear on either side. The splitting plane of a node goes through the point associated with that node. Alternative algorithms for building a balanced k-d tree presort the data prior to building the tree.
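The median-based construction described above can be sketched in Python as follows. This is a minimal illustration, not the original listing: the names `Node`, `build_kdtree` and `height` are hypothetical, the splitting axis is assumed to cycle with depth, and a full sort at every level stands in for a linear-time median selection, which keeps the code short at the cost of extra work.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    location: Point                 # the point stored at this node
    axis: int                       # dimension used for the split
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_kdtree(points: Sequence[Point], depth: int = 0) -> Optional[Node]:
    """Build a balanced k-d tree by splitting at the median of each axis."""
    if not points:
        return None
    k = len(points[0])              # all points are assumed to have k coordinates
    axis = depth % k                # assumption: cycle through the axes by depth
    ordered = sorted(points, key=lambda p: p[axis])
    median = len(ordered) // 2
    # Points equal to the median on this axis may end up on either side.
    return Node(
        location=ordered[median],
        axis=axis,
        left=build_kdtree(ordered[:median], depth + 1),
        right=build_kdtree(ordered[median + 1:], depth + 1),
    )


def height(node: Optional[Node]) -> int:
    """Number of nodes on the longest root-to-leaf path."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


if __name__ == "__main__":
    pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
    root = build_kdtree(pts)
    print(root.location, height(root))
```

Because each recursive call receives roughly half of the points, the height of the resulting tree grows logarithmically with the number of points, which is the balance property the text refers to.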

2.
Locality Sensitive Hashing
–
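Locality-sensitive hashing (LSH) reduces the dimensionality of high-dimensional data. LSH hashes input items so that similar items map to the same “buckets” with high probability. LSH differs from conventional and cryptographic hash functions because it aims to maximize the probability of a “collision” for similar items. Locality-sensitive hashing has much in common with data clustering and nearest neighbor search.
An LSH family $F$ is defined for a metric space $\mathcal{M} = (M, d)$, a threshold $R > 0$ and an approximation factor $c > 1$. This family $F$ is a family of functions $h : M \to S$ which map elements from the metric space to a bucket $s \in S$. For any two points $p, q \in M$ and a function $h$ chosen uniformly at random from $F$, the family satisfies two conditions: if $d(p, q) \le R$, then $h(p) = h(q)$ with probability at least $P_1$; if $d(p, q) \ge cR$, then $h(p) = h(q)$ with probability at most $P_2$. A family is interesting when $P_1 > P_2$. Such a family $F$ is called $(R, cR, P_1, P_2)$-sensitive.
Alternatively, it is defined with respect to a universe of items $U$ that have a similarity function $\phi : U \times U \to [0, 1]$. An LSH scheme is then a family of hash functions $H$ coupled with a probability distribution $D$ over the functions such that a function $h$ chosen according to $D$ satisfies $\Pr[h(a) = h(b)] = \phi(a, b)$ for any $a, b \in U$.
Given a $(d_1, d_2, p_1, p_2)$-sensitive family $F$, we can construct new families $G$ by either the AND-construction or the OR-construction of $F$. To create an AND-construction, we define a new family $G$ of hash functions $g$, where each function $g$ is constructed from $k$ random functions $h_1, \dots, h_k$ from $F$. We then say that for a hash function $g \in G$, $g(x) = g(y)$ if and only if $h_i(x) = h_i(y)$ for all $i = 1, 2, \dots, k$. Since the members of $F$ are independently chosen for any $g \in G$, $G$ is a $(d_1, d_2, p_1^k, p_2^k)$-sensitive family. To create an OR-construction, we define a new family $G$ of hash functions $g$, where each function $g$ is constructed from $k$ random functions $h_1, \dots, h_k$ from $F$. We then say that for a hash function $g \in G$, $g(x) = g(y)$ if and only if $h_i(x) = h_i(y)$ for one or more values of $i$. Since the members of $F$ are independently chosen for any $g \in G$, $G$ is a $(d_1, d_2, 1 - (1 - p_1)^k, 1 - (1 - p_2)^k)$-sensitive family.
One of the simplest ways to construct an LSH family is bit sampling. This approach works for the Hamming distance over $d$-dimensional vectors $\{0, 1\}^d$. Here, the family $F$ of hash functions is simply the family of all the projections of points on one of the $d$ coordinates, i.e. $F = \{ h : \{0, 1\}^d \to \{0, 1\} \mid h(x) = x_i \text{ for some } i \in \{1, \dots, d\} \}$, where $x_i$ is the $i$th coordinate of $x$. A random function $h$ from $F$ simply selects a random bit from the input point. This family has the following parameters: $P_1 = 1 - R/d$, $P_2 = 1 - cR/d$.
Suppose $U$ is composed of subsets of some ground set of enumerable items $S$ and the similarity function of interest is the Jaccard index $J$. If $\pi$ is a permutation on the indices of $S$, for $A \subseteq S$ let $h(A) = \min_{a \in A} \{\pi(a)\}$. Each possible choice of $\pi$ defines a single hash function $h$ mapping input sets to elements of $S$. Define the function family $H$ to be the set of all such functions and let $D$ be the uniform distribution. Given two sets $A, B \subseteq S$, the event that $h(A) = h(B)$ corresponds exactly to the event that the minimizer of $\pi$ over $A \cup B$ lies inside $A \cap B$. As $h$ was chosen uniformly at random, $\Pr[h(A) = h(B)] = J(A, B)$, so $(H, D)$ defines an LSH scheme for the Jaccard index.
Because the symmetric group on $n$ elements has size $n!$, choosing a truly random permutation from the full symmetric group is infeasible for even moderately sized $n$. It has been established that a min-wise independent family of permutations has size at least $\operatorname{lcm}(1, 2, \dots, n) \ge e^{n - o(n)}$. Restricted min-wise independence is the min-wise independence property restricted to certain sets of cardinality at most $k$.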
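The link between min-wise hashing and the Jaccard index can be checked empirically. The Python sketch below draws explicit random permutations of a small ground set and counts how often the minimum ranks of two sets coincide. All names (`jaccard`, `random_permutation`, `minhash`, `estimate_jaccard`) and the parameters `num_perms` and `seed` are assumptions made for illustration; explicit permutations are only workable for tiny universes, which is the infeasibility the text points out.

```python
import random
from typing import Dict, Set


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Exact Jaccard index |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def random_permutation(universe: Set[int], rng: random.Random) -> Dict[int, int]:
    """Map each item of the ground set to a distinct random rank."""
    items = sorted(universe)
    ranks = list(range(len(items)))
    rng.shuffle(ranks)
    return dict(zip(items, ranks))


def minhash(s: Set[int], perm: Dict[int, int]) -> int:
    """h(A) = min over a in A of pi(a); A must be non-empty."""
    return min(perm[x] for x in s)


def estimate_jaccard(a: Set[int], b: Set[int], universe: Set[int],
                     num_perms: int = 2000, seed: int = 0) -> float:
    """Fraction of random permutations under which h(A) == h(B)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_perms):
        perm = random_permutation(universe, rng)
        if minhash(a, perm) == minhash(b, perm):
            hits += 1
    return hits / num_perms


if __name__ == "__main__":
    universe = set(range(1, 11))
    a, b = {1, 2, 3, 4}, {3, 4, 5, 6}
    print("exact:", jaccard(a, b))
    print("estimate:", estimate_jaccard(a, b, universe))
```

With enough permutations the estimate should settle near the exact value; identical sets always collide and disjoint sets never do, because a permutation assigns distinct ranks to distinct items.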

3.
Algorithm
–
In mathematics and computer science, an algorithm is a self-contained sequence of actions to be performed. Algorithms can perform calculation, data processing and automated reasoning tasks. An algorithm is an effective method that can be expressed within a finite amount of space and time and in a well-defined formal language for calculating a function. The transition from one state to the next is not necessarily deterministic; some algorithms, known as randomized algorithms, incorporate random input. Giving a formal definition of algorithms, corresponding to the intuitive notion, remains a challenging problem.
In English, the word was first used in about 1230 and then by Chaucer in 1391. English adopted the French term, but it wasn't until the late 19th century that "algorithm" took on the meaning that it has in modern English. Another early use of the word is from 1240, in a manual titled Carmen de Algorismo composed by Alexandre de Villedieu. It begins thus: Haec algorismus ars praesens dicitur, in qua / Talibus Indorum fruimur bis quinque figuris. This translates as: Algorism is the art by which at present we use those Indian figures, which number two times five. The poem is a few hundred lines long and summarizes the art of calculating with the new style of Indian dice, or Talibus Indorum, or Hindu numerals.
An informal definition could be "a set of rules that precisely defines a sequence of operations", which would include all computer programs, including programs that do not perform numeric calculations. Generally, a program is only an algorithm if it stops eventually. Although no one can write out every member of an infinite set, humans can do something equally useful in the case of certain enumerably infinite sets: they can give explicit instructions for determining the nth member of the set, for arbitrary finite n. An enumerably infinite set is one whose elements can be put into one-to-one correspondence with the integers.
The concept of algorithm is also used to define the notion of decidability. That notion is central for explaining how formal systems come into being starting from a set of axioms. In logic, the time that an algorithm requires to complete cannot be measured. From such uncertainties, which characterize ongoing work, stems the unavailability of a definition of algorithm that suits both concrete and abstract usage of the term.
Algorithms are essential to the way computers process data. Thus, an algorithm can be considered to be any sequence of operations that can be simulated by a Turing-complete system. Although this may seem extreme, the arguments in its favor are hard to refute. Gurevich holds that Turing's informal argument in favor of his thesis justifies a stronger thesis: every algorithm can be simulated by a Turing machine. According to Savage, an algorithm is a computational process defined by a Turing machine.
Typically, when an algorithm is associated with processing information, data can be read from an input source, written to an output device and stored for further processing. Stored data are regarded as part of the internal state of the entity performing the algorithm. In practice, the state is stored in one or more data structures. For some such computational process, the algorithm must be rigorously defined: specified in the way it applies in all possible circumstances that could arise. That is, any conditional steps must be systematically dealt with, case by case.
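The remark above about giving explicit instructions for the nth member of an enumerably infinite set can be made concrete with a small Python sketch. The function name `nth_integer` and the particular enumeration 0, 1, -1, 2, -2, ... are choices made here for illustration; the text does not prescribe them. Each conditional step is handled as its own case, and the procedure stops for every valid input.

```python
def nth_integer(n: int) -> int:
    """Return the nth member (n = 0, 1, 2, ...) of the enumeration
    0, 1, -1, 2, -2, ... which lists every integer exactly once."""
    if n < 0:
        # Case 1: the index is outside the enumeration.
        raise ValueError("n must be a non-negative integer")
    if n == 0:
        # Case 2: the enumeration starts at zero.
        return 0
    if n % 2 == 1:
        # Case 3: odd positions hold the positive integers 1, 2, 3, ...
        return (n + 1) // 2
    # Case 4: even positions hold the negative integers -1, -2, -3, ...
    return -(n // 2)


if __name__ == "__main__":
    print([nth_integer(n) for n in range(7)])
```

The mapping is a one-to-one correspondence between the non-negative indices and the integers, which is what makes the set of integers enumerably infinite in the sense used above.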

4.
Data structure
–
In computer science, a data structure is a particular way of organizing data in a computer so that it can be used efficiently. Data structures can implement one or more abstract data types (ADTs), which specify the operations that can be performed on a data structure. In comparison, a data structure is a concrete implementation of the specification provided by an ADT. Different kinds of data structures are suited to different kinds of applications. For example, relational databases commonly use B-tree indexes for data retrieval.
Data structures provide a means to manage large amounts of data efficiently for uses such as large databases and internet indexing services. Usually, efficient data structures are key to designing efficient algorithms. Some formal design methods and programming languages emphasize data structures, rather than algorithms, as the key organizing factor in software design. Data structures can be used to organize the storage and retrieval of information stored in both main memory and secondary memory.
Data structures generally rest on two principles: computing the addresses of data items with arithmetic operations, as in arrays and records, and storing the addresses of data items within the structure itself, as in linked structures. Many data structures use both principles, sometimes combined in non-trivial ways. The implementation of a data structure usually requires writing a set of procedures that create and manipulate instances of that structure. The efficiency of a data structure cannot be analyzed separately from those operations.
There are numerous types of data structures, generally built upon simpler primitive data types:
An array is a number of elements in a specific order, typically all of the same type. Elements are accessed using an integer index to specify which element is required. Typical implementations allocate contiguous memory words for the elements of arrays. Arrays may be fixed-length or resizable.
A linked list is a linear collection of data elements of any type, called nodes, where each node has itself a value and points to the next node in the list. The principal advantage of a linked list over an array is that values can always be efficiently inserted and removed without relocating the rest of the list. Certain other operations, such as random access to a certain element, are however slower on lists than on arrays.
A record is an aggregate data structure: a value that contains other values, typically in fixed number and sequence. The elements of records are usually called fields or members.
A union is a data structure that specifies which of a number of permitted primitive types may be stored in its instances. Contrast this with a record, which could be defined to contain a float and an integer; in a union, there is only one value at a time. Enough space is allocated to contain the widest member datatype.
A tagged union contains an additional field indicating its current type.
A class is a data structure that contains data fields, like a record, as well as various methods which operate on the contents of the record.
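The linked-list trade-off described above, cheap insertion and removal against slow random access, can be sketched in Python. The names `ListNode`, `insert_after`, `remove_after`, `get` and `to_list` are hypothetical and the list is assumed to be singly linked; this is one possible layout, not a reference implementation.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class ListNode:
    value: Any
    next: Optional["ListNode"] = None   # reference to the following node


def insert_after(node: ListNode, value: Any) -> ListNode:
    """Insert a value right after `node` without moving any other element."""
    new_node = ListNode(value, node.next)
    node.next = new_node
    return new_node


def remove_after(node: ListNode) -> Any:
    """Unlink the node following `node` and return its value."""
    removed = node.next
    if removed is None:
        raise ValueError("no node to remove")
    node.next = removed.next
    return removed.value


def get(head: Optional[ListNode], index: int) -> Any:
    """Random access must walk the list, so it takes time linear in index."""
    if index < 0:
        raise IndexError("negative index")
    node = head
    for _ in range(index):
        if node is None:
            raise IndexError("index out of range")
        node = node.next
    if node is None:
        raise IndexError("index out of range")
    return node.value


def to_list(head: Optional[ListNode]) -> List[Any]:
    """Collect the values in order, for inspection."""
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values


if __name__ == "__main__":
    head = ListNode(1)
    insert_after(head, 3)
    insert_after(head, 2)
    print(to_list(head), get(head, 2))
```

Insertion and removal only rewire one or two references, whereas `get` has to follow the chain from the head, which is the cost that arrays avoid by computing an element's address directly from its index.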