Wikipedia: Search algorithm

NOTE:

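1、On search algorithms: the Wikipedia article "Search algorithm" gives a good summary. These notes use it as an introduction and then summarize search algorithms. As software engineers, we need to pay attention to:

Principles

Implementation techniques

2、The article describes search algorithms very well, especially in how it classifies the problems.

3、The "search algorithm" described here is in fact a very broad concept that covers a great many algorithms, because many problems can be viewed as search.

4、Related terms: search space, state space, feasible region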

In computer science, a search algorithm is any algorithm that solves the search problem, namely, to retrieve information stored within some data structure, or calculated in the search space (feasible region, solution space) of a problem domain, either with discrete or continuous values. Specific applications of search algorithms include:

1、Problems in combinatorial optimization , such as:

1.1 The vehicle routing problem, a form of shortest path problem

NOTE: the vehicle routing problem is a kind of "shortest path problem"

1.2 The knapsack problem: Given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible.

1.3 The nurse scheduling problem

2、Problems in constraint satisfaction, such as:

2.1 The map coloring problem

3、Filling in a sudoku or crossword puzzle

4、In game theory and especially combinatorial game theory, choosing the best move to make next (such as with the minimax algorithm)

5、Finding a combination or password from the whole set of possibilities

6、Factoring an integer (an important problem in cryptography)

NOTE: that is, decomposing an integer into its factors

7、Optimizing an industrial process, such as a chemical reaction, by changing the parameters of the process (like temperature, pressure, and pH)

8、Retrieving a record from a database

9、Finding the maximum or minimum value in a list or array

10、Checking to see if a given value is present in a set of values

NOTE:

As the list above shows, "search" is a broad concept, and a great deal can be classified under it.

The classic search problems described above and web search are both problems in information retrieval, but are generally studied as separate subfields and are solved and evaluated differently. Web search problems are generally focused on filtering and finding documents most relevant to human queries. Classic search algorithms are typically evaluated on how fast they can find a solution, and whether or not that solution is guaranteed to be optimal. Though information retrieval algorithms must be fast, the quality of ranking is more important, as is whether or not good results have been left out and bad results included.

NOTE: search is a very large concept

The appropriate search algorithm often depends on the data structure being searched, and may also include prior knowledge about the data. Some database structures are specially constructed to make search algorithms faster or more efficient, such as a search tree, hash map, or a database index. [1][2]

NOTE: choose an appropriate data structure to organize the data so that search time is as low as possible

Search algorithms can be classified based on their mechanism of searching. Linear search algorithms check every record for the one associated with a target key in a linear fashion.[3] Binary, or half interval searches, repeatedly target the center of the search structure and divide the search space in half. Comparison search algorithms improve on linear searching by successively eliminating records based on comparisons of the keys until the target record is found, and can be applied on data structures with a defined order.[4] Digital search algorithms work based on the properties of digits in data structures that use numerical keys.[5] Finally, hashing directly maps keys to records based on a hash function.[6] Searches outside a linear search require that the data be sorted in some way.

NOTE: "Search algorithms can be classified based on their mechanism of searching". In these notes the "mechanism of searching" is called the "method of search". As later sections show, there are a great many methods of search, by no means limited to the few listed above.
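The three most common mechanisms named above (linear, binary, hashing) can be sketched side by side. This is only an illustration: the function names and the sample data are made up for these notes, and binary search is delegated to the standard-library `bisect` module rather than written by hand.

```python
from bisect import bisect_left

def linear_search(records, key):
    """Check every record in order; works on unsorted data."""
    for i, value in enumerate(records):
        if value == key:
            return i
    return -1

def binary_search(sorted_records, key):
    """Halve the search space each step; requires sorted input."""
    i = bisect_left(sorted_records, key)
    if i < len(sorted_records) and sorted_records[i] == key:
        return i
    return -1

data = [7, 3, 9, 1, 5]              # hypothetical sample keys
sorted_data = sorted(data)          # [1, 3, 5, 7, 9]

# Hashing: a dict maps each key directly to its position in `data`.
index_by_key = {value: i for i, value in enumerate(data)}

print(linear_search(data, 9))         # position in the unsorted list
print(binary_search(sorted_data, 7))  # position in the sorted list
print(index_by_key.get(9))            # direct lookup, no scanning
```

The trade-off is the one the text describes: linear search needs no preparation, while binary search and hashing pay an up-front cost (sorting, building the table) to make each later lookup cheaper.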

Algorithms are often evaluated by their computational complexity, or maximum theoretical run time. Binary search functions, for example, have a maximum complexity of O(log n), or logarithmic time. This means that the maximum number of operations needed to find the search target is a logarithmic function of the size of the search space.

NOTE: the concept of search space mentioned in the last sentence is very important.
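To make the logarithmic bound concrete, the sketch below counts loop iterations of a hand-written binary search and compares them with floor(log2 n) + 1, the usual worst-case bound for a sorted array of n elements. The helper name `binary_search_steps` and the probe values are assumptions chosen for illustration.

```python
import math

def binary_search_steps(sorted_items, target):
    """Return (index or -1, number of loop iterations used)."""
    lo, hi = 0, len(sorted_items) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid, steps
        if sorted_items[mid] < target:
            lo = mid + 1      # target can only be in the right half
        else:
            hi = mid - 1      # target can only be in the left half
    return -1, steps

n = 1_000_000
items = list(range(n))

# Probe a few present and absent targets and keep the largest step count.
probes = (0, n - 1, n // 2, -1, n)
worst = max(binary_search_steps(items, t)[1] for t in probes)
bound = math.floor(math.log2(n)) + 1   # 20 for n = 1,000,000

print(worst, "<=", bound)
```

A linear scan over the same list could need up to n comparisons, which is the gap the O(log n) notation summarizes.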

Classes

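NOTE: A central problem for any search algorithm is determining the search space. The article that "search space" links to in the source actually describes the feasible region; that meaning may be somewhat restrictive, but it is usable. The **search space** may be a virtual space (for example, backtracking searches a solution space) or an actual data structure (for example, binary search operates on a sorted array). The source classifies search algorithms accordingly: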

For virtual search spaces

For sub-structures of a given structure

Search for the maximum of a function

For virtual search spaces

Algorithms for searching virtual spaces are used in the constraint satisfaction problem, where the goal is to find a set of value assignments to certain variables that will satisfy specific mathematical equations and inequalities. They are also used when the goal is to find a variable assignment that will maximize or minimize a certain function of those variables. Algorithms for these problems include the basic brute-force search (also called "naïve" or "uninformed" search), and a variety of heuristics that try to exploit partial knowledge about the structure of this space, such as linear relaxation, constraint generation, and constraint propagation.

NOTE: blind search vs. heuristic search
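As a rough illustration of brute-force search over a virtual space, the sketch below enumerates every color assignment for a tiny map-coloring instance and returns the first one that satisfies all constraints. The regions, borders, and function name are hypothetical; a heuristic method would prune most of these candidates instead of visiting them all.

```python
from itertools import product

def brute_force_coloring(regions, borders, colors):
    """Try every assignment; return the first with no same-colored neighbors."""
    for assignment in product(colors, repeat=len(regions)):
        coloring = dict(zip(regions, assignment))
        if all(coloring[a] != coloring[b] for a, b in borders):
            return coloring
    return None  # the whole search space was exhausted without a solution

# Hypothetical map: A, B, C border each other; D borders only C.
regions = ["A", "B", "C", "D"]
borders = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]

solution = brute_force_coloring(regions, borders, ["red", "green", "blue"])
print(solution)
```

The search space here has len(colors) ** len(regions) candidates and exists only implicitly, which is what makes it "virtual": no data structure holding all assignments is ever built.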

Local search

NOTE:

This is a very important idea; it is at the core of many of the algorithms that follow.

An important subclass are the local search methods, that view the elements of the search space as the vertices of a graph, with edges defined by a set of heuristics applicable to the case; and scan the space by moving from item to item along the edges, for example according to the steepest descent or best-first criterion, or in a stochastic search. This category includes a great variety of general metaheuristic methods, such as simulated annealing, tabu search, A-teams, and genetic programming, that combine arbitrary heuristics in specific ways.
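A minimal sketch of the local search idea, assuming a toy integer search space: each state has a small set of neighbors, and the search repeatedly moves to the best neighbor until none improves the score (steepest ascent, the maximizing counterpart of steepest descent). The `score` and `neighbors` functions are invented for illustration.

```python
def hill_climb(score, neighbors, start):
    """Move to the best-scoring neighbor until no neighbor is an improvement."""
    current = start
    while True:
        best = max(neighbors(current), key=score, default=current)
        if score(best) <= score(current):
            return current    # local optimum: no edge leads uphill
        current = best

# Hypothetical problem: maximize -(x - 3)^2 over the integers.
score = lambda x: -(x - 3) ** 2
neighbors = lambda x: [x - 1, x + 1]   # edges of the implicit graph

result = hill_climb(score, neighbors, start=-10)
print(result)
```

On a function with several peaks this procedure can stop at a local optimum, which is the weakness that metaheuristics such as simulated annealing and tabu search try to mitigate.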

Tree search algorithms

This class also includes various tree search algorithms, that view the elements as vertices of a tree, and traverse that tree in some special order. Examples of the latter include the exhaustive methods such as depth-first search and breadth-first search, as well as various heuristic-based search tree pruning methods such as backtracking and branch and bound. Unlike general metaheuristics, which at best work only in a probabilistic sense, many of these tree-search methods are guaranteed to find the exact or optimal solution, if given enough time. This is called "completeness".

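NOTE: graph search (which includes tree search, since a tree is a special kind of graph) is covered separately later. Graph search is very important, because searching a virtual search space can also be recast as graph search.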
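Backtracking can be sketched with the N-queens puzzle, chosen here only as a familiar example and not taken from the source. The search tree has one level per row; a branch is abandoned as soon as a queen would be attacked, and the traversal is depth-first.

```python
def count_n_queens(n):
    """Count placements of n non-attacking queens using backtracking."""
    cols, diag1, diag2 = set(), set(), set()

    def place(row):
        if row == n:
            return 1                      # reached a leaf: a full solution
        total = 0
        for col in range(n):
            if col in cols or (row - col) in diag1 or (row + col) in diag2:
                continue                  # prune: this subtree has no solutions
            cols.add(col)
            diag1.add(row - col)
            diag2.add(row + col)
            total += place(row + 1)       # descend one level (depth-first)
            cols.discard(col)             # undo the choice: the "backtrack"
            diag1.discard(row - col)
            diag2.discard(row + col)
        return total

    return place(0)

print(count_n_queens(8))
```

Because every unpruned branch is eventually explored, this search is exhaustive in the sense the text describes: given enough time it finds every solution, not just a probable one.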

Game tree

Another important sub-class consists of algorithms for exploring the game tree of multiple-player games, such as chess or backgammon, whose nodes consist of all possible game situations that could result from the current situation. The goal in these problems is to find the move that provides the best chance of a win, taking into account all possible moves of the opponent(s). Similar problems occur when humans or machines have to make successive decisions whose outcomes are not entirely under one's control, such as in robot guidance or in marketing, financial, or military strategy planning. This kind of problem, combinatorial search, has been extensively studied in the context of artificial intelligence. Examples of algorithms for this class are the minimax algorithm, alpha–beta pruning, informational search [7], and the A* algorithm.
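The minimax idea can be sketched on a tiny hand-built game tree. The representation is an assumption made for brevity: a nested list is an internal node, a number is a leaf holding the payoff for the maximizing player, and the two players alternate levels.

```python
def minimax(node, maximizing=True):
    """Return the value of `node` assuming both players play optimally."""
    if not isinstance(node, list):
        return node                       # leaf: payoff for the maximizer
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Hypothetical two-ply game: the maximizer picks a branch,
# then the minimizer picks a leaf inside that branch.
tree = [[3, 5], [2, 9]]

best_value = minimax(tree)
print(best_value)
```

Here the left branch guarantees at least 3 while the right branch can be forced down to 2, so the maximizer's best guaranteed outcome is 3. Alpha–beta pruning computes the same value while skipping branches that cannot change the result.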

For sub-structures of a given structure

The name "combinatorial search" is generally used for algorithms that look for a specific sub-structure of a given discrete structure, such as a graph, a string, a finite group, and so on. The term combinatorial optimization is typically used when the goal is to find a sub-structure with a maximum (or minimum) value of some parameter. (Since the sub-structure is usually represented in the computer by a set of integer variables with constraints, these problems can be viewed as special cases of constraint satisfaction or discrete optimization; but they are usually formulated and solved in a more abstract setting where the internal representation is not explicitly mentioned.)

NOTE: this reminds me of optimal substructure

Graph algorithm

An important and extensively studied subclass are the graph algorithms, in particular graph traversal algorithms, for finding specific sub-structures in a given graph — such as subgraphs, paths, circuits, and so on. Examples include Dijkstra's algorithm, Kruskal's algorithm, the nearest neighbour algorithm, and Prim's algorithm.
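As one example from this family, here is a compact sketch of Dijkstra's algorithm using a binary heap from the standard library. The adjacency-list format and the sample graph are assumptions for illustration, and edge weights are taken to be non-negative, which the algorithm requires.

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from `source`; graph maps node -> [(neighbor, weight)]."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale entry: a shorter path was found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical directed graph with non-negative weights.
graph = {
    "a": [("b", 7), ("c", 3)],
    "b": [("d", 1)],
    "c": [("b", 2), ("d", 8)],
    "d": [],
}

distances = dijkstra(graph, "a")
print(distances)   # shortest distances: a=0, c=3, b=5, d=6
```

The sub-structure being searched for is the set of shortest paths; only their lengths are returned here, and recording predecessors would be needed to recover the paths themselves.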

String search algorithm

Another important subclass of this category are the string searching algorithms, that search for patterns within strings. Two famous examples are the Boyer–Moore and Knuth–Morris–Pratt algorithms, and several algorithms based on the suffix tree data structure.
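A sketch of the Knuth–Morris–Pratt approach follows. It precomputes, for each prefix of the pattern, the length of the longest proper prefix that is also a suffix, and uses that table to avoid re-reading text characters after a mismatch. The function name and return convention (first match index, or -1) are choices made for these notes.

```python
def kmp_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0

    # fail[i] = length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it.
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k

    # Scan the text once; k is the number of pattern characters matched so far.
    k = 0
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = fail[k - 1]               # fall back without moving i backwards
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1

print(kmp_search("abcabcabd", "abcabd"))
```

The text index never moves backwards, so the scan takes time linear in the combined length of text and pattern, whereas a naive scan can re-examine the same characters many times.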

Search for the maximum of a function

In 1953, American statistician Jack Kiefer devised Fibonacci search which can be used to find the maximum of a unimodal function and has many other applications in computer science.
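The sketch below is not Fibonacci search itself but a simpler relative, ternary search over an integer range, shown only to illustrate how unimodality lets each pair of probes discard part of the interval. It assumes the function strictly increases up to its peak and strictly decreases afterwards; the function name and test function are hypothetical.

```python
def argmax_unimodal(f, lo, hi):
    """Return the integer in [lo, hi] that maximizes a strictly unimodal f."""
    while hi - lo > 2:
        third = (hi - lo) // 3
        m1 = lo + third
        m2 = hi - third
        if f(m1) < f(m2):
            lo = m1 + 1       # the peak lies strictly to the right of m1
        else:
            hi = m2 - 1       # the peak lies strictly to the left of m2
    # At most three candidates remain; compare them directly.
    return max(range(lo, hi + 1), key=f)

# Hypothetical unimodal function with its maximum at x = 7.
f = lambda x: -(x - 7) ** 2

print(argmax_unimodal(f, 0, 20))
```

Fibonacci search follows the same interval-shrinking principle but places its probes according to Fibonacci numbers so that one evaluation can be reused at each step.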

For quantum computers

There are also search methods designed for quantum computers, like Grover's algorithm, that are theoretically faster than linear or brute-force search even without the help of data structures or heuristics.