About this chapter

This chapter describes sorting algorithms.

There are many common sorting algorithms, so it is worth classifying them carefully and tracing the relationships between them. The following articles were consulted:

1、wikipedia Sorting algorithm

2、wanweibaike Priority queue

Overview of sorting algorithms

I. wikipedia Sorting algorithm

| Category | Algorithms | Description |
|---|---|---|
| Exchange sorts | Bubble sort, Quicksort | Exchange out-of-order elements; quicksort picks a pivot and partitions around it |
| Selection sorts | Selection sort, Heapsort, Smoothsort | Repeatedly select the largest (or smallest) remaining element |
| Insertion sorts | Insertion sort, Shellsort | Insert each unsorted element into the already-sorted portion |
| Merge sorts | Merge sort | Merge already-sorted sequences into a larger sorted sequence |
| Distribution sorts | Bucket sort, Counting sort, Radix sort | Non-comparison sorts |
| Concurrent sorts | | |
| Hybrid sorts | Timsort, Introsort | Combine several sorting methods, choosing the best for the data |
| Other | Topological sorting | |

NOTE:

The original article lists many more algorithms; the table above includes only the most frequently mentioned ones.

II. wanweibaike Priority queue

| Name | Priority queue implementation | Best | Average | Worst |
|---|---|---|---|---|
| Heapsort | Heap | $n\log n$ | $n\log n$ | $n\log n$ |
| Smoothsort | Leonardo heap | $n$ | $n\log n$ | $n\log n$ |
| Selection sort | Unordered array | $n^{2}$ | $n^{2}$ | $n^{2}$ |
| Insertion sort | Ordered array | $n$ | $n^{2}$ | $n^{2}$ |
| Tree sort | Self-balancing binary search tree | $n\log n$ | $n\log n$ | $n\log n$ |

wikipedia Sorting algorithm

In computer science, a sorting algorithm is an algorithm that puts elements of a list in a certain order. The most frequently used orders are numerical order and lexicographical order.

NOTE:

There are many possible orders to choose from.

Efficient sorting is important for optimizing the efficiency of other algorithms (such as search and merge algorithms) that require input data to be in sorted lists. Sorting is also often useful for canonicalizing(标准化) data and for producing human-readable output. More formally, the output of any sorting algorithm must satisfy two conditions:

1、The output is in nondecreasing order (each element is no smaller than the previous element according to the desired total order);

2、The output is a permutation (a reordering, yet retaining all of the original elements) of the input.
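As a sketch, the two conditions above can be checked mechanically (the helper name `is_valid_sort` is my own):

```python
from collections import Counter

def is_valid_sort(output, original):
    """Check the two formal conditions on a sorting algorithm's output."""
    # Condition 1: nondecreasing order.
    nondecreasing = all(output[i] <= output[i + 1]
                        for i in range(len(output) - 1))
    # Condition 2: the output is a permutation of the input (same multiset).
    permutation = Counter(output) == Counter(original)
    return nondecreasing and permutation

print(is_valid_sort([1, 2, 2, 3], [3, 2, 1, 2]))  # True
print(is_valid_sort([1, 3, 2], [3, 2, 1]))        # False: not nondecreasing
```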

Further, the input data is often stored in an array, which allows random access, rather than a list, which only allows sequential access; though many algorithms can be applied to either type of data after suitable modification.

NOTE:

Sorting an array and sorting a linked list may call for different algorithms.

History

Bubble sort (1956)

Timsort (2002)

library sort (2006)

1、Comparison sorting algorithms have a fundamental requirement of Ω(n log n) comparisons (some input sequences will require a multiple of n log n comparisons, where n is the number of elements in the array to be sorted).

2、Algorithms not based on comparisons, such as counting sort, can have better performance.

Asymptotically optimal algorithms have been known since the mid-20th century—useful new algorithms are still being invented, with the now widely used Timsort dating to 2002, and the library sort being first published in 2006.

Classification

Sorting algorithms are often classified by:

1、Computational complexity (worst, average and best behavior) in terms of the size of the list (n).

For typical serial sorting algorithms good behavior is O(n log n), with parallel sort in O(log² n), and bad behavior is O(n²). (See Big O notation.)

Ideal behavior for a serial sort is O(n), but this is not possible in the average case.

Optimal parallel sorting is O(log n).

Comparison-based sorting algorithms need at least Ω(n log n) comparisons for most inputs.

2、Computational complexity of swaps (for "in-place" algorithms).

3、Memory usage (and use of other computer resources). In particular, some sorting algorithms are "in-place". Strictly, an in-place sort needs only O(1) memory beyond the items being sorted; sometimes O(log(n)) additional memory is considered "in-place".

4、Recursion. Some algorithms are either recursive or non-recursive, while others may be both (e.g., merge sort).

5、Stability: stable sorting algorithms maintain the relative order of records with equal keys (i.e., values).

6、Whether or not they are a comparison sort. A comparison sort examines the data only by comparing two elements with a comparison operator.

7、General method: insertion, exchange, selection, merging, etc.

Exchange sorts include bubble sort and quicksort.

Selection sorts include cycle sort and heapsort.

8、Whether the algorithm is serial or parallel. The remainder of this discussion almost exclusively concentrates upon serial algorithms and assumes serial operation.

9、Adaptability: Whether or not the presortedness of the input affects the running time. Algorithms that take this into account are known to be adaptive.

Stability

NOTE:

The original article devotes a lot of space to stability, but the idea is easy to grasp. The following article gives a more concise account:

1、cnblogs 【DS】排序算法的稳定性

An important property of a sorting algorithm is its stability: suppose the sequence of records to be sorted contains multiple records with equal keys. If the relative order of those records is preserved by the sort — that is, if ri = rj and ri precedes rj in the original sequence, then ri still precedes rj in the sorted sequence — the algorithm is called stable; otherwise it is called unstable.

2、The importance of algorithm stability

Why is stability so important?

1) In practice, what gets swapped is not necessarily a plain integer; it may be a large object, so each swap carries a real cost;

2) Consider radix sort (covered later): radix sort cannot be built on an unstable sort. The reason is explained after radix sort has been introduced.
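Python's built-in `sorted()` (Timsort) is stable, which is exactly what makes the multi-pass trick behind radix sort work; a small demonstration:

```python
# Python's sorted() (Timsort) is stable: records with equal keys keep
# their original relative order.
records = [("apple", 2), ("banana", 1), ("cherry", 2), ("date", 1)]
by_key = sorted(records, key=lambda r: r[1])
print(by_key)
# [('banana', 1), ('date', 1), ('apple', 2), ('cherry', 2)]

# Stability enables multi-pass (radix-style) sorting: sort by the
# secondary key first, then by the primary key; stability preserves
# the secondary order within equal primary keys.
data = [(2, "b"), (1, "a"), (2, "a"), (1, "b")]
step1 = sorted(data, key=lambda r: r[1])   # secondary key
step2 = sorted(step1, key=lambda r: r[0])  # primary key
print(step2)
# [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
```

An unstable sort in place of either pass could reorder equal primary keys and break the result.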

Comparison of algorithms

Below is a table of comparison sorts. A comparison sort cannot perform better than O(n log n) on average.

| Name | Best | Average | Worst | Memory | Stable | Method | Other notes |
|---|---|---|---|---|---|---|---|
| Quicksort | $n\log n$ | $n\log n$ | $n^{2}$ | $\log n$ | No | Partitioning | Quicksort is usually done in-place with $O(\log n)$ stack space |
| Merge sort | $n\log n$ | $n\log n$ | $n\log n$ | $n$ | Yes | Merging | Highly parallelizable (up to $O(\log n)$ using the Three Hungarians' Algorithm) |
| In-place merge sort | — | — | $n\log^{2}n$ | $1$ | Yes | Merging | Can be implemented as a stable sort based on stable in-place merging |
| Introsort | $n\log n$ | $n\log n$ | $n\log n$ | $\log n$ | No | Partitioning & Selection | Used in several STL implementations |
| Heapsort | $n\log n$ | $n\log n$ | $n\log n$ | $1$ | No | Selection | |
| Insertion sort | $n$ | $n^{2}$ | $n^{2}$ | $1$ | Yes | Insertion | $O(n+d)$ in the worst case over sequences that have $d$ inversions |
| Timsort | $n$ | $n\log n$ | $n\log n$ | $n$ | Yes | Insertion & Merging | Makes $n$ comparisons when the data is already sorted or reverse sorted |
| Selection sort | $n^{2}$ | $n^{2}$ | $n^{2}$ | $1$ | No | Selection | Stable with $O(n)$ extra space or when using linked lists |
| Shellsort | $n\log n$ | $n^{4/3}$ | $n^{3/2}$ | $1$ | No | Insertion | Small code size |
| Bubble sort | $n$ | $n^{2}$ | $n^{2}$ | $1$ | Yes | Exchanging | Tiny code size |
| Tree sort | $n\log n$ | $n\log n$ | $n\log n$ (balanced) | $n$ | Yes | Insertion | When using a self-balancing binary search tree |
| Library sort | $n\log n$ | $n\log n$ | $n^{2}$ | $n$ | No | Insertion | |
| Tournament sort | $n\log n$ | $n\log n$ | $n\log n$ | $n$ | No | Selection | Variation of Heapsort |

NOTE:

Introsort and Timsort each combine two sorting methods.


Non-comparison sorts

| Name | Best | Average | Worst | Memory | Stable | n ≪ $2^{k}$ | Notes |
|---|---|---|---|---|---|---|---|
| Bucket sort (uniform keys) | — | $n+k$ | $n^{2}\cdot k$ | $n\cdot k$ | Yes | No | Assumes uniform distribution of elements from the domain in the array. Also cannot sort non-integers |
| Bucket sort (integer keys) | — | $n+r$ | $n+r$ | $n+r$ | Yes | Yes | If r is $O(n)$, then average time complexity is $O(n)$ |
| Counting sort | — | $n+r$ | $n+r$ | $n+r$ | Yes | Yes | If r is $O(n)$, then average time complexity is $O(n)$ |
| LSD Radix Sort | $n$ | $n\cdot\frac{k}{d}$ | $n\cdot\frac{k}{d}$ | $n+2^{d}$ | Yes | No | |
| MSD Radix Sort | — | $n\cdot\frac{k}{d}$ | $n\cdot\frac{k}{d}$ | $n+2^{d}$ | Yes | No | |
| MSD Radix Sort (in-place) | — | $n\cdot\frac{k}{1}$ | $n\cdot\frac{k}{1}$ | $2^{1}$ | No | No | |

Popular sorting algorithms

NOTE:

The following overview of the various sorting algorithms is very good; in particular:

1、it shows how the algorithms relate to one another

Highly tuned implementations use more sophisticated variants, such as Timsort (merge sort, insertion sort, and additional logic), used in Android, Java, and Python, and introsort (quicksort and heap sort), used (in variant forms) in some C++ sort implementations and in .NET.

Simple sorts

Insertion sort

Shellsort (see below) is a variant of insertion sort that is more efficient for larger lists.
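A minimal in-place sketch of insertion sort, inserting each element into the already-sorted prefix:

```python
def insertion_sort(a):
    """Sort the list a in place by inserting each element into the
    sorted prefix a[:i]. O(n) best case, O(n^2) average and worst."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # Shift larger elements of the sorted prefix one slot right.
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

print(insertion_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
```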

Selection sort

Efficient sorts

Merge sort

Heapsort

Heapsort is a much more efficient version of selection sort.
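The improvement comes from keeping the unsorted region in a heap, so each "select the extremum" step costs O(log n) instead of O(n). A minimal sketch using Python's `heapq` (a min-heap, so the output is ascending):

```python
import heapq

def heapsort(items):
    """Heapsort sketch: build a binary min-heap, then repeatedly pop
    the minimum. O(n log n) in the best, average, and worst case."""
    heap = list(items)
    heapq.heapify(heap)  # builds the heap in O(n)
    # range(len(heap)) is evaluated once, before any pops shrink the heap.
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(heapsort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```

Textbook heapsort instead sorts in place with a max-heap over the array itself; this version trades O(n) extra memory for brevity.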

Quicksort
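A minimal (not in-place) sketch of the partitioning idea; production quicksort partitions in place with O(log n) stack space, as the comparison table above notes:

```python
def quicksort(a):
    """Quicksort sketch: pick a pivot, partition the elements around
    it, and recurse on both sides. Average O(n log n), worst O(n^2)."""
    if len(a) <= 1:
        return a
    pivot = a[len(a) // 2]
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return quicksort(less) + equal + quicksort(greater)

print(quicksort([3, 6, 1, 8, 2, 8]))  # [1, 2, 3, 6, 8, 8]
```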

Shell sort

It improves upon bubble sort and insertion sort by moving out of order elements more than one position at a time.
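A sketch using the simple gap sequence n/2, n/4, …, 1; other gap sequences give the better bounds quoted in the comparison table:

```python
def shellsort(a):
    """Shellsort: gapped insertion sort with a shrinking gap sequence.
    Elements move more than one position per step (by `gap`)."""
    gap = len(a) // 2
    while gap > 0:
        for i in range(gap, len(a)):
            key = a[i]
            j = i
            # Insertion sort over the elements gap positions apart.
            while j >= gap and a[j - gap] > key:
                a[j] = a[j - gap]
                j -= gap
            a[j] = key
        gap //= 2
    return a

print(shellsort([8, 3, 5, 1, 9, 2]))  # [1, 2, 3, 5, 8, 9]
```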

Bubble sort and variants

Bubble sort

Distribution sort

See also: External sorting

Distribution sort refers to any sorting algorithm where data are distributed from their input to multiple intermediate structures which are then gathered and placed on the output. For example, both bucket sort and flashsort are distribution based sorting algorithms. Distribution sorting algorithms can be used on a single processor, or they can be a distributed algorithm, where individual subsets are separately sorted on different processors, then combined. This allows external sorting of data too large to fit into a single computer's memory.
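A minimal bucket sort sketch illustrating the distribute/gather pattern, assuming keys uniformly distributed in [0, 1):

```python
def bucket_sort(values, num_buckets=10):
    """Bucket sort sketch for floats in [0, 1): distribute the values
    into buckets, sort each bucket, then gather them in order."""
    buckets = [[] for _ in range(num_buckets)]
    for v in values:
        buckets[int(v * num_buckets)].append(v)  # distribute
    result = []
    for b in buckets:
        result.extend(sorted(b))                 # sort and gather
    return result

print(bucket_sort([0.42, 0.32, 0.23, 0.52, 0.25, 0.47]))
# [0.23, 0.25, 0.32, 0.42, 0.47, 0.52]
```

In a distributed setting, each bucket could be sorted on a different processor before the gather step.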

NOTE:

divide and conquer

Counting sort
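Per the non-comparison table above, counting sort runs in O(n + r) for integer keys in a range of size r. A minimal sketch (function name is my own):

```python
def counting_sort(a, max_key):
    """Counting sort for integer keys in [0, max_key]: O(n + r) time
    and O(r) extra memory, where r = max_key + 1."""
    counts = [0] * (max_key + 1)
    for x in a:
        counts[x] += 1             # count occurrences of each key
    out = []
    for key, c in enumerate(counts):
        out.extend([key] * c)      # emit each key c times, in order
    return out

print(counting_sort([4, 2, 2, 8, 3, 3, 1], 8))  # [1, 2, 2, 3, 3, 4, 8]
```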

Bucket sort

Radix sort
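Radix sort processes one digit at a time and depends on each per-digit pass being stable, which is the point raised in the stability note earlier. A sketch of the LSD variant for non-negative integers:

```python
def lsd_radix_sort(a, base=10):
    """LSD radix sort sketch: one stable distribution pass per digit,
    starting from the least significant digit."""
    if not a:
        return a
    max_val = max(a)
    place = 1
    while place <= max_val:
        buckets = [[] for _ in range(base)]
        for x in a:
            # Appending preserves input order within a bucket: stable.
            buckets[(x // place) % base].append(x)
        a = [x for b in buckets for x in b]
        place *= base
    return a

print(lsd_radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```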

Timsort vs quicksort

stackoverflow Comparison between timsort and quicksort