wikipedia Parallel computing
NOTE: "Parallel computing" corresponds to the Chinese term "并行计算".
Parallel computing is a type of computation in which many calculations or the execution of processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time.
NOTE: The definition of **parallel computing** above is very broad: by this definition, anything that breaks a large problem into smaller ones and then solves those smaller problems **simultaneously** counts as **parallel computing**. Clearly, a great many forms of computation can be classified as parallel computing; see the Types of parallelism section below.
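To make the definition concrete, here is a minimal sketch of my own (not from the article): one large problem, summing a big array, is divided into smaller chunks that are summed at the same time on separate threads, and the partial results are combined afterwards. The array size and the one-chunk-per-hardware-thread split are arbitrary choices for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    // One "large problem": sum ten million integers.
    std::vector<std::uint64_t> data(10'000'000, 1);

    // Divide it into smaller sub-problems, one chunk per hardware thread.
    unsigned n_threads = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::uint64_t> partial(n_threads, 0);
    std::vector<std::thread> workers;

    std::size_t chunk = data.size() / n_threads;
    for (unsigned t = 0; t < n_threads; ++t) {
        std::size_t begin = t * chunk;
        std::size_t end = (t == n_threads - 1) ? data.size() : begin + chunk;
        // Each sub-problem is solved at the same time on its own thread.
        workers.emplace_back([&, t, begin, end] {
            partial[t] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, std::uint64_t{0});
        });
    }
    for (auto& w : workers) w.join();

    // Combine the partial results afterwards, as the definition describes.
    std::uint64_t total =
        std::accumulate(partial.begin(), partial.end(), std::uint64_t{0});
    std::cout << "sum = " << total << '\n';
}
```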
There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism.
NOTE: Software engineers are mostly concerned with task parallelism. For details on this classification, see the Types of parallelism section.
Parallelism has long been employed in high-performance computing, but it has gained broader interest due to the physical constraints preventing frequency scaling. As power consumption (and consequently heat generation) by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors.
NOTE: Historical background.
Parallel computing is closely related to concurrent computing—they are frequently used together, and often conflated, though the two are distinct: it is possible to have parallelism without concurrency (such as bit-level parallelism), and concurrency without parallelism (such as multitasking by time-sharing on a single-core CPU). In parallel computing, a computational task is typically broken down into several, often many, very similar subtasks that can be processed independently and whose results are combined afterwards, upon completion. In contrast, in concurrent computing, the various processes often do not address related tasks; when they do, as is typical in distributed computing, the separate tasks may have a varied nature and often require some inter-process communication during execution.
NOTE: On the relationship and the difference between concurrent computing and parallelism, the answer at softwareengineering the-difference-between-concurrent-and-parallel-execution puts it well:
Concurrency and parallelism are two related but distinct concepts.
Concurrency means, essentially, that task A and task B both need to happen independently of each other, and A starts running, and then B starts before A is finished.
There are various different ways of accomplishing concurrency. One of them is parallelism--having multiple CPUs working on the different tasks at the same time. But that's not the only way. Another is by task switching, which works like this: Task A works up to a certain point, then the CPU working on it stops and switches over to task B, works on it for a while, and then switches back to task A. If the time slices are small enough, it may appear to the user that both things are being run in parallel, even though they're actually being processed in serial by a multitasking CPU.
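As a rough illustration of that distinction, here is a small sketch of my own (not from the quoted answer): the first half interleaves two tasks by hand on a single thread, a crude stand-in for time-slicing, which is concurrency without parallelism; the second half hands the same two tasks to two threads, which on a multi-core machine is concurrency with parallelism. The task names and step counts are arbitrary.

```cpp
#include <iostream>
#include <thread>

// Two independent "tasks": each just counts and reports progress.
void task(char name, int steps) {
    for (int i = 1; i <= steps; ++i)
        std::cout << name << " step " << i << '\n';
}

int main() {
    // Concurrency WITHOUT parallelism: one thread interleaves the two tasks
    // by hand; B starts before A has finished, but only one step is ever
    // executing at any instant.
    for (int i = 1; i <= 3; ++i) {
        std::cout << "A step " << i << '\n';
        std::cout << "B step " << i << '\n';
    }

    // Concurrency WITH parallelism: on a multi-core machine the two threads
    // below can execute their steps at literally the same time (their output
    // may therefore interleave arbitrarily).
    std::thread a(task, 'A', 3);
    std::thread b(task, 'B', 3);
    a.join();
    b.join();
}
```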
Parallel computers can be roughly classified according to the level at which the hardware supports parallelism, with multi-core and multi-processor computers having multiple processing elements within a single machine, while clusters, MPPs, and grids use multiple computers to work on the same task. Specialized parallel computer architectures are sometimes used alongside traditional processors, for accelerating specific tasks.
In some cases parallelism is transparent to the programmer, such as in bit-level or instruction-level parallelism, but explicitly parallel algorithms, particularly those that use concurrency, are more difficult to write than sequential ones, because concurrency introduces several new classes of potential software bugs, of which race conditions are the most common. Communication and synchronization between the different subtasks are typically some of the greatest obstacles to getting good parallel program performance.
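A minimal sketch of the race-condition class of bug mentioned above (my own example, not from the article): two threads increment a shared counter without synchronization, so their read-modify-write sequences interleave and updates get lost. In C++ this unsynchronized access is formally a data race and undefined behavior; the point here is only to illustrate why such bugs are easy to introduce.

```cpp
#include <iostream>
#include <thread>

int main() {
    int counter = 0;  // shared, unsynchronized state

    // "counter++" is a read-modify-write, so the two threads' updates can
    // interleave and overwrite each other: a classic race condition
    // (and, strictly speaking, undefined behavior in C++).
    auto work = [&counter] {
        for (int i = 0; i < 1'000'000; ++i) counter++;
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();

    // Expected 2000000, but the printed value is usually smaller
    // and changes from run to run.
    std::cout << counter << '\n';
}
```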
A theoretical upper bound on the speed-up of a single program as a result of parallelization is given by Amdahl's law.
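Amdahl's law states that if a fraction p of a program can be parallelized and that portion is sped up by a factor s (for example, by running it on s processors), the overall speedup is at most 1 / ((1 - p) + p / s). The toy calculation below (my own example values) evaluates that bound:

```cpp
#include <iostream>

// Amdahl's law: overall speedup when a fraction p of the work is
// parallelized and that fraction runs s times faster.
double amdahl(double p, double s) {
    return 1.0 / ((1.0 - p) + p / s);
}

int main() {
    // Example: 90% of the program is parallelizable.
    for (int s : {2, 4, 8, 16, 1024}) {
        std::cout << s << " processors -> speedup "
                  << amdahl(0.9, s) << '\n';
    }
    // Even with unlimited processors the speedup approaches 1 / (1 - p) = 10.
}
```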
Background
To deal with the problem of power consumption and overheating, the major central processing unit (CPU or processor) manufacturers started to produce power-efficient processors with multiple cores. The core is the computing unit of the processor, and in multi-core processors each core is independent and can access the same memory concurrently. Multi-core processors have brought parallel computing to desktop computers. Thus parallelisation of serial programmes has become a mainstream programming task.
Dependencies
Understanding data dependencies is fundamental in implementing parallel algorithms. No program can run more quickly than the longest chain of dependent calculations (known as the critical path), since calculations that depend upon prior calculations in the chain must be executed in order. However, most algorithms do not consist of just a long chain of dependent calculations; there are usually opportunities to execute independent calculations in parallel.
Summary: the concept of the critical path introduced above is very important.
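A hedged sketch of the point about dependencies (my own example): in the first loop every step needs the result of the previous one, so that chain is the critical path and must run in order; in the second loop the calculations are independent of one another, so they can be handed to separate threads (one thread per element here, purely for illustration).

```cpp
#include <iostream>
#include <thread>
#include <vector>

int main() {
    const int n = 8;

    // Dependent chain: each step needs the previous one, so these n
    // calculations form a critical path and must execute in order.
    std::vector<long> chain(n, 0);
    chain[0] = 1;
    for (int i = 1; i < n; ++i)
        chain[i] = chain[i - 1] * 2;  // depends on chain[i - 1]

    // Independent calculations: each element depends only on its own index,
    // so the iterations can run in parallel on separate threads.
    std::vector<long> indep(n, 0);
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i)
        workers.emplace_back([&indep, i] { indep[i] = long(i) * i; });
    for (auto& w : workers) w.join();

    std::cout << "chain[n-1] = " << chain[n - 1]
              << ", indep[n-1] = " << indep[n - 1] << '\n';
}
```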
Race conditions, mutual exclusion, synchronization, and parallel slowdown
NOTE: This material is already covered in Unix-like-operating-system, so it is omitted here; only a few links are given.
Synchronization (computer science)
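Since the details are deferred to other notes, here is only a minimal sketch of mutual exclusion (my own example): a std::mutex serializes access to the shared counter, fixing the lost-update problem shown in the race-condition example above.

```cpp
#include <iostream>
#include <mutex>
#include <thread>

int main() {
    int counter = 0;
    std::mutex m;  // protects counter

    // Mutual exclusion: only one thread at a time may execute the
    // critical section that updates the shared counter.
    auto work = [&] {
        for (int i = 0; i < 1'000'000; ++i) {
            std::lock_guard<std::mutex> lock(m);
            ++counter;
        }
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();

    std::cout << counter << '\n';  // always 2000000
}
```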
Fine-grained, coarse-grained, and embarrassing parallelism
Applications are often classified according to how often their subtasks need to synchronize or communicate with each other. An application exhibits fine-grained parallelism if its subtasks must communicate many times per second; it exhibits coarse-grained parallelism if they do not communicate many times per second, and it exhibits embarrassing parallelism if they rarely or never have to communicate. Embarrassingly parallel applications are considered the easiest to parallelize.
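A hedged sketch of the embarrassingly parallel end of that spectrum (my own example): each worker transforms its own slice of an array element by element, so the workers never need to communicate or synchronize with each other; the only coordination is the final join.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    std::vector<double> v(1'000'000, 2.0);

    // Embarrassingly parallel: every element is processed independently,
    // so no inter-worker communication is needed at all.
    unsigned n_threads = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> workers;
    std::size_t chunk = v.size() / n_threads;
    for (unsigned t = 0; t < n_threads; ++t) {
        std::size_t begin = t * chunk;
        std::size_t end = (t == n_threads - 1) ? v.size() : begin + chunk;
        workers.emplace_back([&v, begin, end] {
            for (std::size_t i = begin; i < end; ++i) v[i] = std::sqrt(v[i]);
        });
    }
    for (auto& w : workers) w.join();

    std::cout << v.front() << '\n';  // ~1.41421
}
```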
Consistency models
Main article: Consistency model
Parallel programming languages and parallel computers must have a consistency model (also known as a memory model). The consistency model defines rules for how operations on computer memory occur and how results are produced.
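As a concrete, C++-specific illustration (not from the article): the C++ memory model lets the programmer state how memory operations performed by one thread become visible to another. In this sketch, a release store on a flag pairs with an acquire load, which guarantees that the write to data is visible to the consumer once it observes the flag.

```cpp
#include <atomic>
#include <iostream>
#include <thread>

int data = 0;
std::atomic<bool> ready{false};

void producer() {
    data = 42;                                     // plain write
    ready.store(true, std::memory_order_release);  // publish: ordered before ...
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) {}  // ... this acquire load
    // The acquire/release pair establishes the ordering, so this read is
    // guaranteed to see 42 rather than a stale value.
    std::cout << data << '\n';
}

int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
}
```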
Types of parallelism
Bit-level parallelism
Main article: Bit-level parallelism
Instruction-level parallelism
Main article: Instruction-level parallelism
Task parallelism
Main article: Task parallelism
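Since the notes above single out task parallelism as the form most relevant to software engineers, here is a minimal hedged sketch (my own example): two *different* tasks, a sum and a min/max scan over the same read-only data, run at the same time via std::async, rather than the same operation being applied to different pieces of data.

```cpp
#include <algorithm>
#include <future>
#include <iostream>
#include <numeric>
#include <utility>
#include <vector>

int main() {
    std::vector<int> v(1'000'000, 1);

    // Task parallelism: two distinct tasks run concurrently,
    // potentially on different cores.
    auto sum = std::async(std::launch::async, [&v] {
        return std::accumulate(v.begin(), v.end(), 0LL);
    });
    auto minmax = std::async(std::launch::async, [&v] {
        auto mm = std::minmax_element(v.begin(), v.end());
        return std::pair<int, int>{*mm.first, *mm.second};
    });

    auto [lo, hi] = minmax.get();
    std::cout << "sum = " << sum.get()
              << ", min = " << lo << ", max = " << hi << '\n';
}
```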
Hardware
Memory and communication
NOTE: This part is tied to the underlying hardware; software engineers can skip it.
Classes of parallel computers
Parallel computers can be roughly classified according to the level at which the hardware supports parallelism. This classification is broadly analogous to the distance between basic computing nodes. These are not mutually exclusive; for example, clusters of symmetric multiprocessors are relatively common.
Multi-core computing
Main article: Multi-core processor
A multi-core processor is a processor that includes multiple processing units (called "cores") on the same chip.
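As a small aside (not in the article), a program can ask how many hardware threads the multi-core machine offers, which is a common first step when deciding how many workers to spawn:

```cpp
#include <iostream>
#include <thread>

int main() {
    // Number of concurrent hardware threads (roughly cores, or cores x SMT);
    // the standard allows 0 to be returned if the value is not computable.
    unsigned n = std::thread::hardware_concurrency();
    std::cout << "hardware threads: " << n << '\n';
}
```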
Symmetric multiprocessing
Main article: Symmetric multiprocessing
A symmetric multiprocessor (SMP) is a computer system with multiple identical processors that share memory and connect via a bus.
Distributed computing
Main article: Distributed computing
A distributed computer (also known as a distributed memory multiprocessor) is a distributed memory computer system in which the processing elements are connected by a network. Distributed computers are highly scalable. The terms "concurrent computing", "parallel computing", and "distributed computing" have a lot of overlap, and no clear distinction exists between them. The same system may be characterized both as "parallel" and "distributed"; the processors in a typical distributed system run concurrently in parallel.
Cluster computing
Main article: Computer cluster
Massively parallel computing
Main article: Massively parallel (computing)
Grid computing
Main article: Grid computing
Specialized parallel computers
Within parallel computing, there are specialized parallel devices that remain niche areas of interest. While not domain-specific, they tend to be applicable to only a few classes of parallel problems.
Software
Parallel programming languages
Main article: List of concurrent and parallel programming languages
Algorithmic methods
Fault tolerance
Further information: Fault-tolerant computer system
As parallel computers become larger and faster, we are now able to solve problems that had previously taken too long to run. Fields as varied as bioinformatics (for protein folding and sequence analysis) and economics (for mathematical finance) have taken advantage of parallel computing. Common types of problems in parallel computing applications include:
- Dense linear algebra (see the sketch after this list)
- Sparse linear algebra
- Spectral methods (such as Cooley–Tukey fast Fourier transform)
- N-body problems (such as Barnes–Hut simulation)
- Structured grid problems (such as Lattice Boltzmann methods)
- Unstructured grid problems (such as found in finite element analysis)
- Monte Carlo method
- Combinational logic (such as brute-force cryptographic techniques)
- Graph traversal (such as sorting algorithms)
- Dynamic programming
- Branch and bound methods
- Graphical models (such as detecting hidden Markov models and constructing Bayesian networks)
- Finite-state machine simulation
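For the first item, dense linear algebra, here is a hedged sketch of one common approach (my own example, with arbitrary sizes and a simple row-blocking scheme): a matrix-matrix product where each thread computes a disjoint block of rows of the result, so the threads never write to the same memory and need no synchronization beyond the final join.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <iostream>
#include <thread>
#include <vector>

// Multiply rows [r0, r1) of an n x n matrix A by B into C (row-major).
void multiply_rows(const std::vector<double>& A, const std::vector<double>& B,
                   std::vector<double>& C, std::size_t n,
                   std::size_t r0, std::size_t r1) {
    for (std::size_t i = r0; i < r1; ++i)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}

int main() {
    const std::size_t n = 256;
    std::vector<double> A(n * n, 1.0), B(n * n, 2.0), C(n * n, 0.0);

    // Each thread owns a disjoint block of rows of C, so no locking is needed.
    unsigned n_threads = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> workers;
    std::size_t rows = (n + n_threads - 1) / n_threads;
    for (unsigned t = 0; t < n_threads; ++t) {
        std::size_t r0 = t * rows;
        std::size_t r1 = std::min(n, r0 + rows);
        if (r0 >= n) break;
        workers.emplace_back(multiply_rows, std::cref(A), std::cref(B),
                             std::ref(C), n, r0, r1);
    }
    for (auto& w : workers) w.join();

    std::cout << "C[0][0] = " << C[0] << '\n';  // 1.0 * 2.0 * 256 = 512
}
```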