cl.cam.ac.uk C/C++11 mappings to processors
For each C/C++11 synchronisation operation and architecture, this document aims to provide an instruction sequence that implements the operation on that architecture. This is not the only approach: one could instead provide a mapping that shows the necessary barriers (or other synchronisation mechanisms) between two program-order-adjacent memory operations (either atomic or non-atomic). A good example of that approach is Doug Lea's cookbook for JVM compiler writers. While it can result in higher-performance mappings, we do not use it here because the resulting tables would be large and we have not investigated correct mappings for all the combinations. The per-operation approach that we take here would benefit from an optimisation pass that removes redundant synchronisation between adjacent operations.
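As a rough illustration of the redundancy mentioned above, consider two adjacent sequentially consistent stores. The sketch below uses hypothetical names (`x`, `y`, `store_both`) that do not come from this page, and the instruction sequences in the comments assume the MOV-then-MFENCE alternative from the x86 table further down. A real compiler may choose a different sequence.

```cpp
#include <atomic>

// Hypothetical shared variables, for illustration only.
std::atomic<int> x{0};
std::atomic<int> y{0};

void store_both(int a, int b) {
    // Per-operation mapping (assumed alternative): MOV [x], a ; MFENCE
    x.store(a, std::memory_order_seq_cst);
    // Per-operation mapping (assumed alternative): MOV [y], b ; MFENCE
    y.store(b, std::memory_order_seq_cst);
    // Assumption: no load sits between the two stores, and x86 does not
    // reorder stores with one another, so an optimisation pass could
    // plausibly drop the first MFENCE and keep only the second.
}
```

Whether such a fence can actually be removed depends on the surrounding code, which is why the per-operation tables stay conservative.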
Architectures
x86 (including x86-64)
| C/C++11 Operation | x86 implementation |
|---|---|
| Load Relaxed: | MOV (from memory) |
| Load Consume: | MOV (from memory) |
| Load Acquire: | MOV (from memory) |
| Load Seq_Cst: | MOV (from memory) |
| Store Relaxed: | MOV (into memory) |
| Store Release: | MOV (into memory) |
| Store Seq_Cst: | (LOCK) XCHG // alternative: MOV (into memory), MFENCE |
| Consume Fence: | <ignore> |
| Acquire Fence: | <ignore> |
| Release Fence: | <ignore> |
| Acq_Rel Fence: | <ignore> |
| Seq_Cst Fence: | MFENCE |
NOTE: a load corresponds to a read, and a store corresponds to a write.
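To connect the table to source code, here is a minimal sketch of the C++11 operations it covers, with the x86 row for each one noted in a comment. The names `payload`, `ready`, `publish`, `consume`, `store_seq_cst` and `full_fence` are hypothetical and chosen only for this example. The comments repeat the table and are not a guarantee of what a particular compiler emits.

```cpp
#include <atomic>

// Hypothetical shared state, for illustration only.
std::atomic<int> payload{0};
std::atomic<bool> ready{false};

// Writer side: a relaxed store followed by a release store.
void publish(int value) {
    payload.store(value, std::memory_order_relaxed);  // Store Relaxed: MOV (into memory)
    ready.store(true, std::memory_order_release);     // Store Release: MOV (into memory)
}

// Reader side: spin on an acquire load, then read the payload.
int consume() {
    while (!ready.load(std::memory_order_acquire)) {  // Load Acquire: MOV (from memory)
        // busy-wait until the writer has published
    }
    return payload.load(std::memory_order_relaxed);   // Load Relaxed: MOV (from memory)
}

// Sequentially consistent store.
void store_seq_cst(int value) {
    payload.store(value, std::memory_order_seq_cst);  // Store Seq_Cst: (LOCK) XCHG, or MOV then MFENCE
}

// Sequentially consistent fence.
void full_fence() {
    std::atomic_thread_fence(std::memory_order_seq_cst);  // Seq_Cst Fence: MFENCE
}
```

In real use `publish` and `consume` would run on different threads. Calling them in sequence on one thread, as a quick check, still exercises the same operations.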