
Interpretation model

1. The ISA serves as the interface between software and hardware.

2. High-level programming languages have a strict type system.

3. So what is the correspondence between types in a high-level programming language and machine instructions?


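At the hardware level everything is 0s and 1s; the concepts of a high-level programming language, such as type, do not exist there. At the hardware level it is the instruction that decides what operation is performed on the bits, for example: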

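1) an **integer addition instruction** performs integer addition on the bits

2) a **floating-point instruction** performs floating-point arithmetic on the bits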

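High-level programming languages have the concept of type, and the compiler uses the type to generate instructions:

1) the compiler uses the type to determine the memory size of the corresponding object, and thereby its storage allocation

2) the compiler uses the type to decide which arithmetic instructions to use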


NOTE: This is consistent with the "type determine everything" view; see the Theory\Type-system chapter.

The analysis above is essential: it is the foundation for understanding pointer conversion, aliasing, and type punning in C and C++. This model helps analyze:

- object representation

- union

- aliasing

I call this the **interpretation model**.

What wikipedia Type system says about the "interpretation model"

wikipedia Type system explains the interpretation model as follows:

wikipedia Type system # Usage overview

A compiler may also use the static type of a value to optimize the storage it needs and the choice of algorithms for operations on the value. In many C compilers the float data type, for example, is represented in 32 bits, in accord with the IEEE specification for single-precision floating point numbers. They will thus use floating-point-specific microprocessor operations on those values (floating-point addition, multiplication, etc.).


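1. Using float as an example, this passage explains that, when generating code, the compiler can use the static type of a value to optimize how the value is stored and to choose the instructions that operate on it. In short, the compiler uses the static type to determine: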

1) storage

2) "the choice of algorithms for operations on the value", which in practice means choosing the appropriate instructions.

wikipedia Type system # Fundamentals

Assigning a data type, termed typing, gives meaning to a sequence of bits such as a value in memory or some object such as a variable. The hardware of a general purpose computer is unable to discriminate (i.e., tell apart) between for example a memory address and an instruction code, or between a character, an integer, or a floating-point number, because it makes no intrinsic distinction between any of the possible values that a sequence of bits might mean.[note 1] Associating a sequence of bits with a type conveys that meaning to the programmable hardware to form a symbolic system composed of that hardware and some program.


Type determines the meaning of a sequence of bits.

wikipedia Memory address # Contents of each memory location

See also: binary data

Each memory location in a stored-program computer holds a binary number or decimal number of some sort. Its interpretation, as data of some data type or as an instruction, and use are determined by the instructions which retrieve and manipulate it.

NOTE: The passage above is an excellent summary of the interpretation model.

chinaunix A long-standing question about data types


NOTE: The compiler chooses different machine instructions for different data types.







2. Similarly, for an unsigned short variable, the compiler emits a zero-extending halfword load when reading the data.



Aliasing: interpreting memory according to a specified type

Type conversion at the low level



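#include <stdint.h>
static uint64_t load64_le(uint8_t const* V)
{
#if !defined(__LITTLE_ENDIAN__)
#error This code only works with little endian systems
#endif
  /* Reads the 8 bytes at V as one uint64_t; undefined if V is misaligned for uint64_t */
  uint64_t Ret = *((uint64_t const*)V);
  return Ret;
}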
What exactly happens when uint64_t Ret = *((uint64_t const*)V); executes?

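(uint64_t const*)V creates a temporary of type uint64_t const* whose value is taken from V. *((uint64_t const*)V) then reads a value from the memory this temporary points to (8 bytes, interpreted as a uint64_t) and stores it in Ret.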

The cppreference cast operator page contains the following:

Any pointer to object can be cast to any other pointer to object. If the value is not correctly aligned for the target type, the behavior is undefined. Otherwise (that is, if the value satisfies the alignment requirement of the target type), if the value is converted back to the original type, it compares equal to the original value. If a pointer to object is cast to pointer to any character type, the result points at the lowest byte of the object and may be incremented up to sizeof the target type (in other words, can be used to examine object representation or to make a copy via memcpy or memmove).

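Applied to load64_le, the passage means the following. Casting V, of type uint8_t const*, to uint64_t const* creates a temporary of type uint64_t const* whose value is the value of V, i.e. the address of some uint8_t const object. This is risky: because V has type uint8_t const*, its value satisfies the alignment requirement of uint8_t, but it does not necessarily satisfy the alignment requirement of uint64_t const. If it does not, the behavior is undefined.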

Pointer type conversion

1. geeksforgeeks Little and Big Endian Mystery

The code in this article nicely shows what type conversion does at the low level:

#include <stdio.h>
int main()
{
    unsigned char arr[2] = {0x01, 0x00};
    /* Reads the 2 bytes of arr as one unsigned short, in host byte order */
    unsigned short int x = *(unsigned short int *) arr;
    printf("%d", x);
    return 0;
}

2. Output from arbitrary dereferenced pointer

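char buf[8] = { 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88};
char *c_ptr;
unsigned long *u_ptr;

c_ptr = buf;
for (int i=0;i<5;i++)
{
    /* Treats the bytes at c_ptr as an unsigned long: the pointer may be
       misaligned for unsigned long, and the access breaks strict aliasing */
    u_ptr = (unsigned long *)c_ptr;
    /* ... */
}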


c-jump Data Types and Memory Allocation

csdn Assembly language data types and data definitions explained



Alias to an existing object is explained in C++\Language-reference\Basic-concept\Data-model\Object\Object.md; this section only summarizes it.

C++ supports two ways to create an alias to an existing object:

  • reference
  • pointer


  • reinterpret_cast: C++\Language-reference\Basic-concept\Type-system\Type-conversion\Cast-operator
  • reference: C++\Language-reference\Reference


The interpretation model mainly describes the behavior of the compiler.