Skip to content

Subroutine

"subroutine"就是我们平时所说的“函数”,我们对它习以为常,

wikipedia Subroutine

In computer programming, a subroutine is a sequence of program instructions that performs a specific task, packaged as a unit. This unit can then be used in programs wherever that particular task should be performed.

Subprograms may be defined within programs, or separately in libraries that can be used by many programs. In different programming languages, a subroutine may be called a procedure, a function, a routine, a method, or a subprogram. The generic term callable unit is sometimes used.

The name subprogram suggests a subroutine behaves in much the same way as a computer program that is used as one step in a larger program or another subprogram. A subroutine is often coded so that it can be started several times and from several places during one execution of the program, including from other subroutines, and then branch back (return) to the next instruction after the call, once the subroutine's task is done. The idea of a subroutine was initially conceived by John Mauchly during his work on ENIAC, and recorded in a Harvard symposium in January of 1947 entitled 'Preparation of Problems for EDVAC-type Machines'. . Maurice Wilkes, David Wheeler, and Stanley Gill are generally credited with the formal invention of this concept, which they termed a closed subroutine, contrasted with an open subroutine or macro.

NOTE: 今天我们都对subprogram习以为常,但是第一个提出的人,却是经过了万千思考,具有开创性地意义。

Subroutines are a powerful programming tool,[7] and the syntax of many programming languages includes support for writing and using them. Judicious(明智的) use of subroutines (for example, through the structured programming approach) will often substantially(大体上) reduce the cost of developing and maintaining a large program, while increasing its quality and reliability.[8] Subroutines, often collected into libraries, are an important mechanism for sharing and trading software. The discipline of object-oriented programming is based on objects and methods (which are subroutines attached to these objects or object classes).

In the compiling method called threaded code, the executable program is basically a sequence of subroutine calls.

Main concepts

A subroutine may be written so that it expects to obtain one or more data values from the calling program (to replace its parameters or formal parameters). The calling program provides actual values for these parameters, called arguments. Different programming languages may use different conventions for passing arguments:

Convention Description Common use
Call by value Argument is evaluated and copy of value is passed to subroutine; Default in most Algol-like languages after Algol 60, such as Pascal, Delphi, Simula, CPL, PL/M, Modula, Oberon, Ada, and many others. C, C++, Java (References to objects and arrays are also passed by value)
Call by reference Reference to argument, typically its address is passed Selectable in most Algol-like languages after Algol 60, such as Algol 68, Pascal, Delphi, Simula, CPL, PL/M, Modula, Oberon, Ada, and many others. C++, Fortran, PL/I
Call by result Parameter value is copied back to argument on return from the subroutine Ada OUT parameters
Call by value-result Parameter value is copied back on entry to the subroutine and again on return Algol
Call by name Like a macro – replace the parameters with the unevaluated argument expressions;注意,是**unevaluated** ,这和Call by value截然不同。 Algol, Scala
Call by constant value Like call by value except that the parameter is treated as a constant PL/I NONASSIGNABLE parameters, Ada IN parameters

NOTE: 注意,是否evaluate是一个非常重要的区别;

The subroutine may return a computed value to its caller (its return value), or provide various result values or output parameters. Indeed, a common use of subroutines is to implement mathematical functions, in which the purpose of the subroutine is purely to compute one or more results whose values are entirely determined by the arguments passed to the subroutine. (Examples might include computing the logarithm of a number or the determinant of a matrix.)

A subroutine can be coded so that it may call itself recursively, at one or more places, to perform its task. This method allows direct implementation of functions defined by mathematical induction and recursive divide and conquer algorithms.

NOTE: 显然宏替换不存在递归

A subroutine whose purpose is to compute one boolean-valued function (that is, to answer a yes/no question) is sometimes called a predicate. In logic programminglanguages, often[vague] all subroutines are called predicates, since they primarily[vague] determine success or failure.

Language support

High-level programming languages usually include specific constructs to:

  • delimit the part of the program (body) that makes up the subroutine
  • assign an identifier (name) to the subroutine
  • specify the names and data types of its parameters and return values
  • provide a private naming scope for its temporary variables
  • identify variables outside the subroutine that are accessible within it
  • call the subroutine
  • provide values to its parameters
  • the main program contains the address of the subprogram
  • the sub program contains the address of next instruction of the function call in main program
  • specify the return values from within its body
  • return to the calling program
  • dispose of the values returned by a call
  • handle any exceptional conditions encountered during the call
  • package subroutines into a module, library, object, class, etc.

In programming languages such as C, C++, and C#, subroutines may also simply be called functions, not to be confused with mathematical functions or functional programming, which are different concepts.

A language's compiler will usually translate procedure calls and returns into machine instructions according to a well-defined calling convention, so that subroutines can be compiled separately from the programs that call them. The instruction sequences corresponding to call and return statements are called the procedure's prologue and epilogue.

Advantages

The advantages of breaking a program into subroutines include:

  • Decomposing a complex programming task into simpler steps: this is one of the two main tools of structured programming, along with data structures
  • Reducing duplicate code within a program
  • Enabling reuse of code across multiple programs
  • Dividing a large programming task among various programmers, or various stages of a project
  • Hiding implementation details from users of the subroutine
  • Improving readability of code by replacing a block of code with a function call where a descriptive function name serves to describe the block of code. This makes the calling code concise and readable even if the function is not meant to be reused.
  • Improving traceability (i.e. most languages offer ways to obtain the call trace which includes the names of the involved subroutines and perhaps even more information such as file names and line numbers); by not decomposing the code into subroutines, debugging would be severely impaired

Disadvantages

Invoking a subroutine (versus using in-line code) imposes some computational overhead in the call mechanism.

A subroutine typically requires standard housekeeping code – both at entry to, and exit from, the function (function prologue and epilogue – usually saving general purpose registers and return address as a minimum).

History

Call stack

Most modern implementations of a subroutine call use a call stack, a special case of the stack data structure, to implement subroutine calls and returns. Each procedure call creates a new entry, called a stack frame, at the top of the stack; when the procedure returns, its stack frame is deleted from the stack, and its space may be used for other procedure calls. Each stack frame contains the private data of the corresponding call, which typically includes the procedure's parameters and internal variables, and the return address.

The call sequence can be implemented by a sequence of ordinary instructions (an approach still used in reduced instruction set computing (RISC) and very long instruction word (VLIW) architectures), but many traditional machines designed since the late 1960s have included special instructions for that purpose.

The call stack is usually implemented as a contiguous area of memory. It is an arbitrary design choice whether the bottom of the stack is the lowest or highest address within this area, so that the stack may grow forwards or backwards in memory; however, many architectures chose the latter.[citation needed]

NOTE: 关于call stack的增长方向,可以参考Virtual address space中的视图和这篇文章 显然,在process的address space中有专门的stack空间。

Some designs, notably some Forth implementations, used two separate stacks, one mainly for control information (like return addresses and loop counters) and the other for data. The former was, or worked like, a call stack and was only indirectly accessible to the programmer through other language constructs while the latter was more directly accessible.

When stack-based procedure calls were first introduced, an important motivation was to save precious memory.[citation needed] With this scheme, the compiler does not have to reserve separate space in memory for the private data (parameters, return address, and local variables) of each procedure. At any moment, the stack contains only the private data of the calls that are currently active (namely, which have been called but haven't returned yet). Because of the ways in which programs were usually assembled from libraries, it was (and still is) not uncommon to find programs that include thousands of subroutines, of which only a handful are active at any given moment.[citation needed] For such programs, the call stack mechanism could save significant amounts of memory. Indeed, the call stack mechanism can be viewed as the earliest and simplest method for automatic memory management.

However, another advantage of the call stack method is that it allows recursive subroutine calls, since each nested call to the same procedure gets a separate instance of its private data.

NOTE: 最后两段话介绍了stack-based procedure call的优势,作用;关于call stack,还是看其专门介绍。

Local variables, recursion and reentrancy

stack frames

If a subprogram can be executed properly even when another execution of the same subprogram is already in progress, that subprogram is said to be reentrant. A recursive subprogram must be reentrant. Reentrant subprograms are also useful in multi-threaded situations, since multiple threads can call the same subprogram without fear of interfering with each other. In the IBM CICS transaction processing system, quasi-reentrant was a slightly less restrictive, but similar, requirement for application programs that were shared by many threads.

In a multi-threaded environment, there is generally more than one stack. An environment that fully supports coroutines or lazy evaluation may use data structures other than stacks to store their activation records.

Overloading

Optimization of subroutine calls

There is a significant runtime overhead in a calling a subroutine, including passing the arguments, branching to the subprogram, and branching back to the caller. The overhead often includes saving and restoring certain processor registers, allocating and reclaiming call frame storage, etc.. In some languages, each subroutine call also implies automatic testing of the subroutine's return code, or the handling of exceptions that it may raise. In object-oriented languages, a significant source of overhead is the intensively used dynamic dispatch for method calls.

There are some seemingly obvious optimizations of procedure calls that cannot be applied if the procedures may have side effects. For example, in the expression (f(x)-1)/(f(x)+1), the function f must be called twice, because the two calls may return different results. Moreover, the value of x must be fetched again before the second call, since the first call may have changed it. Determining whether a subprogram may have a side effect is very difficult (indeed, undecidable).[citation needed]So, while those optimizations are safe in purely functional programming languages, compilers of typical imperative programming usually have to assume the worst.

Inlining

A method used to eliminate this overhead is inline expansion or inlining of the subprogram's body at each call site (versus branching to the subroutine and back). Not only does this avoid the call overhead, but it also allows the compiler to optimize the procedure's body more effectively by taking into account the context and arguments at that call. The inserted body can be optimized by the compiler. Inlining however, will usually increase the code size, unless the program contains only one call to the subroutine, or the subroutine body is less code than the call overhead.

draft

20201215

本章总结“subroute”即“函数”相关内容。这部分内容已经转移到了到了工程programming-language

subroutine梳理思路

subroutine是thread的执行单位。

subroutine的执行过程可以以树的形式来进行展示,在tree中进行了总结。

subroutine的时间复杂度、空间复杂度的分析,在算法分析中进行总结。

calling convention and call by value、by reference

calling convention

每次调用还是都需要new一个栈帧

每次调用函数,还需要保存上一个函数的context

将subroutine和coroutine进行总结