Skip to content

wikipedia Name mangling

In compiler construction, name mangling (also called name decoration) is a technique used to solve various problems caused by the need to resolve unique names for programming entities in many modern programming languages.

NOTE: name mangling是由compiler来实现的。其实相比于mangling,我觉得decoration是更加贴合它的实际用途的:及可能地encode更多的information到name中

It provides a way of encoding additional information in the name of a function, structure, class or another datatype in order to pass more semantic information from the compilers to linkers.

NOTE: 原文的第三、第四段告诉我们,进行name mangling的一个重要需求是compiler将尽可能的信息encode到name中,这些信息可能包括return type、namespace、type of parameter,这样linker就可以通过name来获取更多的信息。

This programming pattern of encoding additional information in names is often referred to as "primitive obsession" and is considered an anti-pattern, but in this case has been established as the norm.

NOTE: "primitive obsession" 的意思是“基本类型偏执”,最后一段话告诉我们:name mangling已经被确立为标准(norm)了

Examples

C

int _cdecl    f (int x) { return 0; }
int _stdcall  g (int y) { return 0; }
int _fastcall h (int z) { return 0; }

32 bit compilers emit, respectively:

_f
_g@4
@h@4

NOTE: c compiler将 calling conventions encode到name中

C++

The C++ language does not define a standard decoration scheme, so each compiler uses its own. C++ also has complex language features, such as classes, templates, namespaces, and operator overloading, that alter the meaning of specific symbols based on context or usage. Meta-data about these features can be disambiguated(消除) by mangling (decorating) the name of a symbol. Because the name-mangling systems for such features are not standardized across compilers, few linkers can link object code that was produced by different compilers.

NOTE: 没有标准的弊端就是通用性的降低

Simple example

int  f (void) { return 1; }
int  f (int)  { return 0; }
void g (void) { int i = f(), j = f(0); }
int  __f_v (void) { return 1; }
int  __f_i (int)  { return 0; }
void __g_v (void) { int i = __f_v(), j = __f_i(0); }

Complex example

namespace wikipedia 
{
   class article 
   {
   public:
      std::string format (void); 
         /* = _ZN9wikipedia7article6formatEv */

      bool print_to (std::ostream&); 
         /* = _ZN9wikipedia7article8print_toERSo */

      class wikilink 
      {
      public:
         wikilink (std::string const& name);
            /* = _ZN9wikipedia7article8wikilinkC1ERKSs */
      };
   };
}

All mangled symbols begin with _Z (note that an identifier beginning with an underscore followed by a capital is a reserved identifier in C, so conflict with user identifiers is avoided); for nested names (including both namespaces and classes), this is followed by N, then a series of pairs (the length being the length of the next identifier), and finally E. For example, wikipedia::article::format becomes

_ZN9Wikipedia7article6formatE

For functions, this is then followed by the type information; as format() is a void function, this is simply v; hence:

_ZN9Wikipedia7article6formatEv

For print_to, the standard type std::ostream (which is a typedef for std::basic_ostream<char, std::char_traits<char> >) is used, which has the special alias So; a reference to this type is therefore RSo, with the complete name for the function being:

_ZN9Wikipedia7article8print_toERSo

How different compilers mangle the same functions

Compiler void h(int) void h(int, char) void h(void)
Intel C++ 8.0 for Linux _Z1hi _Z1hic _Z1hv
HP aC++ A.05.55 IA-64
IAR EWARM C++ 5.4 ARM
GCC 3.x and higher
Clang 1.x and higher[1]
IAR EWARM C++ 7.4 ARM _Z<number>hi _Z<number>hic _Z<number>hv
GCC 2.9*x* h__Fi h__Fic h__Fv
HP aC++ A.03.45 PA-RISC
Microsoft Visual C++ v6-v10 (mangling details) ?h@@YAXH@Z ?h@@YAXHD@Z ?h@@YAXXZ
Digital Mars C++
Borland C++ v3.1 @h$qi @h$qizc @h$qv
OpenVMS C++ V6.5 (ARM mode) H__XI H__XIC H__XV
OpenVMS C++ V6.5 (ANSI mode) CXX$__7H__FIC26CDH77 CXX$__7H__FV2CB06E8
OpenVMS C++ X7.1 IA-64 CXX$_Z1HI2DSQ26A CXX$_Z1HIC2NP3LI4 CXX$_Z1HV0BCA19V
SunPro CC __1cBh6Fi_v_ __1cBh6Fic_v_ __1cBh6F_v_
Tru64 C++ V6.5 (ARM mode) h__Xi h__Xic h__Xv
Tru64 C++ V6.5 (ANSI mode) __7h__Fi __7h__Fic __7h__Fv
Watcom C++ 10.6 W?h$n(i)v W?h$n(ia)v W?h$n()v

Handling of C symbols when linking from C++

The job of the common C++ idiom:

#ifdef __cplusplus 
extern "C" {
#endif
    /* ... */
#ifdef __cplusplus
}
#endif

is to ensure that the symbols within are "unmangled" – that the compiler emits a binary file with their names undecorated, as a C compiler would do. As C language definitions are unmangled, the C++ compiler needs to avoid mangling references to these identifiers.

For example, the standard strings library, <string.h> usually contains something resembling:

#ifdef __cplusplus
extern "C" {
#endif

void *memset (void *, int, size_t);
char *strcat (char *, const char *);
int   strcmp (const char *, const char *);
char *strcpy (char *, const char *);

#ifdef __cplusplus
}
#endif

Thus, code such as:

if (strcmp(argv[1], "-x") == 0) 
    strcpy(a, argv[2]);
else 
    memset (a, 0, sizeof(a));

uses the correct, unmangled strcmp and memset. If the extern had not been used, the (SunPro) C++ compiler would produce code equivalent to:

if (__1cGstrcmp6Fpkc1_i_(argv[1], "-x") == 0) 
    __1cGstrcpy6Fpcpkc_0_(a, argv[2]);
else 
    __1cGmemset6FpviI_0_ (a, 0, sizeof(a));

Since those symbols do not exist in the C runtime library (e.g. libc), link errors would result.

Standardised name mangling in C++

Though it would seem that standardised name mangling in the C++ language would lead to greater interoperability between compiler implementations, such a standardization by itself would not suffice to guarantee C++ compiler interoperability and it might even create a false impression that interoperability is possible and safe when it isn't. Name mangling is only one of several application binary interface (ABI) details that need to be decided and observed by a C++ implementation. Other ABI aspects like exception handling, virtual table layout, structure and stack frame padding, etc. also cause differing C++ implementations to be incompatible. Further, requiring a particular form of mangling would cause issues for systems where implementation limits (e.g., length of symbols) dictate a particular mangling scheme. A standardised requirement for name mangling would also prevent an implementation where mangling was not required at all — for example, a linker which understood the C++ language.

The C++ standard therefore does not attempt to standardise name mangling. On the contrary, the Annotated C++ Reference Manual (also known as ARM, ISBN 0-201-51459-1, section 7.2.1c) actively encourages the use of different mangling schemes to prevent linking when other aspects of the ABI, such as exception handling and virtual table layout, are incompatible.

Nevertheless, as detailed in the section above, on some platforms[4] the full C++ ABI has been standardized, including name mangling.

Real-world effects of C++ name mangling

Because C++ symbols are routinely exported from DLL and shared object files, the name mangling scheme is not merely a compiler-internal matter. Different compilers (or different versions of the same compiler, in many cases) produce such binaries under different name decoration schemes, meaning that symbols are frequently unresolved if the compilers used to create the library and the program using it employed different schemes. For example, if a system with multiple C++ compilers installed (e.g., GNU GCC and the OS vendor's compiler) wished to install the Boost C++ Libraries, it would have to be compiled multiple times (once for GCC and once for the vendor compiler).

It is good for safety purposes that compilers producing incompatible object codes (codes based on different ABIs, regarding e.g., classes and exceptions) use different name mangling schemes. This guarantees that these incompatibilities are detected at the linking phase, not when executing the software (which could lead to obscure(模糊的,隐晦的) bugs and serious stability issues).

For this reason name decoration is an important aspect of any C++-related ABI.

Demangle via c++filt

$ c++filt _ZNK3MapI10StringName3RefI8GDScriptE10ComparatorIS0_E16DefaultAllocatorE3hasERKS0_
Map<StringName, Ref<GDScript>, Comparator<StringName>, DefaultAllocator>::has(StringName const&) const

Demangle via builtin GCC ABI

#include <stdio.h>
#include <stdlib.h>
#include <cxxabi.h>

int main() {
    const char *mangled_name = "_ZNK3MapI10StringName3RefI8GDScriptE10ComparatorIS0_E16DefaultAllocatorE3hasERKS0_";
    char *demangled_name;
    int status = -1;
    demangled_name = abi::__cxa_demangle(mangled_name, NULL, NULL, &status);
    printf("Demangled: %s\n", demangled_name);
    free(demangled_name);
    return 0;
}

Output:

Demangled: Map<StringName, Ref<GDScript>, Comparator<StringName>, DefaultAllocator>::has(StringName const&) const

Python

In Python, mangling is used for "private" class members which are designated as such by giving them a name with two leading underscores and no more than one trailing underscore. For example, __thing will be mangled, as will ___thing and __thing_, but __thing__ and __thing___ will not. Python's runtime does not restrict access to such members, the mangling only prevents name collisions if a derived class defines a member with the same name.

On encountering name mangled attributes, Python transforms these names by prepending a single underscore and the name of the enclosing class, for example:

>>> class Test(object):
...     def __mangled_name(self):
...         pass
...     def normal_name(self):
...         pass
>>> t = Test()
>>> [attr for attr in dir(t) if 'name' in attr]
['_Test__mangled_name', 'normal_name']

geeksforgeeks Name Mangling and extern “C” in C++

NOTE: 其中给出了具体的例子