Serialization and deserialization

Whatever programming language you use, you are likely to run into serialization and deserialization. This article summarizes the topic.

Encode and decode

"Encode" and "decode" are another way of saying serialization and deserialization. The following article explains the terms:

Medium, "Use Binary Encoding Instead of JSON":

when you want to send the data over a network or store it in a file, you need to encode the data as a self-contained sequence of bytes. The translation from the in-memory representation to a byte sequence is called encoding and the inverse is called decoding.

Clearly:

encode = serialization

decode = deserialization
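
The equivalence above can be illustrated with a minimal Python sketch. JSON is used here only as one convenient byte format; the function names `encode` and `decode` and the sample `record` are hypothetical and chosen for illustration.

```python
import json

def encode(obj) -> bytes:
    """Serialize: in-memory object -> self-contained byte sequence."""
    return json.dumps(obj, sort_keys=True).encode("utf-8")

def decode(data: bytes):
    """Deserialize: byte sequence -> new in-memory object."""
    return json.loads(data.decode("utf-8"))

# Hypothetical sample data, not taken from the quoted article.
record = {"id": 7, "name": "alice", "tags": ["a", "b"]}

payload = encode(record)    # bytes that could be written to a file or a socket
restored = decode(payload)  # an equal but distinct object
```

The round trip yields a value equal to the original but stored in a different object, which is what "reconstructed later" means in practice.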

Wikipedia: Serialization

In computer science, in the context of data storage, serialization (or serialisation) is the process of translating data structures or object state into a format that can be stored (for example, in a file or memory buffer) or transmitted (for example, across a network connection link) and reconstructed later (possibly in a different computer environment). When the resulting series of bits is reread according to the serialization format, it can be used to create a semantically identical clone of the original object. For many complex objects, such as those that make extensive use of references, this process is not straightforward. Serialization of object-oriented objects does not include any of their associated methods with which they were previously linked.

This process of serializing an object is also called marshalling an object. The opposite operation, extracting a data structure from a series of bytes, is deserialization (also called unmarshalling).

Uses

For some of these features to be useful, architecture independence must be maintained. For example, for maximal use of distribution, a computer running on a different hardware architecture should be able to reliably reconstruct a serialized data stream, regardless of endianness. This means that the simpler and faster procedure of directly copying the memory layout of the data structure cannot work reliably for all architectures. Serializing the data structure in an architecture-independent format means preventing the problems of byte ordering, memory layout, or simply different ways of representing data structures in different programming languages.
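
The byte-ordering problem described above can be seen with Python's standard `struct` module. This is a small sketch; the value `0x01020304` is an arbitrary example.

```python
import struct

value = 0x01020304  # arbitrary 32-bit example value

# The same integer produces different byte sequences depending on byte order.
little = struct.pack("<I", value)  # little-endian: least significant byte first
big = struct.pack(">I", value)     # big-endian: most significant byte first

# Decoding with the wrong byte order silently yields a different number.
wrong = struct.unpack("<I", big)[0]
right = struct.unpack(">I", big)[0]
```

An architecture-independent format avoids this by fixing the byte order in the format itself (as `<` or `>` does here) instead of copying whatever layout the host machine happens to use.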

Inherent to any serialization scheme is that, because the encoding of the data is by definition serial, extracting one part of the serialized data structure requires that the entire object be read from start to end, and reconstructed. In many applications, this linearity is an asset, because it enables simple, common I/O interfaces to be utilized to hold and pass on the state of an object. In applications where higher performance is an issue, it can make sense to expend more effort to deal with a more complex, non-linear storage organization.

Even on a single machine, primitive pointer objects are too fragile to save because the objects to which they point may be reloaded to a different location in memory. To deal with this, the serialization process includes a step called unswizzling or pointer unswizzling, where direct pointer references are converted to references based on name or position. The deserialization process includes an inverse step called pointer swizzling.
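
A minimal sketch of the idea, assuming position-based references: direct object references are replaced by list indices before saving (unswizzling) and turned back into references after loading (swizzling). The `Node` class and both helper functions are hypothetical names used only for this illustration.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.next = None  # direct in-memory reference (the "pointer")

def unswizzle(nodes):
    """Replace each direct reference with the target's position in the list."""
    index = {id(n): i for i, n in enumerate(nodes)}
    return [
        {"name": n.name, "next": None if n.next is None else index[id(n.next)]}
        for n in nodes
    ]

def swizzle(records):
    """Rebuild the nodes, then turn positions back into direct references."""
    nodes = [Node(r["name"]) for r in records]
    for node, r in zip(nodes, records):
        if r["next"] is not None:
            node.next = nodes[r["next"]]
    return nodes

a, b = Node("a"), Node("b")
a.next, b.next = b, a        # a two-node cycle

flat = unswizzle([a, b])     # plain data, safe to write out
a2, b2 = swizzle(flat)       # new objects with the same link structure
```

The flat form contains no memory addresses, so it stays valid even when the objects are later recreated at different locations.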

Since both serializing and deserializing can be driven from common code (for example, the Serialize function in Microsoft Foundation Classes), it is possible for the common code to do both at the same time, and thus, 1) detect differences between the objects being serialized and their prior copies, and 2) provide the input for the next such detection. It is not necessary to actually build the prior copy because differences can be detected on the fly. The technique is called differential execution. This is useful in the programming of user interfaces whose contents are time-varying — graphical objects can be created, removed, altered, or made to handle input events without necessarily having to write separate code to do those things.

Implementation

Python pickle

For example, see the pickle module in the Python documentation (Python object serialization).
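
A short round trip with `pickle` might look like the sketch below. The `state` dictionary is made-up sample data. Note that `pickle.loads` should only be used on data from a trusted source, because unpickling can execute arbitrary code.

```python
import pickle

# Made-up sample data: the same list is referenced twice.
shared = [1, 2, 3]
state = {"first": shared, "second": shared, "point": (3, -4)}

data = pickle.dumps(state)   # serialize to a bytes object
clone = pickle.loads(data)   # deserialize into a new, equal object
```

In the clone, `clone["first"]` and `clone["second"]` still refer to one list, because pickle records shared references rather than copying the list twice.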

Protocol buffer

https://github.com/protocolbuffers/protobuf

https://developers.google.com/protocol-buffers/

Protocol buffers are summarized in the Network\Protocol\Application-protocol\Protocol-data-format\IDL\Protobuf chapter of the Linux-OS project.

Alignment

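In C and C++, every type has an alignment requirement, so deserialization has to take alignment into account: the bytes being read must satisfy the alignment of the type they are read as. This is the "strict" requirement.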
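
The effect of alignment on layout can be observed from Python, whose `struct` module can either follow the native C layout of the host (`@`) or use a packed, standard layout (`=`). This is only an illustrative sketch; the format `bi` (a `char` followed by an `int`) is an arbitrary example, and the native size depends on the platform.

```python
import struct

# "=" uses standard sizes with no padding: 1 byte (char) + 4 bytes (int).
packed_size = struct.calcsize("=bi")

# "@" follows the host C compiler's sizes and alignment, so padding may be
# inserted after the char so that the int starts at an aligned offset.
native_size = struct.calcsize("@bi")

# Bytes written with one layout must be read with the same layout.
native_bytes = struct.pack("@bi", 1, 2)
values = struct.unpack("@bi", native_bytes)
```

On common platforms `native_size` is larger than `packed_size` because of the padding. A deserializer that reads a packed wire format in C or C++ therefore has to copy the fields out of the buffer rather than treat the buffer as the struct.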