
Serialization in Computer Science

Published at: 2023-01-21T12:00+03:00

Introduction


Serialization is a term in computer science for the process of translating a data structure or an object's state into a format that can be stored on a device or transmitted over a network, and reconstructed later on the same device or a different one without losing any information from the original data structure. When the serialized data is reconstructed, the result is a semantically identical copy of the original data structure. Converting a data structure or object to a serialized format is called serialization, and reconstructing it is called deserialization. Serialization is important for:

  1. Transferring data over a network or through a file.

  2. Persisting an object on a disk.

  3. Distributing objects, especially in component-based software engineering with technologies such as COM and CORBA.

  4. Maintaining security or user-specific information across applications.
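As a minimal sketch of the second use case (persisting an object on disk), here is a round trip through a JSON file using Python's standard `json` module. The record contents and the helper names `save_record` and `load_record` are made up for illustration.

```python
import json
import os
import tempfile


def save_record(record: dict, path: str) -> None:
    """Serialize a dict to JSON text and write it to a file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(record, fh)


def load_record(path: str) -> dict:
    """Read JSON text from a file and deserialize it back into a dict."""
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


# Hypothetical data, chosen to use only types JSON can represent directly.
original = {"user": "alice", "visits": 3, "tags": ["admin", "beta"]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "record.json")
    save_record(original, path)
    restored = load_record(path)

# The restored copy is equal to the original but is a separate object.
print(restored == original, restored is original)  # True False
```

The restored dictionary is a new object built from the file's contents, which is what "semantically identical copy" means in practice.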

The Need for Serialization

Whenever we create variables or data structures composed of primitive data types, the programming language represents that information somewhere in the machine's memory as collections of bits and bytes, and it uses those bits and bytes to perform the logic the code expresses. The problem arises when we want to move this information out of the current machine's memory to somewhere else while keeping its current state intact. How can we do that?

It is not trivial, because the other side, whatever platform it runs on, knows nothing about the memory addresses where the data is stored or about the hardware architecture of our machine. And if the application's process terminates, everything in its memory is lost unless we have stored it somewhere first. We should also keep architecture independence and language independence in mind: objects created on an x86 Dell computer running Ubuntu should be readable on a 64-bit or x86 HP machine running Windows, and vice versa, and objects created in Python should be readable in Java, Node.js, C#, and so on, and vice versa. Maintaining architecture and language independence means avoiding problems with byte ordering, memory layout, or simply the different ways programming languages represent data structures.
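To make the byte-ordering problem concrete, here is a small sketch using Python's standard `struct` module. The value 1025 is an arbitrary choice; the point is that the same integer has two different byte layouts, and reading bytes with the wrong assumption silently gives a different number.

```python
import struct
import sys

value = 1025  # 0x00000401, an arbitrary example value

little = struct.pack("<I", value)  # 4-byte unsigned int, little-endian
big = struct.pack(">I", value)     # 4-byte unsigned int, big-endian

print(sys.byteorder)  # byte order of the machine running this script
print(little.hex())   # 01040000
print(big.hex())      # 00000401

# Reading big-endian bytes as if they were little-endian gives a wrong value.
wrong = struct.unpack("<I", big)[0]
right = struct.unpack(">I", big)[0]
print(wrong, right)   # 17039360 1025
```

A serialization format avoids this by fixing the byte order (or by using text) so that both sides agree, regardless of their hardware.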

What Are Common Languages for Data Serialization?

Most of the time a developer doesn't have to write serialization code from scratch, because popular programming languages and platforms either support serialization natively or have libraries that add it. Java, .NET, C++, Ruby, Node.js, Python, and Go, for example, all have either native serialization support or integrate with serialization libraries.
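As one example of native support, Python ships with the `pickle` module in its standard library. The sketch below round-trips a dictionary holding a set, a tuple, and a date; the `session` data is invented for illustration.

```python
import pickle
from datetime import date

# Hypothetical session data using types a text format like JSON
# cannot represent directly (set, tuple, date).
session = {
    "user_id": 7,
    "roles": {"admin", "editor"},
    "position": (3, 4),
    "last_login": date(2023, 1, 21),
}

blob = pickle.dumps(session)   # serialize to a bytes object
restored = pickle.loads(blob)  # deserialize back into Python objects

print(type(blob).__name__)             # bytes
print(restored == session)             # True
print(type(restored["roles"]).__name__)  # set
```

Native formats like this are convenient but usually language-specific: a pickle is meant to be read by Python, not by Java or C#, and it should only be loaded from sources you trust.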

Serialization formats

A format is the way data is represented before it is stored or transmitted. It is a (usually, but not always, standardized) way of representing, storing, understanding, and transmitting information that the parties concerned agree on. Formats address questions of type consistency, compatibility, and performance. There are a number of serialization formats that are used in different programming languages. Some of the most popular serialization formats are:

  1. JSON

  2. XML

  3. YAML

  4. Protocol Buffers (protobuf), and many more.

Each of these formats has its own pros and cons.

JSON: As most of you know, JSON is a lightweight data-interchange format. It is easy for humans to read and write, and easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language, Standard ECMA-262 3rd Edition - December 1999. JSON is a text format that is completely language-independent but uses conventions that are familiar to programmers of the C family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others. These properties make JSON an ideal data-interchange language.
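Language independence has a price: JSON only knows objects, arrays, strings, numbers, booleans, and null, so language-specific types get mapped onto those. The sketch below, using Python's standard `json` module and made-up data, shows a tuple coming back as a list and an integer key coming back as a string.

```python
import json

# Hypothetical data containing a tuple and a non-string key.
original = {"point": (1, 2), 1: "one", "ratio": 0.5, "missing": None}

text = json.dumps(original)
print(text)  # {"point": [1, 2], "1": "one", "ratio": 0.5, "missing": null}

restored = json.loads(text)
print(restored == original)  # False: the tuple became a list, the key 1 became "1"
print(restored)  # {'point': [1, 2], '1': 'one', 'ratio': 0.5, 'missing': None}
```

When an exact copy matters, either stick to types JSON represents directly or convert explicitly on both sides.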

XML: a markup language much like HTML. The main difference is that XML was designed to transport and store data, rather than to display data. XML was designed to be both human- and machine-readable. The World Wide Web Consortium (W3C) developed XML to meet the growing need for data sharing on the Web.
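Here is a minimal sketch of serializing to and from XML with Python's standard `xml.etree.ElementTree` module. The element names (`user`, `name`, `languages`, `language`) and the helpers `to_xml` and `from_xml` are assumptions made for this example, not a standard schema.

```python
import xml.etree.ElementTree as ET


def to_xml(user: dict) -> str:
    """Serialize a user dict into an XML string (hypothetical layout)."""
    root = ET.Element("user", attrib={"id": str(user["id"])})
    ET.SubElement(root, "name").text = user["name"]
    languages = ET.SubElement(root, "languages")
    for lang in user["languages"]:
        ET.SubElement(languages, "language").text = lang
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> dict:
    """Rebuild the user dict from the XML string produced by to_xml."""
    root = ET.fromstring(text)
    return {
        "id": int(root.get("id")),  # XML stores text, so convert back to int
        "name": root.findtext("name"),
        "languages": [el.text for el in root.iter("language")],
    }


user = {"id": 1, "name": "Alice", "languages": ["Python", "Java"]}
xml_text = to_xml(user)
print(xml_text)
print(from_xml(xml_text) == user)  # True
```

Note that XML itself carries everything as text, so the code has to decide which values are numbers when reading them back.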

YAML: version 1.2 is designed as a superset of JSON and includes additional features such as data type tags, support for cyclic data structures, indentation-sensitive syntax, and multiple forms of scalar data quoting.

Protocol Buffers (protobuf): a Google project used to easily and efficiently serialize structured data so that it can be transmitted over a wire or stored in files. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages. The schema definition helps preserve the specific data types, and their sizes, when data is exchanged between programming languages.
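To give a feel for why protobuf is compact, here is a hand-rolled sketch of how a single integer field is laid out in its binary wire format: a tag byte (field number and wire type) followed by the value as a base-128 "varint". This is only an illustration with made-up helper names; it is not the protobuf library, and real projects would write a `.proto` schema and use the generated code instead.

```python
from typing import Tuple


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # more bytes follow: set the high bit
        else:
            out.append(low7)         # last byte: high bit clear
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Encode one integer field: tag (field number, wire type 0) then value."""
    tag = (field_number << 3) | 0  # wire type 0 means "varint"
    return encode_varint(tag) + encode_varint(value)


message = encode_varint_field(1, 150)
print(message.hex())  # 089601

tag, pos = decode_varint(message)
value, _ = decode_varint(message, pos)
print(tag >> 3, tag & 0x7, value)  # 1 0 150
```

The field name never appears in the bytes, only its number, which is why both sides need the shared schema to make sense of the data.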

Drawbacks of Serialization

  1. Security: Deserializing untrusted data is a security risk. With some formats and libraries it is possible to craft a serialized object that executes arbitrary code when it is deserialized. This is called a deserialization vulnerability, and it is a common attack vector for remote code execution.

  2. Performance: Serialization adds overhead. Converting objects to and from a serialized form takes time, and it is not uncommon for this step to take around 10x longer than the original operation, especially for large objects. The cost differs from format to format; some formats are faster than others.

  3. Encapsulation: Serialization breaks the opacity of an abstract data type by potentially exposing private implementation details. Trivial implementations which serialize all data members may violate encapsulation. To discourage competitors from making compatible products, publishers of proprietary software often keep the details of their programs' serialization formats a trade secret.
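The security drawback is easy to demonstrate with Python's `pickle` module. In the sketch below, the hypothetical `Payload` class tells pickle to call a function of its choosing on load. Here that function is the harmless `sorted`, but an attacker could just as well name something like `os.system`. The `RestrictedUnpickler` follows the "restricting globals" approach described in the Python documentation; the allow-list is an assumption for this example.

```python
import builtins
import io
import pickle


class Payload:
    """Hypothetical malicious object: controls what gets called on load."""

    def __reduce__(self):
        # pickle will call sorted([3, 1, 2]) during deserialization.
        return (sorted, ([3, 1, 2],))


blob = pickle.dumps(Payload())
result = pickle.loads(blob)
print(result)  # [1, 2, 3] -- not a Payload: the chosen function ran instead

# Mitigation sketch: only allow a short list of known-safe globals.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads, but refuses globals outside the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


try:
    restricted_loads(blob)
    blocked = False
except pickle.UnpicklingError:
    blocked = True
print(blocked)  # True

# Plain data still loads fine through the restricted loader.
print(restricted_loads(pickle.dumps({"a": [1, 2]})))  # {'a': [1, 2]}
```

Even with an allow-list, the safer habit is to use a data-only format such as JSON for anything that crosses a trust boundary.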

Thanks for reading to the end.👏👏👏💪💪.


© 2024 Lioul Behailu . All rights reserved.