Serialization/Protocol Compilers

in programming •  7 years ago 

This is a list and discussion of a few serialization libraries and techniques that can be useful. I am mostly interested in the C++ side, and in a few specific libraries in particular.

Definition

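Serialization is the act of converting a collection of structs/classes in memory into a sequence of bytes that can be stored in a file or sent over a stream. Deserialization is the reverse: loading them back into memory.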

A protocol compiler takes a specification of the messages, written in a dedicated description language, and generates the de/serializer code for one or more target languages. Protocol compilers are appealing because you can share the specification with other people and use it, for instance, to implement an RPC server/client, so you avoid rewriting the same code by hand.

Why does it matter?

Well, at some point you want to communicate with the outside world. Your constraints may differ in terms of performance, storage, ease of use, maintainability or readability, but the problem is the same: you need to de/serialize some data. If someone has already solved it, that is nice and you can focus on more important things. But of course, you need to know the constraints and the costs that come with such a solution.

The List

Language Agnostic (with many backends):

  • XML Style
  • JSON Family
  • Google Protobuf
  • Apache Thrift
  • FlatBuffers
  • Cap'n Proto

C++ Family:

  • Boost Serialization
  • Cereal
  • YAS
  • Manual Style

XML Style

XML is widely used to share documents and their specifications. Several standards and tools exist to handle it.

I will keep it short because I have a strong bias against it. Since it is easy to write a specification for an XML document, a lot of people writing specifications tend to over-design and over-complicate things. This is very bad because you then have to handle a lot of cases and a lot of noise for nothing. Handling all this takes a lot of code, a lot of time and a lot of storage. Some tools can read an XML specification and generate the classes that implement the model, but if you want to be lean and fast you will have to use low-level parsing techniques.

There are several XML dialects built for exchanging data (SOAP, XML-RPC...), but they all suffer from the same size and complexity disease.

JSON Family

JSON is a schema-less way to encode data that was first designed for JavaScript. It is now widespread thanks to its ease of use and its readability when debugging. Its philosophy is the exact opposite of XML: it is easy to parse and easy to generate. It still suffers from its size, though not to the extent XML does.

A few binary variants exist to reduce this size issue, which matters for both performance and storage.

BSON

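BSON is a binary encoding of JSON-like documents, used inside the well-known MongoDB. It adds explicit types and length prefixes, which makes it fast to traverse, although the encoded size is not always smaller than the equivalent JSON text.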

MessagePack

MessagePack is a binary format similar to BSON, but it was designed as a compact replacement for JSON. Like JSON it is schema-less: the core format does not come with a message description language the way Google Protobuf or Apache Thrift do.
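To make the size difference concrete, here is a minimal sketch that encodes the JSON object {"compact":true,"schema":0} in MessagePack. It only handles the few type tags needed for this one example (fixmap, fixstr, booleans, positive fixint). The helper names are made up for illustration and are not the API of any MessagePack library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// fixmap tag: 1000xxxx, where xxxx is the number of key/value pairs (0..15).
void put_fixmap(Bytes& out, std::size_t pairs) {
    assert(pairs <= 15);
    out.push_back(static_cast<std::uint8_t>(0x80 | pairs));
}

// fixstr tag: 101xxxxx, where xxxxx is the length in bytes (0..31),
// followed by the raw string bytes.
void put_fixstr(Bytes& out, const std::string& s) {
    assert(s.size() <= 31);
    out.push_back(static_cast<std::uint8_t>(0xa0 | s.size()));
    out.insert(out.end(), s.begin(), s.end());
}

// Booleans are a single byte: 0xc2 for false, 0xc3 for true.
void put_bool(Bytes& out, bool b) {
    out.push_back(static_cast<std::uint8_t>(b ? 0xc3 : 0xc2));
}

// positive fixint: values 0..127 are stored as themselves in one byte.
void put_small_uint(Bytes& out, unsigned v) {
    assert(v <= 127);
    out.push_back(static_cast<std::uint8_t>(v));
}

// Encodes {"compact":true,"schema":0}.
Bytes encode_example() {
    Bytes out;
    put_fixmap(out, 2);
    put_fixstr(out, "compact");
    put_bool(out, true);
    put_fixstr(out, "schema");
    put_small_uint(out, 0);
    return out;
}
```

The result is 18 bytes, against 27 bytes for the JSON text, mostly because the quotes, colons, commas and braces are replaced by one-byte tags.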

Google Protobuf

Protobuf offers a language to describe the messages, and a description can be compiled to several target languages (C++, C#, Go, Java, Python, JavaScript...). The generated code is fast at de/serialization and the messages are compact. It does not ship an RPC implementation itself (for that, see gRPC: https://grpc.io/docs/guides/). It is widely used (not only at Google), well documented and well tested.
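Part of this compactness comes from the wire format, which stores integers as base-128 varints: 7 bits of payload per byte, least significant group first, with the high bit set on every byte except the last. Below is a minimal sketch of that encoding. The function names are hypothetical and this is not the protobuf library API, only an illustration of the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode an unsigned integer as a base-128 varint.
std::vector<std::uint8_t> encode_varint(std::uint64_t value) {
    std::vector<std::uint8_t> out;
    while (value >= 0x80) {
        // Low 7 bits, with the continuation bit set.
        out.push_back(static_cast<std::uint8_t>((value & 0x7f) | 0x80));
        value >>= 7;
    }
    // Last byte: continuation bit clear.
    out.push_back(static_cast<std::uint8_t>(value));
    return out;
}

// Decode a single varint stored at the start of `in`.
std::uint64_t decode_varint(const std::vector<std::uint8_t>& in) {
    // A 64-bit value never needs more than 10 bytes.
    assert(!in.empty() && in.size() <= 10);
    std::uint64_t value = 0;
    int shift = 0;
    for (std::uint8_t byte : in) {
        value |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
        shift += 7;
        if ((byte & 0x80) == 0) break;  // last byte of the varint
    }
    return value;
}
```

With this scheme the value 1 takes a single byte and 300 takes two (0xAC 0x02), where a fixed-width 32-bit field would always take four.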

Apache Thrift

Apache Thrift is similar to Protobuf and supports a wide list of target languages. It also offers socket helpers that make it easy to implement an RPC client/server. Like Protobuf it is pretty complete, and it is used in production by well-known big companies (Facebook, Pinterest, Uber). The generated code is fast at de/serialization and the messages are compact too.

FlatBuffers

FlatBuffers is a cross-platform serialization library; here too you have to describe the messages. It does not have inheritance, which is not really an issue since there are workarounds. The performance seems good, and it uses a zero-copy mechanism.
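"Zero copy" means that fields are read directly from the received buffer at known offsets, without first unpacking the whole message into separate objects. The sketch below shows the idea on a tiny fixed layout invented for this example (a 32-bit id, a 16-bit name length, then the name bytes, all little-endian). It is not the actual FlatBuffers format, and `MessageView` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical layout: [u32 id][u16 name_len][name bytes], little-endian.
class MessageView {
public:
    MessageView(const std::uint8_t* data, std::size_t size)
        : data_(data), size_(size) {
        // Validate once so the accessors cannot read out of bounds.
        assert(size_ >= kHeaderSize);
        assert(size_ >= kHeaderSize + name_length());
    }

    // Assembled from bytes on each call; nothing is cached or copied.
    std::uint32_t id() const {
        return static_cast<std::uint32_t>(data_[0])
             | static_cast<std::uint32_t>(data_[1]) << 8
             | static_cast<std::uint32_t>(data_[2]) << 16
             | static_cast<std::uint32_t>(data_[3]) << 24;
    }

    // The returned view points into the original buffer.
    std::string_view name() const {
        return {reinterpret_cast<const char*>(data_ + kHeaderSize),
                name_length()};
    }

private:
    static constexpr std::size_t kHeaderSize = 6;

    std::size_t name_length() const {
        return static_cast<std::size_t>(data_[4])
             | static_cast<std::size_t>(data_[5]) << 8;
    }

    const std::uint8_t* data_;
    std::size_t size_;
};
```

The trade-off is that the view is only valid while the underlying buffer is alive, and every access pays a small decoding cost instead of paying it once up front.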

Cap'n Proto

It is similar to Protocol Buffers or Thrift and has its own description language. It seems promising, with very impressive performance. It also comes with an RPC system that has interesting features.

Boost Serialization

Now we are moving to the C++ side. I used Boost Serialization a few years ago, at a time when I had no major need for performance or small storage size. The library is well tested, robust and versatile. You can add a de/serializer to a class in several ways, including non-intrusively, which is very handy when you want to serialize classes whose source code you cannot modify. The documentation is decent and so is the performance. It handles polymorphism well, as well as versioning (which is very useful) and other fancy features.

Cereal

It is a library in the same spirit as Boost Serialization, so it feels very familiar. It supports the standard library types. However, it does not serialize raw pointers or references, so it cannot do the general pointer/reference tracking that Boost Serialization does (smart pointers are supported). If that is not a problem for you, it is faster than Boost Serialization thanks to this simplification. It is well tested and documented.
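Pointer tracking means the archive remembers which objects it has already written, so an object reachable through several pointers is stored once and later occurrences are stored as a back-reference. Here is a minimal sketch of the idea. The `TrackingWriter` class and its text output format are invented for illustration; this is not how Boost Serialization or Cereal implement it.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>

struct Node {
    int value;
};

class TrackingWriter {
public:
    // Writes the object the first time it is seen, a back-reference after.
    void write(const Node* p) {
        assert(p != nullptr);
        auto it = ids_.find(p);
        if (it != ids_.end()) {
            out_ << "ref " << it->second << '\n';
            return;
        }
        std::size_t id = ids_.size();  // next free object id
        ids_.emplace(p, id);
        out_ << "obj " << id << ' ' << p->value << '\n';
    }

    std::string str() const { return out_.str(); }

private:
    std::unordered_map<const Node*, std::size_t> ids_;  // address -> id
    std::ostringstream out_;
};
```

The map lookup on every pointer is the kind of extra work that tracking adds, which is why a library that skips it can be faster.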

YAS (Yet Another Serialization)

It is another serialization library, focused on performance. It supports many containers from the standard library and from Boost. It does not seem to implement pointer/reference tracking the way Boost Serialization does. The usage is similar to Boost Serialization. It is relatively new and is missing some documentation. The benchmark numbers are really good.

https://raw.githubusercontent.com/thekvs/cpp-serializers/master/images/time.png

Manual Style

It is relatively easy to use memcpy, or a few low-level reads and writes, to implement your own serialization. You will have to face endianness issues if you want to use it across platforms, you will have to implement some scheme for pointers, and it will start to get hairy once you have to handle polymorphism. Doing this takes a lot of time, and you have to be very careful to minimize the number of reads/writes you perform, otherwise the performance will vanish because you spend your time calling the kernel. Also, from a pure performance point of view, you will want a zero-copy implementation (no intermediate buffer), which is more difficult. You can also use memory-mapped IO/files, as MongoDB does (the Boost library is handy for this).
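As a minimal sketch of the manual approach, the code below writes a small struct with an explicit little-endian byte order, so the result does not depend on the host endianness, and builds the whole message in one in-memory buffer so that it can be handed to the OS in a single write. The `Player` struct and the function names are hypothetical, chosen only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct Player {
    std::uint32_t id;
    std::string name;
};

// Append a 32-bit value, least significant byte first.
void put_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (std::size_t i = 0; i < 4; ++i) {
        out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
    }
}

// Read a 32-bit little-endian value at `pos` and advance `pos`.
std::uint32_t get_u32(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    assert(pos + 4 <= in.size());
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        v |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
    }
    pos += 4;
    return v;
}

// Layout: [u32 id][u32 name length][name bytes].
std::vector<std::uint8_t> serialize(const Player& p) {
    std::vector<std::uint8_t> out;
    out.reserve(8 + p.name.size());  // one allocation for the whole message
    put_u32(out, p.id);
    put_u32(out, static_cast<std::uint32_t>(p.name.size()));
    out.insert(out.end(), p.name.begin(), p.name.end());
    return out;
}

Player deserialize(const std::vector<std::uint8_t>& in) {
    std::size_t pos = 0;
    Player p{};
    p.id = get_u32(in, pos);
    std::uint32_t len = get_u32(in, pos);
    assert(pos + len <= in.size());
    p.name.assign(reinterpret_cast<const char*>(in.data()) + pos, len);
    return p;
}
```

Even this small case already needs explicit length prefixes and bounds checks; pointers, versioning and polymorphism are where the manual approach really starts to cost time.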
