Development You Stream, I Stream, We All Stream For Protocol Buffers

I’ve surprised the hell out of myself due to the fact that I haven’t yet written about my man-crush on Google’s Protocol Buffers (other than showing how to build a CI server for $5 ).

The summary is that I use Protocol Buffers to compact data into a binary encoding which is backwards and forwards compatible. I would call them a glorified set of binary key-value pairs. So, I get the speed and size benefits of binary code (as opposed to, say, JSON) - plus, I’m not locked into a specific schema (like if I were to store raw structs and arrays).

Google’s own documentation explains best their utility. Another good, brief take on JSON vs PBs is on StackOverflow .

Note: This post isn’t really a primer on using Protocol Buffers (as Google’s docs explain that very well), it’s about extending them to support streaming functionality.

Update : I have a follow-up post on an alternative way to stream Protocol Buffers as well.

Here Are Some Numbers

Just for some fun and anecdotal evidence, I took 10MB of uncompressed JSON (3 fields, 2 int32’s and 1 double) - and that exact data became approximately 2.8MB of uncompressed Protobuffers. It should be noted that Protobuffers, even with their built-in overhead, might still be smaller than the equivalent raw structures. The reason for this is that the Protobuf overhead comes from field tags , but the size reduction comes from using varints which take up only as much space as they need to (e.g. a raw int would be 4 bytes, but a pb-int might only be 1 or 2 bytes if that’s all that was needed).

Actually, looking at my example again, I think I went overkill on the ‘double’ datatype, as that’s a fixed 64-bits, and should have used a ‘float’ to save 32-bits per message!

My Protocol Buffer Use Cases

Anyways, at the end of the day, I’ve been using Protocol Buffers for a few years now - mostly for storing blobs and timeseries data, when a database doesn’t make sense (or, similarly, as an attachment in Couchbase Mobile ). One other nifty use I’ve had for them is communicating between Android and C++.

Instead, I’ll just write my data to a pb file from Android, pull it back in on the C++ side and run my signal processing algorithms (or alternatively, I send an in-memory Protobuffer across the JNI boundary). This way, I can develop my C++ and Android code on their own, and have an interface layer which is robust to changes.

Protobuf (De)Limitations

There is a bit of functionality I’ve thought about, but never really needed - so never bothered writing any code for it. Until now.

The Protobuffer docs mention the use case of streaming multiple messages , but that is not inherently supported because the Protocol Buffer message format is not self-delimiting. I always found this to be kinda stupid, as there are length delimiters INSIDE a Protocol Buffer message (e.g. strings are length-prefixed, embedded messages are length-prefixed, and repeated values are length-prefixed!).

It’s because of these examples that I didn’t really understand why Google didn’t throw in some sort of option for adding a self-delimiter to each message, with the explicit understanding that delimited blobs were not compatible with non-delimited blobs… I kind of get it, but at the same time, I kinda don’t.

In any case, I did it myself.

Streaming Python Messages

I read a great post by Eli Bendersky where he showed a simple example of how to do this delimiting in Python’s implementation of Protocol Buffers.

For context, here is my re-implementation of it (writes messages by appending to a file, reads in all messages from a file):

     def to_hex(s):
        lst = []
        for ch in s:
            hv = hex(ord(ch)).replace('0x', '')
            if len(hv) == 1:
                hv = '0'+hv
            lst.append(hv)

        return reduce(lambda x,y:x+y, lst)


    def pack_message(message):
        s = message.SerializeToString()
        packed_len = struct.pack('<L', len(s))
        return to_hex(packed_len + s)


    def unpack_messages(buffer):
        messages = []
        n = 0
        while n < len(buffer):
            msg_len = struct.unpack('<L', buffer[n:n+4])[0]
            n += 4
            msg_buf = buffer[n:msg_len+n]
            n += msg_len

            message = sample_pb2.Sample()
            message.ParseFromString(msg_buf)
            messages.append(message)

        return messages


    with open('test.pbld', 'wb') as outfile:
        for datum in data:
            outfile.write(pack_message(datum))


    with open('test.pbld', 'rb') as infile:
        buffer = infile.read()
        messages = unpack_messages(buffer)
 

In his original code, Eli writes to a socket - but the brass tacks are here. Note: I can’t recall if the to_hex was really necessary or not. I think I wrote it to make double-checking my work easier.

In summary, all we’re doing is first writing the size of the message in a little-endian encoded, fixed size (e.g. 4-byte int) and then writing the message out, and repeat. The length-delimiter must be a fixed-size, otherwise we wouldn’t know how to parse it, and we explicitly specify the endianness to avoid compatibility issues across operating systems.

Protocol-Buffers

What About Android?

For Protobufs in Android, I use (yet another) Square library. Specifically, Wire . Wire is a method-count friendly implementation of Protocol Buffers, that comes with a code generator too (note, at this time, I haven’t tried Google’s proto3 JavaNano implementation - but Wire uses some of it already).

Unfortunately, all of the generated classes are marked ‘final’, so they’re not extendable… So, instead, I created a few static methods inside a WireUtils class to perform this length-delimited encoding and decoding.

     public static void writeDelimitedTo(OutputStream outputStream, List<Message> messages) throws IOException {
        for (Message message : messages) {
            writeDelimitedTo(outputStream, message);
        }
    }

    public static void writeDelimitedTo(OutputStream outputStream, Message message) throws IOException {
        int size = message.adapter().encodedSize(message);
        BufferedSink sink = Okio.buffer(Okio.sink(outputStream));
        sink.writeIntLe(size);
        message.encode(sink);
        sink.emit();
    }

    public static <M extends Message> List<M> readDelimitedFrom(InputStream inputStream, ProtoAdapter<M> adapter) throws IOException {
        List<M> messages = new ArrayList<>();
        BufferedSource source = Okio.buffer(Okio.source(inputStream));
        while (!source.exhausted()) {
            int size = source.readIntLe();
            byte[] bytes = source.readByteArray(size);
            messages.add(adapter.decode(bytes));
        }
        return messages;
    }
 

Sample App

I’ve written a small sample app on Github which will either add one random value, or add 10 random values, to a file. It will then close the file, and re-open it for reading, and then populate a graph based on the file data.

Streaming-Protobuf-App-Landscape

For (unoptimized) performance metrics using length-delimited encoding with 86400 samples using an incrementing ‘x’ from 0, and random 0-1000 for the ‘y’ - on a Samsung S4 Mini running Android 4.4.2:

 Filesize = 900kB
Build messages and then outputStream to file = 2900ms
Read messages from file inputStream and re-serialize into list of objects = 2700ms

 

What’s interesting is that if you change the length-delimiter from an int to a byte, there isn’t much of an appreciable difference in read/write time - however, there is a significant difference in the size of the file. This is also a function of the sample message, which only contains around 8 bytes, so the 4 byte delimiter is easily 30% overhead. As the size of the messages increases, the length-delimiter overhead decreases, but you still maintain full extensibility.

Here is an example of the same test as above, using a 1-byte length delimiter:

 Filesize = 650kB
Build messages and then outputStream to file = 2800ms
Read messages from file inputStream and re-serialize into list of objects = 2600ms

 

Next Steps - Embedded Systems

One of the next things I want to play around with is Flatbuffers (or Cap’n Proto ).

They can be considered an evolution of Protocol Buffers, but they are not without their downsides - so I would rather call them an alternative to Protobufs. The upsides are that they are significantly faster to use at runtime, and they don’t require as much memory usage - however, they also take up more room on the wire/disk. Also, they are slightly more complicated to work with.

For my most recent use case, size of the file was more important, as I wasn’t time-bound, nor was I very memory-bound (given that it was a move away from JSON).

One application where I like using Protocol Buffers, and I’m excited to try out FlatBuffers or Cap’n Proto is for embedded systems.

Putting some framing around Protobufs is a great way to transfer data over USB, Bluetooth, or serial - since it adds a lot of flexibility to the interface and data model (which is such a pain in the embedded world). The reason I like a zero-allocation protocol for this is because of the very limited resources in most embedded processors (even at the expense of a bit more wire traffic).

Hopefully I’ll be writing about that later this year!

Feature Photo credit: ben_nuttall / Foter / CC BY-SA