Streaming Protobuf Puns Are COBS'olete

Sigh… Yeah, I know… COBS’olete…

Anyways, last week I wrote a post on streaming Protocol Buffers (to a file, in memory, wherever). While writing that post, it got me thinking of alternative ways to handle delimiting data to a file.

Coincidentally, an Android/USB project that’s been in the background for a couple of months re-surfaced, and my recommendation on using Consistent Overhead Byte Stuffing (COBS) for proprietary packet framing was implemented on the hardware side.

I’m going to skip the explanation on what COBS is because it’s already quite cleanly and concisely explained in that Wiki article.

I’d used COBS a few years back when it was recommended by a colleague while working on a streaming Bluetooth Classic packet framing protocol. I really liked it, it was fast and simple with minimal overhead… All-in-all, what you want in a framing protocol (note, JUST framing, not error handling or correction).

I spent about 30 minutes looking for a Java implementation of the C version that’s located in Wikipedia, then I realized it would probably take the same amount of time to just write it myself anyways.

Basic implementation

I wrote this basic encode/decode implementation. It takes in a byte[] packet to encode (replaces all 0’s with non-zeros - and appends a trailing 0), and a byte[] packet to decode (assumes the packet has a trailing 0).

As always, this code can be found in my GitHub repo:

public class CobsUtils {

    // Expected to be the entire packet to encode
    public static byte[] encode(byte[] packet) {
        if (packet == null
                || packet.length == 0) {
            return new byte[]{};
        }

        byte[] output = new byte[packet.length + 2];
        byte blockStartValue = 1;
        int lastZeroIndex = 0;
        int srcIndex = 0;
        int destIndex = 1;

        while (srcIndex < packet.length) {
            if (packet[srcIndex] == 0) {
                output[lastZeroIndex] = blockStartValue;
                lastZeroIndex = destIndex++;
                blockStartValue = 1;
            } else {
                output[destIndex++] = packet[srcIndex];
                if (++blockStartValue == 255) {
                    output[lastZeroIndex] = blockStartValue;
                    lastZeroIndex = destIndex++;
                    blockStartValue = 1;
                }
            }

            ++srcIndex;
        }

        output[lastZeroIndex] = blockStartValue;
        return output;
    }

    // Expected to be the entire packet to decode with trailing 0
    public static byte[] decode(byte[] packet) {
        if (packet == null
                || packet.length == 0
                || packet[packet.length - 1] != 0) {
            return new byte[]{};
        }

        byte[] output = new byte[packet.length - 2];
        int srcPacketLength = packet.length - 1;
        int srcIndex = 0;
        int destIndex = 0;

        while (srcIndex < srcPacketLength) {
            int code = packet[srcIndex++] & 0xff;
            for (int i = 1; srcIndex < srcPacketLength && i < code; ++i) {
                output[destIndex++] = packet[srcIndex++];
            }
            if (code != 255 && srcIndex != srcPacketLength) {
                output[destIndex++] = 0;
            }
        }

        return output;
    }

}

Possible Extensions

Some possible (and simple) extensions to these methods would involve having them operating on streams instead of just byte arrays, and maybe I’ll add those at some point. They’re only a few lines of code, I just need to think through the error handling a bit (end of stream, invalid packeting issues midway through a stream).

Using Okio’s buffers, it would just be a matter of hooking a buffered source into a buffered sink, and sticking a streaming encode/decode in the middle.

Pseudo-Limitations

As I mentioned earlier, COBS is just a framing protocol. It does not attempt to determine the ‘quality’ of the transferred information. For that, you’d need some other information like a checksum, and maybe encode the length of the data (in case you’re encoding multiple packets inside of one frame). Sky’s the limit!

A simple explanation on how to overcome these ‘limitations’ of COBS is mentioned in this StackOverflow post:

I would advocate for a byte-stuffing method combined with a length byte (packet length modulo 256) and a packet-level CRC, and then use UART with a parity bit. The length byte ensures dropped-byte detection, which works well with the parity bit (because most UARTs will drop any bytes that fail parity). Then the packet-level CRC gives you extra security.

As for the overhead of byte-stuffing, have you looked at the COBS protocol? It’s a genius way to do byte-stuffing with a fixed overhead of 1 byte per every 254 transmitted (plus your framing, CRC, LEN, etc).

Other Considerations

The question I asked myself at the end of this little exercise was why I would still use length-delimited Protocol Buffers, when I could just use COBS to frame an arbitrarily long Protobuf message.

Overhead

Length-delimiting requires a fixed-endian, fixed-size delimiter in front of each packet. COBS requires a 0 at the end of each packet. Crunching the numbers, COBS actually requires more overhead than a 2-byte length-delimiter (e.g. 65536 bytes, versus needing a delimiter plus an overhead byte for each COBS packet). COBS requires less overhead than a 4-byte length delimiter for short packets (2 vs 4).

Skipping

One reason I like the length delimiter is that I like the concept of being able to skip X bytes while reading through a stream. In cases where X is only 20-30 bytes, it’s mostly irrelevant, but if you’re streaming through a 10 meg file, that might become noticeable.

Pre-Allocation

With COBS, it’s necessary to keep reading in data until a 0 delimiter is found, whereas in length delimiters, the packet size is known ahead of time - so arrays can be allocated accordingly (e.g. what if the first 0 was actually 50 megs into the data for some reason? Could be a big out-of-memory exception).

Data Integrity

A huge con of length delimiting is that if you ever run into a situation where you might lose a byte along the way, you’re completely lost. There is no way to recover from missing data (which is why I recommend only using it in memory and file streams - which are pretty robust).

Overall

So, in summary, I don’t think COBS streaming Protocol Buffers is better than streaming length delimited Protobufs for many reasons, however, I still do like COBS for framing packets over the wire - in situations where we might encounter data loss, or show up in the middle of a packet (UART, USB, Bluetooth Classic).

Added to that, COBS alongside a length and CRC is a pretty robust setup for framing AND error checking - and probably what I’ll be using from now on in my communication protocols.

Development Streaming Protobuf Puns Are COBS'olete