basysKom Application Development Services

Protobuf for IoT: Sending cost-optimized into the cloud (Part 2 of 4)
Essential Summary
Protobuf is a great choice for transferring data from an IoT device into the cloud. This blog post shows what Protobuf is, how it can be used with C++ and JavaScript, and how to handle updates without breaking backward compatibility.

Introduction

IoT devices for industrial use are often deployed in places without an ethernet or WiFi connection and have to use a mobile broadband connection. To reduce traffic costs, it is required to keep the amount of data that has to be transmitted as low as possible.

Cloud providers also bill in data blocks of a certain size, so small messages combined with batching can save money on the cloud side too.
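The effect of batching on block-based billing can be estimated with a few lines of JavaScript. This is only a minimal sketch: the block size of 4096 bytes, the message size and the function names are assumptions made for illustration, and the actual metering rules depend on the cloud provider and the chosen tier. Protocol overhead is ignored.

```javascript
// Number of billed blocks if every message is sent on its own.
// blockSize is a hypothetical metering unit, not a value from a specific provider.
function billedBlocksUnbatched(messageSizes, blockSize) {
  return messageSizes.reduce((sum, size) => sum + Math.ceil(size / blockSize), 0);
}

// Number of billed blocks if all messages are sent together as one batch.
function billedBlocksBatched(messageSizes, blockSize) {
  const total = messageSizes.reduce((sum, size) => sum + size, 0);
  return Math.ceil(total / blockSize);
}

const sizes = new Array(50).fill(47); // fifty small messages of 47 bytes each
console.log(billedBlocksUnbatched(sizes, 4096)); // 50
console.log(billedBlocksBatched(sizes, 4096));   // 1
```

Under these assumptions, every small message occupies a full block when sent alone, while the whole batch fits into a single block.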

Protobuf

Protobuf is a serialization method for structured data developed by Google. It offers a compact binary wire format and built-in mechanisms that make it easy to remain forward and backward compatible when updating the messages. It uses an interface description language to specify the structured data and the Protobuf compiler to generate code for serialization, deserialization and message handling for C++, Java, Python, Objective-C, C#, JavaScript, Ruby, Go, PHP and Dart. The compact wire format and the compatibility related features make Protobuf very interesting for use in IoT systems. The wide range of supported languages enables the creation of a heterogeneous processing infrastructure for Protobuf encoded device messages.

Using Protobuf in an IoT example

To show the workflow with Protobuf, let’s define the Protobuf messages for an environment data sensor device in the current proto3 format. The device measures temperature, pressure, humidity and CO2 level and generates events if certain threshold values are exceeded or if a sensor fails.

The key to understanding why the messages have been designed like this is that the message type cannot be recognized from the serialized data. We don’t want to send an identifier in addition to the serialized data, so there must be a single point of entry for all message types we are going to send.

syntax = "proto3";

package iotexample;

message EnvironmentData {
  double temperature = 1;
  double pressure = 2;
  double humidity = 3;
  double co2_level = 4;
}

enum ErrorLevel {
  UNSPECIFIED = 0;
  ERROR = 1;
  WARNING = 2;
  INFO = 3;
}

message Event {
  int32 event_number = 1;
  ErrorLevel error_level = 2;
  string message = 3;
}

message TelemetryMessage {
  uint64 timestamp = 1;
  oneof payload {
    EnvironmentData environment_data = 2;
    Event event = 3;
  }
}

message DeviceMessages {
  repeated TelemetryMessage telemetry_messages = 1;
} 

The first line defines that the following description is in the “proto3” syntax which is the most current version of the Protobuf interface description language.

The second line states that the messages defined below belong to the package “iotexample”. This is required to resolve messages imported from other files and will be used to create namespaces in the generated code.

The keyword message in the following line introduces a new message called EnvironmentData which has four data fields for storing environment sensor values. Each of the data fields has a type, a name and a field number. The name and the field number must be unique inside a message.

In proto3, all fields of a message are optional. If a field is not present in a serialized message, the receiver sets it to a default value, except for fields that have another message as type. Only the field number and the value are serialized; the mapping from field numbers to names is done by the receiver. This means that the length of the field names has no impact on the size of the serialized message.
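To make this visible, the following sketch walks over the raw bytes of a serialized Event message and lists what is actually stored. It is a simplified reader written for this illustration, not part of the google-protobuf module: the function names are made up, only wire types 0, 1 and 2 are handled, and 64-bit values are assumed to be doubles. The input bytes correspond to an Event with event_number 123, error_level 1 and the message "My event message".

```javascript
// Read a varint starting at offset: 7 payload bits per byte, least significant group first.
function readVarint(buf, offset) {
  let value = 0;
  let multiplier = 1;
  let byte;
  do {
    byte = buf[offset++];
    value += (byte & 0x7f) * multiplier;
    multiplier *= 128;
  } while (byte & 0x80);
  return { value, offset };
}

// List the fields of one serialized message as { fieldNumber, wireType, value }.
function listFields(buf) {
  const fields = [];
  let offset = 0;
  while (offset < buf.length) {
    const tag = readVarint(buf, offset);
    const fieldNumber = Math.floor(tag.value / 8);
    const wireType = tag.value % 8;
    offset = tag.offset;
    if (wireType === 0) {          // varint: integers, enums, bools
      const v = readVarint(buf, offset);
      fields.push({ fieldNumber, wireType, value: v.value });
      offset = v.offset;
    } else if (wireType === 1) {   // 64 bit, assumed to be a double here
      fields.push({ fieldNumber, wireType, value: buf.readDoubleLE(offset) });
      offset += 8;
    } else if (wireType === 2) {   // length-delimited: strings, bytes, nested messages
      const len = readVarint(buf, offset);
      fields.push({ fieldNumber, wireType, value: buf.subarray(len.offset, len.offset + len.value) });
      offset = len.offset + len.value;
    } else {
      throw new Error(`Unsupported wire type ${wireType}`);
    }
  }
  return fields;
}

const eventBytes = Buffer.from('087b10011a104d79206576656e74206d657373616765', 'hex');
const fields = listFields(eventBytes);
for (const f of fields) {
  const shown = Buffer.isBuffer(f.value) ? f.value.toString('utf8') : f.value;
  console.log(`field ${f.fieldNumber}, wire type ${f.wireType}: ${shown}`);
}
```

The reader finds field numbers, wire types and values, but no field names and no message name. Without the .proto file, it also cannot tell whether a length-delimited value is a string or a nested message.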

The enum keyword in the next block defines an enumeration named ErrorLevel with four values. A proto3 enum must have 0 as constant for the first value due to the default initialization behavior explained before.

The following message is named Event and stores data to describe an event that has occurred on our IoT device. It uses the ErrorLevel enum as type for the second field to encode the severity of the event.

The message TelemetryMessage is the first step to having a single point of entry for all messages. It combines the two messages we have defined before with a timestamp field. The keyword oneof makes sure that only one of the fields enclosed in the curly braces has a value at any time. This is enforced by the generated message handling code, which clears any previously set field if a member of the oneof is set.

The DeviceMessages message uses the repeated keyword to define an array of TelemetryMessage values.

To sum it all up, we now have the message DeviceMessages as our point of entry which can be used to create a batch of TelemetryMessage values which each contain a timestamp and an Event or EnvironmentData value.

Important hint for field numbers

The Protobuf encoding requires a different number of bytes to encode a field number depending on its value. The field numbers 1-15 can be encoded using one byte; the following numbers up to 2047 require two bytes. If your message contains fields that are rarely used, you should consider assigning them a field number above 15 to keep some one-byte field numbers available for later extensions with new fields that will be used frequently.
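The limits of 15 and 2047 follow from the way the field number is stored: it is combined with a three bit wire type into a tag, and the tag is written as a varint with seven payload bits per byte. A small sketch, with a function name chosen for this illustration, computes the tag size for a given field number.

```javascript
// Number of bytes needed for the tag of a field.
// The tag is (fieldNumber * 8 + wireType), encoded as a varint with 7 payload bits per byte.
function tagSize(fieldNumber, wireType = 0) {
  let tag = fieldNumber * 8 + wireType;
  let bytes = 1;
  while (tag >= 128) {
    tag = Math.floor(tag / 128);
    bytes++;
  }
  return bytes;
}

console.log(tagSize(15));   // 1
console.log(tagSize(16));   // 2
console.log(tagSize(2047)); // 2
console.log(tagSize(2048)); // 3
```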

Working with generated code

The previously defined messages are saved to a file which is then used to generate code with protoc.

C++

C++ is our language of choice for the software on the IoT device due to low resource requirements, low level access to the underlying operating system and easy integration of the Azure IoT SDK. The following paragraph will show how to use the C++ code generated by protoc. The generated code makes use of the Protobuf library, so a C++ application using code generated by protoc must be linked against libprotobuf. Protoc is invoked as follows and generates the two files environment_iot_messages.pb.cc and environment_iot_messages.pb.h.
protoc --cpp_out=generated environment_iot_messages.proto 
These files are included in a C++ application that shows how to instantiate and populate all messages, how to serialize and deserialize them, and how to deal with oneofs.
#include <generated/environment_iot_messages.pb.h>

#include <QByteArray>
#include <QDateTime>
#include <QDebug>

int main()
{
    auto environmentData = new iotexample::EnvironmentData();
    environmentData->set_humidity(75);
    environmentData->set_pressure(1080);
    environmentData->set_temperature(23);
    environmentData->set_co2_level(415);

    auto eventData = new iotexample::Event();
    eventData->set_event_number(123);
    eventData->set_message(std::string("My event message"));
    eventData->set_error_level(iotexample::ErrorLevel::ERROR);

    iotexample::DeviceMessages message;
    auto currentMessage = message.add_telemetry_messages();
    currentMessage->set_timestamp(QDateTime::currentMSecsSinceEpoch());
    currentMessage->set_allocated_environment_data(environmentData);

    currentMessage = message.add_telemetry_messages();
    currentMessage->set_timestamp(QDateTime::currentMSecsSinceEpoch());
    currentMessage->set_allocated_event(eventData);

    qDebug() << "Serialized size:" << message.ByteSizeLong() << "bytes";

    QByteArray serializedMessage(message.ByteSizeLong(), Qt::Initialization::Uninitialized);
    auto success = message.SerializeToArray(serializedMessage.data(), message.ByteSizeLong());

    if (!success)
        return 1;

    iotexample::DeviceMessages deserializedMessage;
    success = deserializedMessage.ParseFromArray(serializedMessage.constData(), serializedMessage.length());

    if (!success)
        return 2;

    qDebug() << "Decoded message with " << deserializedMessage.telemetry_messages_size() << "values";

    for (int i = 0; i < deserializedMessage.telemetry_messages_size(); ++i) {
        const auto &current = deserializedMessage.telemetry_messages(i);
        qDebug() << "Message" << i + 1;
        qDebug() << "  Timestamp:" << QDateTime::fromMSecsSinceEpoch(current.timestamp()).toString(Qt::ISODate);
        if (current.has_event()) {
            qDebug() << "  Event number:" << current.event().event_number();
            qDebug() << "  Event error level:" << current.event().error_level();
            qDebug() << "  Event message:" << QString::fromStdString(current.event().message());
        } else if (current.has_environment_data()) {
            qDebug() << "  Temperature:" << current.environment_data().temperature();
            qDebug() << "  Pressure:" << current.environment_data().pressure();
            qDebug() << "  Humidity:" << current.environment_data().humidity();
            qDebug() << "  CO2:" << current.environment_data().co2_level();
        } else {
            qDebug() << "  Empty TelemetryMessages";
        }
    }


    return 0;
} 
The has_event() and has_environment_data() functions have been added by the protobuf compiler because these two fields have another message as type. The output of this application shows a serialized data size of 80 bytes, all fields are deserialized as expected.
Serialized size: 80 bytes
Decoded message with  2 values
Message 1
  Timestamp: "2020-04-28T11:04:21"
  Temperature: 23
  Pressure: 1080
  Humidity: 75
  CO2: 415
Message 2
  Timestamp: "2020-04-28T11:04:21"
  Event number: 123
  Event error level: 1
  Event message: "My event message" 
For an application running on the IoT device, the use of Protobuf would be finished after the call to SerializeToArray() and the serialized data created by this method call would be passed to the component that sends the data into the cloud.

JavaScript / Node.js

Node.js is well supported on different cloud platforms which makes it a good choice for handling the incoming data from our IoT device, for example in an Azure Function App triggered by an IoT Hub. The file environment_iot_messages_pb.js is generated by invoking protoc.
protoc --js_out=import_style=commonjs,binary:generated environment_iot_messages.proto 
The new file is included into a Node.js application. This application requires the google-protobuf module and performs the same steps as the C++ example from the previous section.
const messages = require('./generated/environment_iot_messages_pb.js')

let environmentData = new proto.iotexample.EnvironmentData();
environmentData.setHumidity(75);
environmentData.setPressure(1080);
environmentData.setTemperature(23);
environmentData.setCo2Level(415);

let eventData = new proto.iotexample.Event();
eventData.setEventNumber(123);
eventData.setErrorLevel(proto.iotexample.ErrorLevel.ERROR);
eventData.setMessage('My event message');

let message = new proto.iotexample.DeviceMessages();

let telemetry = new proto.iotexample.TelemetryMessage();
telemetry.setTimestamp(Date.now());
telemetry.setEnvironmentData(environmentData);
message.addTelemetryMessages(telemetry);

telemetry = new proto.iotexample.TelemetryMessage();
telemetry.setTimestamp(Date.now());
telemetry.setEvent(eventData);
message.addTelemetryMessages(telemetry);

const serializedMessage = message.serializeBinary();

console.log(`Serialized size: ${serializedMessage.length} bytes`);

// deserializeBinary() is a static function and must be called without new
const deserializedMessage = proto.iotexample.DeviceMessages.deserializeBinary(serializedMessage);

console.log(`Decoded message with ${deserializedMessage.getTelemetryMessagesList().length} values`);

for (let i = 0; i < deserializedMessage.getTelemetryMessagesList().length; ++i) {
    const current = deserializedMessage.getTelemetryMessagesList()[i];
    console.log(`Message ${i + 1}`);
    console.log(`  Timestamp: ${new Date(current.getTimestamp()).toISOString()}`)
    if (current.hasEvent()) {
        console.log(`  Event number: ${current.getEvent().getEventNumber()}`)
        console.log(`  Event error level: ${current.getEvent().getErrorLevel()}`)
        console.log(`  Event message: ${current.getEvent().getMessage()}`)
    } else if (current.hasEnvironmentData()) {
        console.log(`  Temperature: ${current.getEnvironmentData().getTemperature()}`)
        console.log(`  Pressure: ${current.getEnvironmentData().getPressure()}`)
        console.log(`  Humidity: ${current.getEnvironmentData().getHumidity()}`)
        console.log(`  CO2: ${current.getEnvironmentData().getCo2Level()}`)
    } else {
        console.log('Empty TelemetryMessages')
    }
} 

The output is identical to the C++ example and there is not much difference in the complexity of the message handling.
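The 80 bytes reported by both examples can be reproduced without the generated code by writing the wire format by hand. The following sketch does this for the example values. The helper names are made up for this illustration and are not part of google-protobuf, only non-negative integers are supported, and the timestamp is a fixed example value in milliseconds since the epoch.

```javascript
// Encode a non-negative integer as a varint: 7 bits per byte, least significant group first.
function varint(value) {
  const bytes = [];
  while (value >= 128) {
    bytes.push((value % 128) | 0x80);
    value = Math.floor(value / 128);
  }
  bytes.push(value);
  return Buffer.from(bytes);
}

// A field is a tag (field number and wire type) followed by its payload.
function field(fieldNumber, wireType, payload) {
  return Buffer.concat([varint(fieldNumber * 8 + wireType), payload]);
}

const varintField = (n, value) => field(n, 0, varint(value));
const doubleField = (n, value) => {
  const payload = Buffer.alloc(8);
  payload.writeDoubleLE(value);
  return field(n, 1, payload);
};
// Strings and nested messages are length-delimited: a varint length, then the bytes.
const bytesField = (n, bytes) => field(n, 2, Buffer.concat([varint(bytes.length), bytes]));

const timestamp = 1588065035911;
const environmentData = Buffer.concat([
  doubleField(1, 23), doubleField(2, 1080), doubleField(3, 75), doubleField(4, 415),
]);
const event = Buffer.concat([
  varintField(1, 123), varintField(2, 1), bytesField(3, Buffer.from('My event message', 'utf8')),
]);
const deviceMessages = Buffer.concat([
  bytesField(1, Buffer.concat([varintField(1, timestamp), bytesField(2, environmentData)])),
  bytesField(1, Buffer.concat([varintField(1, timestamp), bytesField(3, event)])),
]);

console.log(environmentData.length, event.length, deviceMessages.length); // 36 22 80
```

The four doubles account for 36 bytes, the event for 22 bytes, and the timestamps plus the framing of the nested messages for the remaining 22 bytes.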

For a message processor application running in the cloud, the use of Protobuf would start with calling deserializeBinary() on a serialized message that has been received from an IoT device. The generated JavaScript code also offers a toObject() method for all messages which returns the content of the message as a JavaScript object.
{
  "telemetryMessagesList": [
    {
      "timestamp": 1588065035911,
      "environmentData": {
        "temperature": 23,
        "pressure": 1080,
        "humidity": 75,
        "co2Level": 415
      }
    },
    {
      "timestamp": 1588065035911,
      "event": {
        "eventNumber": 123,
        "errorLevel": 1,
        "message": "My event message"
      }
    }
  ]
} 

Size comparison with JSON encoding

JSON is a widespread format for storing and transmitting structured data in a human readable form.

Let’s encode our example data in JSON to see the difference in data size:

[
    {
        "timestamp": 1587988060797,
        "environment_data": {
            "temperature": 23,
            "pressure": 1080,
            "humidity": 75
        }
    },
    {
        "timestamp": 1587988060797,
        "event": {
            "event_number": 123,
            "error_level": 1,
            "message": "My event message"
        }
    }
 ] 

The property names have the highest impact on data size. As the property names must always be included, this is a static additional amount of bytes and choosing shorter names leads to a smaller encoded message size.

The influence of numeric values depends on the number of characters needed to encode that number. This could be better or worse than Protobuf, depending on the value.
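A rough sketch of this comparison for single values, with function names chosen for this illustration. It only looks at the value itself and ignores the Protobuf tag as well as the JSON property name and punctuation.

```javascript
// Bytes needed for a non-negative integer as a Protobuf varint (7 payload bits per byte).
function varintSize(value) {
  let bytes = 1;
  while (value >= 128) {
    value = Math.floor(value / 128);
    bytes++;
  }
  return bytes;
}

// Bytes needed for a number in JSON text (all characters are ASCII).
function jsonNumberSize(value) {
  return JSON.stringify(value).length;
}

const DOUBLE_SIZE = 8; // a Protobuf double always occupies 8 bytes

console.log(jsonNumberSize(23), DOUBLE_SIZE);           // 2 8
console.log(jsonNumberSize(23.123456789), DOUBLE_SIZE); // 12 8
console.log(jsonNumberSize(1588065035911), varintSize(1588065035911)); // 13 6
```

A short number such as 23 is smaller in JSON than as a Protobuf double, while values with many digits and large integers such as the timestamp are smaller in Protobuf.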

If all unnecessary whitespace is removed, the JSON encoded value is still 211 bytes in size, which is 2.63x the size of the Protobuf encoded data from our example.

Even if the property names are replaced by single letters and only single digit numbers are used, the JSON encoding is still 88 bytes in size compared to the 80 bytes for Protobuf with real values.

[
  {
    "a": 1,
    "b": {
      "a": 1,
      "b": 1,
      "c": 1,
      "d": 1
    }
  },
  {
    "a": 1,
    "c": {
      "a": 1,
      "b": 1,
      "c": "My event message"
    }
  }
] 
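The size of the shortened variant can be checked in Node.js by serializing it without whitespace and counting the bytes. The variable names are chosen for this sketch.

```javascript
// The shortened example: single letter property names and single digit numbers.
const shortened = [
  { a: 1, b: { a: 1, b: 1, c: 1, d: 1 } },
  { a: 1, c: { a: 1, b: 1, c: 'My event message' } },
];

// JSON.stringify() without an indentation argument emits no unnecessary whitespace.
const minified = JSON.stringify(shortened);
const jsonBytes = Buffer.byteLength(minified, 'utf8');

console.log(jsonBytes); // 88
```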

Protobuf message updates and how to keep them backward compatible

Sooner or later, it may become necessary to extend one of the messages with additional fields, or some fields may become obsolete and no longer be processed by the cloud. This would not be a problem if all devices always had the latest hardware revision with the newest firmware. But in real deployments, this won’t always be the case, and the cloud side must be able to deal with receiving different versions of the Protobuf message.

An important rule when changing Protobuf messages is not to reuse field numbers or names in a message for a different purpose.

As an example, let’s assume that the next revision of our IoT device hardware replaces the CO2 sensor with an ambient light sensor. If the co2_level field is just renamed to ambient_light and a sender still uses the old message, the sender assumes that it is sending the co2_level value. The receiver, which is using the new message, assumes the field means ambient_light and writes a wrong value into the database.

To avoid situations like this, ambient_light should become a new field with a number that was not previously used.
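The difference between the two approaches can be simulated with a hand-written encoder and decoder for messages that only contain double fields. This is a simplified sketch with made-up helper names, not the generated code: it assumes field numbers from 1 to 15 so that each tag fits into one byte, and it treats every field as a double.

```javascript
// Encode [fieldNumber, value] pairs as Protobuf double fields (wire type 1).
// Only valid for field numbers 1-15, where the tag fits into a single byte.
function encodeDoubles(pairs) {
  const out = Buffer.alloc(pairs.length * 9);
  pairs.forEach(([fieldNumber, value], i) => {
    out[i * 9] = (fieldNumber << 3) | 1;
    out.writeDoubleLE(value, i * 9 + 1);
  });
  return out;
}

// Decode such a buffer with a fieldNumber -> name mapping; absent fields keep the default 0.
function decodeDoubles(buf, schema) {
  const result = {};
  for (const name of Object.values(schema)) result[name] = 0;
  for (let offset = 0; offset < buf.length; offset += 9) {
    const name = schema[buf[offset] >> 3];
    if (name !== undefined) result[name] = buf.readDoubleLE(offset + 1);
  }
  return result;
}

// An old device still sends the CO2 level as field number 4.
const fromOldDevice = encodeDoubles([[1, 23], [2, 1080], [3, 75], [4, 415]]);

// Receiver schema where field 4 was renamed (unsafe) and where a new number was used (safe).
const renamedSchema = { 1: 'temperature', 2: 'pressure', 3: 'humidity', 4: 'ambient_light' };
const extendedSchema = { 1: 'temperature', 2: 'pressure', 3: 'humidity', 4: 'co2_level', 5: 'ambient_light' };

const wrong = decodeDoubles(fromOldDevice, renamedSchema);
const right = decodeDoubles(fromOldDevice, extendedSchema);

console.log(wrong.ambient_light);                  // 415, a CO2 reading stored as ambient light
console.log(right.co2_level, right.ambient_light); // 415 0
```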

message EnvironmentData {
  double temperature = 1;
  double pressure = 2; 
  double humidity = 3;
  double co2_level = 4;
  double ambient_light = 5;
} 
As long as there are active devices with the old hardware and firmware, the co2_level field should be left in the message to keep supporting the old sensor by the cloud side system. The new devices don’t have to set the co2_level field and the receiver will automatically decode it as 0.

Find out if a field has been set

Except for fields with a message as type, proto3 originally offered no direct way to find out whether a value has been explicitly set or is default initialized. Newer Protobuf releases (3.15 and later) accept the optional keyword in proto3 for this purpose. With older versions, a field can be put in a oneof with exactly one member as a workaround. The generated code will then contain a function to check if the value has been set.

message EnvironmentData {
  double temperature = 1;
  double pressure = 2; 
  double humidity = 3;
  double co2_level = 4;
  oneof ambient_light_oneof { double ambient_light = 5; }
} 
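The reason this works lies in the wire format: a plain proto3 scalar field is left out of the serialized data if it holds the default value, while a oneof member is written whenever it has been set, even if the value is 0. The receiver can therefore tell "set to 0" from "not set" by checking whether the field is present in the data. The following sketch models this rule for a double field. The function and its parameters are made up for this illustration, and the zero check is simplified.

```javascript
// Serialize one double field the way a proto3 encoder decides about it.
// explicitPresence = true models a oneof member, false models a plain proto3 field.
// Only valid for field numbers 1-15, where the tag fits into a single byte.
function encodeDoubleField(fieldNumber, value, isSet, explicitPresence) {
  const skip = explicitPresence ? !isSet : value === 0;
  if (skip) return Buffer.alloc(0);
  const out = Buffer.alloc(9);
  out[0] = (fieldNumber << 3) | 1; // tag: field number and wire type 1 (64 bit)
  out.writeDoubleLE(value, 1);
  return out;
}

console.log(encodeDoubleField(5, 0, true, false).length); // 0, a plain field with value 0 is omitted
console.log(encodeDoubleField(5, 0, true, true).length);  // 9, a oneof member set to 0 is written
console.log(encodeDoubleField(5, 0, false, true).length); // 0, an unset oneof member is omitted
```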

It is safe to move a single existing field into a new oneof, so in our example, the co2_level field could be moved into a new oneof to allow the receiver to determine if this is an old or a new device:

message EnvironmentData {
  double temperature = 1;
  double pressure = 2; 
  double humidity = 3;
  oneof co2_level_oneof { double co2_level = 4; }
  oneof ambient_light_oneof { double ambient_light = 5; }
} 
The generated C++ code contains a function to check which field of the oneof is set.
qDebug() << "  Temperature:" << current.environment_data().temperature();
qDebug() << "  Pressure:" << current.environment_data().pressure();
qDebug() << "  Humidity:" << current.environment_data().humidity();
// Check the deserialized message and print the value that belongs to each oneof
if (current.environment_data().co2_level_oneof_case() == iotexample::EnvironmentData::Co2LevelOneofCase::kCo2Level)
    qDebug() << "  CO2:" << current.environment_data().co2_level();
if (current.environment_data().ambient_light_oneof_case() == iotexample::EnvironmentData::AmbientLightOneofCase::kAmbientLight)
    qDebug() << "  Ambient light:" << current.environment_data().ambient_light();
The generated JavaScript code has the same function but also adds a hasXYZ() function for each member of the oneof. The following example shows both ways of checking oneof fields.
console.log(`  Temperature: ${current.getEnvironmentData().getTemperature()}`)
console.log(`  Pressure: ${current.getEnvironmentData().getPressure()}`)
console.log(`  Humidity: ${current.getEnvironmentData().getHumidity()}`)
if (current.getEnvironmentData().getCo2LevelOneofCase() === proto.iotexample.EnvironmentData.Co2LevelOneofCase.CO2_LEVEL)
    console.log(`  CO2: ${current.getEnvironmentData().getCo2Level()}`)
if (current.getEnvironmentData().hasAmbientLight())
    console.log(`  Ambient light: ${current.getEnvironmentData().getAmbientLight()}`) 

Removing fields

If the co2_level field is to be removed, Protobuf supports the reserved keyword which can be used to prevent names and field numbers from being reused.
message EnvironmentData {
  reserved 4;
  reserved "co2_level";
  double temperature = 1;
  double pressure = 2; 
  double humidity = 3;
  oneof ambient_light_oneof { double ambient_light = 5; }
} 

If an attempt is made to reuse a field number or name that has been marked as reserved, protoc will refuse to generate code that violates these constraints:

environment_iot_messages.proto: Field "newField" uses reserved number 4.
environment_iot_messages.proto:7:12: Field name "co2_level" is reserved. 
If the design rules for compatibility have been observed, a sender using the old message will still send field number 4, but the receiver will ignore it and not misinterpret it. For the new ambient_light value, the receiver will see the default value 0 or get the information that the value has not been set if the workaround from the previous section has been employed.

Extending the IoT Hub example

In order to demonstrate how to use Protobuf in an IoT application, we will now extend the Qt based example from the previous part of the series.

The full example code is hosted on GitHub.

First we have to add new includes for the generated Protobuf code and the Qt random number generator which will provide values to populate the Protobuf messages.

#include <generated/environment_iot_messages.pb.h>
#include <QRandomGenerator> 

Then we replace the slot for the timeout signal of sendMessageTimer with a new slot which sends an EnvironmentData message on every invocation and sometimes an additional Event message.

The messages are serialized and passed to the sendMessage() method of the IotHubClient object.

// Initial values for the simulated data
double humidity = 80;
double pressure = 1000;
double temperature = 23;
double co2Level = 410;

QObject::connect(&sendMessageTimer, &QTimer::timeout, &client, [&]() {
    qDebug() << "Message send timer triggered" << QDateTime::currentDateTime().toString(Qt::ISODate);
    auto environmentData = new iotexample::EnvironmentData();
    environmentData->set_humidity(nextRandomValue(humidity, 50, 100, 1));
    environmentData->set_pressure(nextRandomValue(pressure, 900, 1100, 1));
    environmentData->set_temperature(nextRandomValue(temperature, 10, 75, 1));
    environmentData->set_co2_level(nextRandomValue(co2Level,300, 500, 1));

    iotexample::DeviceMessages message;
    auto currentMessage = message.add_telemetry_messages();
    currentMessage->set_timestamp(QDateTime::currentMSecsSinceEpoch());
    currentMessage->set_allocated_environment_data(environmentData);

    // 1% chance of generating an error
    if (QRandomGenerator::global()->generateDouble() < 0.01) {
        qDebug() << "Generate error message";
        auto event = new iotexample::Event();
        event->set_message("Simulated event");
        event->set_error_level(static_cast<iotexample::ErrorLevel>(QRandomGenerator::global()->bounded(1, 4)));
        event->set_event_number(QRandomGenerator::global()->bounded(1, 10));

        currentMessage = message.add_telemetry_messages();
        currentMessage->set_timestamp(QDateTime::currentMSecsSinceEpoch());
        currentMessage->set_allocated_event(event);
    }

    qDebug() << "Serialized size:" << message.ByteSizeLong() << "bytes";

    QByteArray serializedMessage(message.ByteSizeLong(), Qt::Initialization::Uninitialized);
    auto success = message.SerializeToArray(serializedMessage.data(), message.ByteSizeLong());

    if (success) {
        client.sendMessage(++messageId, serializedMessage);
    } else {
        qWarning() << "Message serialization failed";
    }
}); 
The nextRandomValue() function called in the new slot is a simple helper function which generates a slowly changing random value.
double nextRandomValue(double current, double min, double max, double maxIncrement) {
    if (current < min || current > max)
        current = (min + max) / 2;

    const double factor = QRandomGenerator::global()->generateDouble() < 0.5 ? 1 : -1;
    const double increment = maxIncrement * QRandomGenerator::global()->generateDouble() * factor;

    if (current + increment > max || current + increment < min)
        return current - increment;

    return current + increment;
} 
The final step is to extend the .pro file to generate and build the Protobuf code and to link against the Protobuf library:
HEADERS += \
        generated/environment_iot_messages.pb.h

SOURCES += \
        generated/environment_iot_messages.pb.cc

LIBS += -lprotobuf

PROTO_COMPILER = protoc
system($$PROTO_COMPILER --version) {
  PROTOFILES = $$files(*.proto, true)
  for (PROTO_FILE, PROTOFILES) {
    command = $$PROTO_COMPILER --cpp_out=generated $$PROTO_FILE
    system($$command)
  }
} else {
    error("protoc is required to build this application")
} 

Conclusion

Designing and updating Protobuf messages is not very complicated if a few important rules regarding field numbers are observed. The generated handling code uses the same concepts in different programming languages and is easy to use after reading a short introduction on how the different features of Protobuf messages are mapped to code.

The serialized Protobuf data is quite small compared to JSON data which makes it suitable for applications like IoT devices with a mobile broadband connection where small messages are favored for monetary reasons. If all messages have a common point of entry, the raw serialized data can be sent without adding any additional meta information.

A combination of C++ on the IoT device and JavaScript on the receiving side could be used to get data from the IoT device into the cloud.


Jannis Völker

Jannis Völker is a software engineer at basysKom GmbH in Darmstadt. After joining basysKom in 2017, he has been working in connectivity projects for embedded devices, Azure based cloud projects and has made contributions to Qt OPC UA and open62541. He has a background in embedded Linux, Qt and OPC UA and holds a master's degree in computer science from the University of Applied Sciences in Darmstadt.
