Introduction
IoT devices for industrial use are often deployed in places without an Ethernet or WiFi connection and have to use a mobile broadband connection. To reduce traffic costs, the amount of data that has to be transmitted should be kept as low as possible.
Cloud providers also bill in data blocks of a certain size, so a small message size combined with batching can save money on the cloud side too.
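To get a feeling for the effect of batching on block-based billing, here is a small back-of-the-envelope sketch. The block size of 4096 bytes, the reading size of 50 bytes and the helper name billedBlocks() are assumptions made for illustration only; check the pricing documentation of your provider for the real numbers.

```javascript
// Number of billing blocks for a list of transmissions. Every transmission
// is rounded up to whole blocks and costs at least one block.
function billedBlocks(transmissionSizes, blockSize) {
    return transmissionSizes.reduce(
        (total, size) => total + Math.max(1, Math.ceil(size / blockSize)), 0);
}

const BLOCK_SIZE = 4096; // assumed billing block size in bytes
const READING_SIZE = 50; // assumed size of one encoded reading in bytes
const READINGS = 100;

// Every reading sent on its own occupies a full block.
const sentIndividually = billedBlocks(new Array(READINGS).fill(READING_SIZE), BLOCK_SIZE);
// All readings sent as one batch share the blocks.
const sentAsOneBatch = billedBlocks([READINGS * READING_SIZE], BLOCK_SIZE);

console.log(`individually: ${sentIndividually} blocks, batched: ${sentAsOneBatch} blocks`);
```

Under these assumptions, the batch needs 2 blocks where the individual transmissions need 100.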
Protobuf
Protobuf is a serialization method for structured data developed by Google which offers a compact binary wire format and built-in mechanisms which make it easy to remain forward and backward compatible when updating the messages. It uses an interface description language to specify the structured data and the Protobuf compiler to generate code for serialization, deserialization and message handling for C++, Java, Python, Objective-C, C#, JavaScript, Ruby, Go, PHP and Dart. The compact wire format and the compatibility related features make Protobuf very interesting for use in IoT systems. The wide range of supported languages enables the creation of a heterogeneous processing infrastructure for Protobuf encoded device messages.
Using Protobuf in an IoT example
To show the workflow with Protobuf, let’s define the Protobuf messages for an environment data sensor device in the current proto3 format. The device measures temperature, pressure, humidity and CO2 level and generates events if certain threshold values are exceeded or if a sensor fails.
An important key to understanding why the messages have been designed like this is that there is no way to recognize the message type from the serialized data. We don’t want to send an identifier in addition to the serialized data, so there must be a single point of entry for all message types we are going to send.
syntax = "proto3";
package iotexample;
message EnvironmentData {
    double temperature = 1;
    double pressure = 2;
    double humidity = 3;
    double co2_level = 4;
}
enum ErrorLevel {
    UNSPECIFIED = 0;
    ERROR = 1;
    WARNING = 2;
    INFO = 3;
}
message Event {
    int32 event_number = 1;
    ErrorLevel error_level = 2;
    string message = 3;
}
message TelemetryMessage {
    uint64 timestamp = 1;
    oneof payload {
        EnvironmentData environment_data = 2;
        Event event = 3;
    }
}
message DeviceMessages {
    repeated TelemetryMessage telemetry_messages = 1;
}
The first line defines that the following description is in the “proto3” syntax which is the most current version of the Protobuf interface description language.
The second line states that the messages defined below belong to the package “iotexample”. This is required to resolve messages imported from other files and will be used to create namespaces in the generated code.
The keyword message in the following line introduces a new message called EnvironmentData which has four data fields for storing environment sensor values. Each of the data fields has a type, a name and a field number. The name and the field number must be unique inside a message.
In proto3, all fields of a message are optional. If a field is not present in a serialized message, the receiver sets it to a default value, except for fields with another message as type. Only the field number and the value are serialized; the mapping from field numbers to names is done by the receiver. This means that the length of the field names has no impact on the size of the serialized message.
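To make this more tangible, the following sketch encodes an EnvironmentData value by hand. It is not the code generated by protoc, just a minimal illustration of the wire format for double fields with field numbers below 16; the names FIELD_NUMBERS and encodeEnvironmentData() are made up for this example.

```javascript
// Field numbers as assigned in the EnvironmentData message above.
const FIELD_NUMBERS = { temperature: 1, pressure: 2, humidity: 3, co2_level: 4 };

// Hand-written sketch of the wire format for a message that only has
// `double` fields with field numbers below 16 (one tag byte each).
function encodeEnvironmentData(values) {
    const encoded = [];
    for (const [name, value] of Object.entries(values)) {
        const fieldNumber = FIELD_NUMBERS[name];
        if (fieldNumber === undefined) throw new Error(`unknown field ${name}`);
        if (value === 0) continue; // default values are not put on the wire
        const field = new Uint8Array(9);
        field[0] = (fieldNumber << 3) | 1; // tag: field number plus wire type 1 (64-bit)
        new DataView(field.buffer).setFloat64(1, value, true); // little-endian double
        encoded.push(...field);
    }
    return Uint8Array.from(encoded);
}

const allSet = encodeEnvironmentData({ temperature: 23, pressure: 1080, humidity: 75, co2_level: 415 });
const co2Missing = encodeEnvironmentData({ temperature: 23, pressure: 1080, humidity: 75, co2_level: 0 });
console.log(`all fields set: ${allSet.length} bytes, CO2 at default: ${co2Missing.length} bytes`);
```

The field names are only used to look up the field number. Each field that is set costs nine bytes (one tag byte and eight value bytes), no matter how long its name is.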
The enum keyword in the next block defines an enumeration named ErrorLevel with four values. A proto3 enum must have 0 as constant for the first value due to the default initialization behavior explained before.
The following message is named Event and stores data to describe an event that has occurred on our IoT device. It uses the ErrorLevel enum as type for the second field to encode the severity of the event.
The message TelemetryMessage is the first step to having a single point of entry for all messages. It combines the two messages we have defined before with a timestamp field. The keyword oneof makes sure that only one of the fields enclosed in the curly braces has a value at any time. This is enforced by the generated message handling code which clears any previously set field if a member of the oneof is set.
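The behavior of a oneof can be pictured with a small hand-written class. TelemetryMessageSketch is a hypothetical stand-in, not the class generated by protoc; it only mimics the rule that setting one member replaces whatever member was set before.

```javascript
// Hypothetical stand-in for the generated TelemetryMessage class.
// It stores a single payload together with a marker telling which member is set.
class TelemetryMessageSketch {
    constructor() {
        this.payloadCase = 'PAYLOAD_NOT_SET';
        this.payload = undefined;
    }
    setEnvironmentData(value) {
        this.payloadCase = 'ENVIRONMENT_DATA'; // replaces a previously set event
        this.payload = value;
    }
    setEvent(value) {
        this.payloadCase = 'EVENT'; // replaces previously set environment data
        this.payload = value;
    }
    hasEnvironmentData() { return this.payloadCase === 'ENVIRONMENT_DATA'; }
    hasEvent() { return this.payloadCase === 'EVENT'; }
}

const sketch = new TelemetryMessageSketch();
sketch.setEnvironmentData({ temperature: 23 });
sketch.setEvent({ eventNumber: 123 });
console.log(sketch.hasEnvironmentData(), sketch.hasEvent());
```

After the second setter call, only the event is left in the message.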
The DeviceMessages message uses the repeated keyword to define an array of TelemetryMessage values.
To sum it all up, we now have the message DeviceMessages as our point of entry which can be used to create a batch of TelemetryMessage values which each contain a timestamp and an Event or EnvironmentData value.
Important hint for field numbers
The Protobuf encoding requires a different number of bytes to encode field numbers depending on the number. The field numbers 1-15 can be encoded using one byte, the following numbers up to 2047 require two bytes. If your message contains fields that are rarely used, consider assigning them a field number above 15 so that some one-byte field numbers remain available for future fields that will be used frequently.
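The limits of 15 and 2047 follow from the way a field is tagged on the wire: the field number is shifted left by three bits to make room for the wire type, and the result is stored as a varint with seven payload bits per byte. A minimal sketch of that calculation, with the made-up helper names varintSize() and tagSize():

```javascript
// Number of bytes a non-negative integer needs as a Protobuf varint
// (seven payload bits per byte).
function varintSize(value) {
    let remaining = BigInt(value);
    let size = 1;
    while (remaining >= 0x80n) {
        remaining >>= 7n;
        size += 1;
    }
    return size;
}

// Size of the tag that precedes every field on the wire. The wire type
// only fills the lowest three bits, so it does not influence the size.
function tagSize(fieldNumber) {
    return varintSize(BigInt(fieldNumber) << 3n);
}

for (const fieldNumber of [1, 15, 16, 2047, 2048]) {
    console.log(`field number ${fieldNumber}: ${tagSize(fieldNumber)} byte(s)`);
}
```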
Working with generated code
The previously defined messages are saved to a file which is then used to generate code with protoc.
C++
C++ is our language of choice for the software on the IoT device due to low resource requirements, low level access to the underlying operating system and easy integration of the Azure IoT SDK. This section shows how to use the C++ code generated by protoc. The generated code makes use of the Protobuf library, so a C++ application using code generated by protoc must be linked against libprotobuf.
Protoc is invoked as follows and generates the two files environment_iot_messages.pb.cc and environment_iot_messages.pb.h.
protoc --cpp_out=generated environment_iot_messages.proto
The following example creates a batch of messages, serializes and deserializes it and shows how to handle oneofs.
#include <generated/environment_iot_messages.pb.h>
#include <QByteArray>
#include <QDateTime>
#include <QDebug>
int main()
{
    auto environmentData = new iotexample::EnvironmentData();
    environmentData->set_humidity(75);
    environmentData->set_pressure(1080);
    environmentData->set_temperature(23);
    environmentData->set_co2_level(415);
    auto eventData = new iotexample::Event();
    eventData->set_event_number(123);
    eventData->set_message(std::string("My event message"));
    eventData->set_error_level(iotexample::ErrorLevel::ERROR);
    iotexample::DeviceMessages message;
    auto currentMessage = message.add_telemetry_messages();
    currentMessage->set_timestamp(QDateTime::currentMSecsSinceEpoch());
    currentMessage->set_allocated_environment_data(environmentData);
    currentMessage = message.add_telemetry_messages();
    currentMessage->set_timestamp(QDateTime::currentMSecsSinceEpoch());
    currentMessage->set_allocated_event(eventData);
    qDebug() << "Serialized size:" << message.ByteSizeLong() << "bytes";
    QByteArray serializedMessage(message.ByteSizeLong(), Qt::Initialization::Uninitialized);
    auto success = message.SerializeToArray(serializedMessage.data(), message.ByteSizeLong());
    if (!success)
        return 1;
    iotexample::DeviceMessages deserializedMessage;
    success = deserializedMessage.ParseFromArray(serializedMessage.constData(), serializedMessage.length());
    if (!success)
        return 2;
    qDebug() << "Decoded message with " << deserializedMessage.telemetry_messages_size() << "values";
    for (int i = 0; i < deserializedMessage.telemetry_messages_size(); ++i) {
        const auto &current = deserializedMessage.telemetry_messages(i);
        qDebug() << "Message" << i + 1;
        qDebug() << " Timestamp:" << QDateTime::fromMSecsSinceEpoch(current.timestamp()).toString(Qt::ISODate);
        if (current.has_event()) {
            qDebug() << " Event number:" << current.event().event_number();
            qDebug() << " Event error level:" << current.event().error_level();
            qDebug() << " Event message:" << QString::fromStdString(current.event().message());
        } else if (current.has_environment_data()) {
            qDebug() << " Temperature:" << current.environment_data().temperature();
            qDebug() << " Pressure:" << current.environment_data().pressure();
            qDebug() << " Humidity:" << current.environment_data().humidity();
            qDebug() << " CO2:" << current.environment_data().co2_level();
        } else {
            qDebug() << " Empty TelemetryMessages";
        }
    }
    return 0;
}
The has_event() and has_environment_data() functions have been added by the Protobuf compiler because these two fields have another message as type.
The output of this application shows a serialized data size of 80 bytes, and all fields are deserialized as expected.
Serialized size: 80 bytes
Decoded message with 2 values
Message 1
Timestamp: "2020-04-28T11:04:21"
Temperature: 23
Pressure: 1080
Humidity: 75
CO2: 415
Message 2
Timestamp: "2020-04-28T11:04:21"
Event number: 123
Event error level: 1
Event message: "My event message"
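The 80 bytes can be retraced by hand from the encoding rules. The following sketch adds up the sizes of all fields of the example batch. The helper names are made up for this illustration, the timestamp is an assumed value from the same day as the output above, and the calculation relies on all field numbers being below 16 and all integer values being non-negative.

```javascript
// Number of bytes a non-negative integer needs as a varint.
function varintSize(value) {
    let remaining = BigInt(value);
    let size = 1;
    while (remaining >= 0x80n) {
        remaining >>= 7n;
        size += 1;
    }
    return size;
}

const TAG = 1; // every field number in the example is below 16

// Strings and nested messages: tag, length prefix, payload.
function lengthDelimitedSize(payloadSize) {
    return TAG + varintSize(payloadSize) + payloadSize;
}

// Four doubles, each with a tag and eight value bytes.
function environmentDataSize() {
    return 4 * (TAG + 8);
}

function eventSize(eventNumber, errorLevel, message) {
    return (TAG + varintSize(eventNumber))
        + (TAG + varintSize(errorLevel))
        + lengthDelimitedSize(new TextEncoder().encode(message).length);
}

function telemetryMessageSize(timestamp, payloadSize) {
    return (TAG + varintSize(timestamp)) + lengthDelimitedSize(payloadSize);
}

const timestamp = 1588065035911; // assumed: milliseconds since epoch on 2020-04-28
const first = telemetryMessageSize(timestamp, environmentDataSize());
const second = telemetryMessageSize(timestamp, eventSize(123, 1, 'My event message'));
// Each entry of the repeated field is itself a length-delimited field.
const total = lengthDelimitedSize(first) + lengthDelimitedSize(second);
console.log(`first: ${first}, second: ${second}, total: ${total} bytes`);
```

The environment data entry accounts for 47 bytes and the event entry for 33 bytes, which adds up to the 80 bytes reported by the application.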
In a real application, the message would be serialized with SerializeToArray() and the serialized data created by this method call would be passed to the component that sends the data into the cloud.
JavaScript / Node.js
Node.js is well supported on different cloud platforms which makes it a good choice for handling the incoming data from our IoT device, for example in an Azure Function App triggered by an IoT Hub. The file environment_iot_messages_pb.js is generated by invoking protoc.
protoc --js_out=import_style=commonjs,binary:generated environment_iot_messages.proto
const messages = require('./generated/environment_iot_messages_pb.js');
let environmentData = new proto.iotexample.EnvironmentData();
environmentData.setHumidity(75);
environmentData.setPressure(1080);
environmentData.setTemperature(23);
environmentData.setCo2Level(415);
let eventData = new proto.iotexample.Event();
eventData.setEventNumber(123);
eventData.setErrorLevel(proto.iotexample.ErrorLevel.ERROR);
eventData.setMessage('My event message');
let message = new proto.iotexample.DeviceMessages();
let telemetry = new proto.iotexample.TelemetryMessage();
telemetry.setTimestamp(Date.now());
telemetry.setEnvironmentData(environmentData);
message.addTelemetryMessages(telemetry);
telemetry = new proto.iotexample.TelemetryMessage();
telemetry.setTimestamp(Date.now());
telemetry.setEvent(eventData);
message.addTelemetryMessages(telemetry);
const serializedMessage = message.serializeBinary();
console.log(`Serialized size: ${serializedMessage.length} bytes`);
// deserializeBinary() is a static function, so it is called without "new"
const deserializedMessage = proto.iotexample.DeviceMessages.deserializeBinary(serializedMessage);
console.log(`Decoded message with ${deserializedMessage.getTelemetryMessagesList().length} values`);
for (let i = 0; i < deserializedMessage.getTelemetryMessagesList().length; ++i) {
    const current = deserializedMessage.getTelemetryMessagesList()[i];
    console.log(`Message ${i + 1}`);
    console.log(` Timestamp: ${new Date(current.getTimestamp()).toISOString()}`);
    if (current.hasEvent()) {
        console.log(` Event number: ${current.getEvent().getEventNumber()}`);
        console.log(` Event error level: ${current.getEvent().getErrorLevel()}`);
        console.log(` Event message: ${current.getEvent().getMessage()}`);
    } else if (current.hasEnvironmentData()) {
        console.log(` Temperature: ${current.getEnvironmentData().getTemperature()}`);
        console.log(` Pressure: ${current.getEnvironmentData().getPressure()}`);
        console.log(` Humidity: ${current.getEnvironmentData().getHumidity()}`);
        console.log(` CO2: ${current.getEnvironmentData().getCo2Level()}`);
    } else {
        console.log('Empty TelemetryMessages');
    }
}
The output is identical to the C++ example and there is not much difference in the complexity of the message handling.
In a real application, deserializeBinary() would be called on a serialized message that has been received from an IoT device.
The JavaScript generated code also offers a toObject() method for all messages which returns the content of the message as a JavaScript object.
{
    "telemetryMessagesList": [
        {
            "timestamp": 1588065035911,
            "environmentData": {
                "temperature": 23,
                "pressure": 1080,
                "humidity": 75,
                "co2Level": 415
            }
        },
        {
            "timestamp": 1588065035911,
            "event": {
                "eventNumber": 123,
                "errorLevel": 1,
                "message": "My event message"
            }
        }
    ]
}
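An object of this shape can be processed with plain JavaScript. As a sketch of what a cloud function might do with it, the following hypothetical helper toRecords() turns every telemetry entry into one flat record, for example as preparation for a database insert. The object literal is copied from the output above; in a real application it would come from a toObject() call.

```javascript
// Object shaped like the toObject() output shown above.
const decoded = {
    telemetryMessagesList: [
        {
            timestamp: 1588065035911,
            environmentData: { temperature: 23, pressure: 1080, humidity: 75, co2Level: 415 },
        },
        {
            timestamp: 1588065035911,
            event: { eventNumber: 123, errorLevel: 1, message: 'My event message' },
        },
    ],
};

// Hypothetical helper: one flat record per telemetry entry. The payload
// type is recognized by checking which member of the oneof is present.
function toRecords(deviceMessages) {
    return deviceMessages.telemetryMessagesList.map((entry) => {
        const time = new Date(entry.timestamp).toISOString();
        if (entry.event) return { time, kind: 'event', ...entry.event };
        if (entry.environmentData) return { time, kind: 'environment', ...entry.environmentData };
        return { time, kind: 'empty' };
    });
}

const records = toRecords(decoded);
console.log(records);
```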
Size comparison with JSON encoding
JSON is a widespread format for storing and transmitting structured data in a human readable form.
Let’s encode our example data in JSON to see the difference in data size:
[
    {
        "timestamp": 1587988060797,
        "environment_data": {
            "temperature": 23,
            "pressure": 1080,
            "humidity": 75
        }
    },
    {
        "timestamp": 1587988060797,
        "event": {
            "event_number": 123,
            "error_level": 1,
            "message": "My event message"
        }
    }
]
The property names have the highest impact on data size. As the property names must always be included, this is a static additional amount of bytes and choosing shorter names leads to a smaller encoded message size.
The influence of numeric values depends on the number of characters needed to encode that number. This could be better or worse than Protobuf, depending on the value.
If all unnecessary whitespace is removed, the JSON encoded value is still 211 bytes in size, which is about 2.6 times the size of the Protobuf encoded data from our example.
Even if the property names are replaced by single letters and only single digit numbers are used, the JSON encoding is still 88 bytes in size compared to the 80 bytes for Protobuf with real values.
[
    {
        "a": 1,
        "b": {
            "a": 1,
            "b": 1,
            "c": 1,
            "d": 1
        }
    },
    {
        "a": 1,
        "c": {
            "a": 1,
            "b": 1,
            "c": "My event message"
        }
    }
]
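The size of the minimized JSON variant can be checked with a few lines of JavaScript. This is a minimal sketch: minifiedJsonSize() is a made-up helper, and the 80 bytes for Protobuf are taken from the output of the C++ example.

```javascript
// Size in bytes of a value encoded as JSON without optional whitespace.
// JSON.stringify() without an indentation argument emits no extra whitespace.
function minifiedJsonSize(value) {
    return new TextEncoder().encode(JSON.stringify(value)).length;
}

// The minimized variant: single letter names and single digit numbers.
const minimalJson = [
    { a: 1, b: { a: 1, b: 1, c: 1, d: 1 } },
    { a: 1, c: { a: 1, b: 1, c: 'My event message' } },
];

const jsonBytes = minifiedJsonSize(minimalJson);
const protobufBytes = 80; // serialized size reported by the C++ example
console.log(`JSON: ${jsonBytes} bytes, Protobuf: ${protobufBytes} bytes`);
```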
Protobuf message updates and how to keep them backward compatible
Sooner or later, it may become necessary to extend one of the messages with additional fields, or fields may become obsolete and no longer be processed by the cloud. This would not be a problem if all devices always had the latest hardware revision with the newest firmware. But in real deployments, this won’t always be the case and the cloud side must be able to deal with receiving different versions of the Protobuf message.
An important rule when changing Protobuf messages is not to reuse field numbers or names in a message for a different purpose.
As an example, let’s assume that the next revision of our IoT device hardware replaces the CO2 sensor by an ambient light sensor. If the co2_level field is just renamed to ambient_light and a sender still uses the old message, the sender’s assumption is that it is sending the co2_level value. The receiver which is using the new message assumes the field means ambient_light and writes a wrong value into the database.
To avoid situations like this, ambient_light should become a new field with a number that was not previously used.
message EnvironmentData {
    double temperature = 1;
    double pressure = 2;
    double humidity = 3;
    double co2_level = 4;
    double ambient_light = 5;
}
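The difference between reusing field number 4 and adding field number 5 can be reproduced with a small hand-written decoder. This is only a sketch for messages consisting of double fields with one-byte tags; decodeDoubles() and the three name tables are made up for this illustration and are not part of the generated code.

```javascript
// Decode a message that only contains `double` fields (wire type 1) with
// one-byte tags. The receiver's table maps field numbers to names.
function decodeDoubles(bytes, namesByNumber) {
    const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    const result = {};
    for (let offset = 0; offset < bytes.length; offset += 9) {
        const fieldNumber = bytes[offset] >> 3;
        const wireType = bytes[offset] & 0x07;
        if (wireType !== 1) throw new Error(`unexpected wire type ${wireType}`);
        const name = namesByNumber[fieldNumber];
        if (name !== undefined) { // unknown field numbers are skipped
            result[name] = view.getFloat64(offset + 1, true);
        }
    }
    return result;
}

// Bytes as sent by an old device: field number 4 with the value 415.
const fromOldDevice = new Uint8Array(9);
fromOldDevice[0] = (4 << 3) | 1;
new DataView(fromOldDevice.buffer).setFloat64(1, 415, true);

const oldSchema = { 4: 'co2_level' };
const renamedSchema = { 4: 'ambient_light' }; // field number 4 reused
const extendedSchema = { 4: 'co2_level', 5: 'ambient_light' }; // new field number 5

console.log(decodeDoubles(fromOldDevice, oldSchema));
console.log(decodeDoubles(fromOldDevice, renamedSchema));
console.log(decodeDoubles(fromOldDevice, extendedSchema));
```

With the renamed field, the CO2 value of the old device ends up as an ambient light value. With the new field number, the old message is still decoded correctly.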
The co2_level field should be left in the message so that the cloud side system keeps supporting the old sensor. The new devices don’t have to set the co2_level field and the receiver will automatically decode it as 0.
Find out if a field has been set
Except for fields with a message as type, proto3 as used here offers no direct way to find out if a value has been explicitly set or is default initialized. As a workaround, a field can be put in a oneof with exactly one member. The generated code will then contain a function to check if the value has been set. (Protobuf 3.15 and later additionally accept the optional keyword on proto3 fields, which generates such a check directly.)
message EnvironmentData {
    double temperature = 1;
    double pressure = 2;
    double humidity = 3;
    double co2_level = 4;
    oneof ambient_light_oneof { double ambient_light = 5; }
}
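The reason why the workaround helps becomes visible on the wire. A plain proto3 field that holds the default value is not serialized at all, whereas a member of a oneof that has been set is serialized even if it holds the default value. The following sketch of the sender side illustrates this for a double field; encodeDouble() is a made-up helper, not generated code.

```javascript
// Sketch of the sender side for a `double` field with a field number below 16.
function encodeDouble(fieldNumber, value, isOneofMember) {
    // A plain proto3 field holding the default value 0 is left out entirely.
    // A oneof member that has been set is written even if it holds 0.
    if (value === 0 && !isOneofMember) return new Uint8Array(0);
    const bytes = new Uint8Array(9);
    bytes[0] = (fieldNumber << 3) | 1; // tag: field number plus wire type 1 (64-bit)
    new DataView(bytes.buffer).setFloat64(1, value, true); // little-endian double
    return bytes;
}

const plainZero = encodeDouble(5, 0, false);
const oneofZero = encodeDouble(5, 0, true);
console.log(`plain field: ${plainZero.length} bytes, oneof member: ${oneofZero.length} bytes`);
```

For the plain field, the receiver gets no bytes and cannot tell a measured 0 from a missing value. For the oneof member, the field arrives and the receiver knows that it has been set.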
It is safe to move a single existing field into a new oneof, so in our example, the co2_level field could be moved into a new oneof to allow the receiver to determine if this is an old or a new device.
message EnvironmentData {
    double temperature = 1;
    double pressure = 2;
    double humidity = 3;
    oneof co2_level_oneof { double co2_level = 4; }
    oneof ambient_light_oneof { double ambient_light = 5; }
}
In the C++ generated code, the _case() function of a oneof can be used to check which member of the oneof is set.
qDebug() << " Temperature:" << current.environment_data().temperature();
qDebug() << " Pressure:" << current.environment_data().pressure();
qDebug() << " Humidity:" << current.environment_data().humidity();
if (current.environment_data().co2_level_oneof_case() == iotexample::EnvironmentData::Co2LevelOneofCase::kCo2Level)
    qDebug() << " CO2:" << current.environment_data().co2_level();
if (current.environment_data().ambient_light_oneof_case() == iotexample::EnvironmentData::AmbientLightOneofCase::kAmbientLight)
    qDebug() << " Ambient light:" << current.environment_data().ambient_light();
The JavaScript generated code offers a case getter for each oneof and additionally a hasXYZ() function for each member of the oneof. The following example shows both ways of checking oneof fields.
console.log(` Temperature: ${current.getEnvironmentData().getTemperature()}`);
console.log(` Pressure: ${current.getEnvironmentData().getPressure()}`);
console.log(` Humidity: ${current.getEnvironmentData().getHumidity()}`);
if (current.getEnvironmentData().getCo2LevelOneofCase() === proto.iotexample.EnvironmentData.Co2LevelOneofCase.CO2_LEVEL)
    console.log(` CO2: ${current.getEnvironmentData().getCo2Level()}`);
if (current.getEnvironmentData().hasAmbientLight())
    console.log(` Ambient light: ${current.getEnvironmentData().getAmbientLight()}`);
Removing fields
If the co2_level field is to be removed, Protobuf supports the reserved keyword which can be used to prevent names and field numbers from being reused.
message EnvironmentData {
    reserved 4;
    reserved "co2_level";
    double temperature = 1;
    double pressure = 2;
    double humidity = 3;
    oneof ambient_light_oneof { double ambient_light = 5; }
}
If an attempt is made to reuse a field number or name that has been marked as reserved, protoc will refuse to generate code that violates these constraints:
environment_iot_messages.proto: Field "newField" uses reserved number 4.
environment_iot_messages.proto:7:12: Field name "co2_level" is reserved.
If an old device sends a message without the ambient_light value, the receiver will see the default value 0 or get the information that the value has not been set if the workaround from the previous section has been employed.
Extending the IoT Hub example
In order to demonstrate how to use Protobuf in an IoT application, we will now extend the Qt based example from the previous part of the series.
The full example code is hosted on GitHub.
First we have to add new includes for the generated Protobuf code and the Qt random number generator which will provide values to populate the Protobuf messages.
#include <generated/environment_iot_messages.pb.h>
#include <QRandomGenerator>
Then we replace the slot for the timeout signal of sendMessageTimer with a new slot which sends an EnvironmentData message on every invocation and sometimes an additional Event message.
The messages are serialized and passed to the sendMessage() method of the IotHubClient object.
// Initial values for the simulated data
double humidity = 80;
double pressure = 1000;
double temperature = 23;
double co2Level = 410;
QObject::connect(&sendMessageTimer, &QTimer::timeout, &client, [&]() {
    qDebug() << "Message send timer triggered" << QDateTime::currentDateTime().toString(Qt::ISODate);
    auto environmentData = new iotexample::EnvironmentData();
    environmentData->set_humidity(nextRandomValue(humidity, 50, 100, 1));
    environmentData->set_pressure(nextRandomValue(pressure, 900, 1100, 1));
    environmentData->set_temperature(nextRandomValue(temperature, 10, 75, 1));
    environmentData->set_co2_level(nextRandomValue(co2Level, 300, 500, 1));
    iotexample::DeviceMessages message;
    auto currentMessage = message.add_telemetry_messages();
    currentMessage->set_timestamp(QDateTime::currentMSecsSinceEpoch());
    currentMessage->set_allocated_environment_data(environmentData);
    // 1% chance of generating an error
    if (QRandomGenerator::global()->generateDouble() < 0.01) {
        qDebug() << "Generate error message";
        auto event = new iotexample::Event();
        event->set_message("Simulated event");
        event->set_error_level(static_cast<iotexample::ErrorLevel>(QRandomGenerator::global()->bounded(1, 4)));
        event->set_event_number(QRandomGenerator::global()->bounded(1, 10));
        currentMessage = message.add_telemetry_messages();
        currentMessage->set_timestamp(QDateTime::currentMSecsSinceEpoch());
        currentMessage->set_allocated_event(event);
    }
    qDebug() << "Serialized size:" << message.ByteSizeLong() << "bytes";
    QByteArray serializedMessage(message.ByteSizeLong(), Qt::Initialization::Uninitialized);
    auto success = message.SerializeToArray(serializedMessage.data(), message.ByteSizeLong());
    if (success) {
        client.sendMessage(++messageId, serializedMessage);
    } else {
        qWarning() << "Message serialization failed";
    }
});
The nextRandomValue() function called in the new slot is a simple helper function which generates a slowly changing random value.
double nextRandomValue(double current, double min, double max, double maxIncrement) {
    if (current < min || current > max)
        current = (min + max) / 2;
    const double factor = QRandomGenerator::global()->generateDouble() < 0.5 ? 1 : -1;
    const double increment = maxIncrement * QRandomGenerator::global()->generateDouble() * factor;
    if (current + increment > max || current + increment < min)
        return current - increment;
    return current + increment;
}
Finally, the .pro file is extended with the following lines for generating and building the Protobuf code and linking to the Protobuf library.
HEADERS += \
    generated/environment_iot_messages.pb.h
SOURCES += \
    generated/environment_iot_messages.pb.cc
LIBS += -lprotobuf
PROTO_COMPILER = protoc
system($$PROTO_COMPILER --version) {
    PROTOFILES = $$files(*.proto, true)
    for (PROTO_FILE, PROTOFILES) {
        command = $$PROTO_COMPILER --cpp_out=generated $$PROTO_FILE
        system($$command)
    }
} else {
    error("protoc is required to build this application")
}
Conclusion
Designing and updating Protobuf messages is not very complicated if a few important rules regarding field numbers are observed. The generated handling code uses the same concepts in different programming languages and is easy to use after reading a short introduction on how the different features of Protobuf messages are mapped to code.
The serialized Protobuf data is quite small compared to JSON data which makes it suitable for applications like IoT devices with a mobile broadband connection where small messages are favored for monetary reasons. If all messages have a common point of entry, the raw serialized data can be sent without adding any additional meta information.
A combination of C++ on the IoT device and JavaScript on the receiving side could be used to get data from the IoT device into the cloud.