MQTT 5 minimal knowledge

After using MQTT for years, we've found that many misunderstandings and misconceptions about the protocol are still widespread. So here's a document that lists what MQTT is, and isn't, capable of.

MQTT is a pub/sub protocol

MQTT was created as a way to synchronize data between devices on a network.

It's mainly a serialization protocol (that is, how to encode and decode data to and from bytes) and a distributed hierarchy (that is, the encoded data is stored at some position in a hierarchy, like a virtual file system).
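To make the "serialization" part concrete, here is a minimal sketch of how an MQTT v5 PUBLISH packet with QoS 0 could be encoded by hand. The function names (`encode_varint`, `encode_publish_qos0`) and the `valve/on_time` path are our own illustrations, not part of any library, and the sketch assumes no properties and does no validation of the topic name.

```python
def encode_varint(n: int) -> bytes:
    """Encode n as an MQTT Variable Byte Integer (7 data bits per byte)."""
    if not 0 <= n <= 268_435_455:
        raise ValueError("value out of range for a Variable Byte Integer")
    out = bytearray()
    while True:
        n, byte = divmod(n, 128)
        if n:
            byte |= 0x80  # continuation bit: more bytes follow
        out.append(byte)
        if not n:
            return bytes(out)


def encode_publish_qos0(topic: str, payload: bytes, retain: bool = False) -> bytes:
    """Build a PUBLISH packet: fixed header, topic name, empty properties, payload."""
    topic_bytes = topic.encode("utf-8")
    body = len(topic_bytes).to_bytes(2, "big") + topic_bytes  # position in the hierarchy
    body += encode_varint(0)                                   # property length: none (assumption)
    body += payload                                            # content of the "virtual file"
    first_byte = 0x30 | (0x01 if retain else 0x00)             # packet type 3, QoS 0, RETAIN bit
    return bytes([first_byte]) + encode_varint(len(body)) + body


packet = encode_publish_qos0("valve/on_time", b"08:00")
print(packet.hex(" "))
```

The whole packet is 23 bytes for a 5-byte payload, which gives an idea of the protocol overhead.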

MQTT is highly asymmetric.

The client part only cares about some virtual files in the hierarchy and can update them or receive updates to them. The broker part is a client that also manages subscriptions to some levels of the hierarchy. When a virtual file is modified by a client, the broker's role is to notify the other clients that registered for this branch of the hierarchy and send them the update. That part is where the vast majority of the complexity comes from.
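As an illustration of the broker's routing job, here is a minimal sketch of how a subscription can be matched against a virtual file path. It relies on the two MQTT wildcards (`+` for exactly one level, `#` for all remaining levels); the helper names `topic_matches` and `route`, and the client names in the usage, are hypothetical. The sketch assumes the filters are already well formed (for instance, `#` only appears as the last level).

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if a published topic falls under a subscription filter."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    # Topics starting with '$' are not matched by a filter starting with a wildcard.
    if topic.startswith("$") and filter_levels[0] in ("+", "#"):
        return False
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True  # matches the parent level and everything below it
        if i >= len(topic_levels):
            return False  # the filter is deeper than the topic
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


def route(subscriptions: dict[str, list[str]], topic: str) -> list[str]:
    """List the clients (hypothetical ids) that must be notified for a topic."""
    return sorted(
        client
        for client, filters in subscriptions.items()
        if any(topic_matches(f, topic) for f in filters)
    )


subscriptions = {
    "valve": ["valve/+"],
    "dashboard": ["#"],
    "logger": ["garden/#"],
}
print(route(subscriptions, "valve/on_time"))
```

A real broker also has to track sessions, QoS state and retained data per subscription, which is where most of the complexity lives.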

MQTT is small.

MQTT works over TCP with (optionally) TLS security. It isn't the smallest serialization protocol, but it's a good compromise between a readable and an implementable protocol. It's used on small devices because it doesn't require much ROM or RAM. It can be implemented without a heap, so it can work on very limited microcontrollers.

MQTT isn't like ActivityPub or Mastodon.

MQTT doesn't do any kind of message filtering, no metadata is attached to the data (e.g. a "Like" counter), no link between clients is maintained, and its session handling is dumb. You can't use MQTT directly for any kind of social network.

An MQTT client can't fetch a virtual file until it's modified (except with the retain flag, see below).

MQTT robustness myth

In MQTT, there is a Quality of Service flag (QoS from here on) attached to any virtual file that changes how the data is handled between clients and brokers.

We often hear people advising to use QoS 1 or 2 because it's more resilient.

This is wrong.

The QoS handling in MQTT is a very large source of confusion for many reasons:

1. MQTT runs over TCP

TCP already recovers from a few dropped packets.

If you use QoS for fear of a packet drop, stop doing so: the TCP layer has got you covered. On a valid connection, MQTT never, ever retransmits a packet, even with the highest QoS.

2. TCP connection drop

If the device's connection is sporadic, the TCP layer will eventually fail to synchronize. This means that the TCP connection will be dropped. In turn, the MQTT session will be dropped.

However, due to how TCP works, the other side might only detect the connection drop when the connection is next exercised (or after the TCP lingering time, which can be quite long on your system).

MQTT provides a linger time during which a previous session can be resumed (over a new TCP connection). We'll call this session resumption "SOS mode" below. However, most of the time, the linger time (the Session Expiry Interval in MQTT v5, not to be confused with Keep Alive, which only sets how quickly a dead connection is detected) will expire before the client reconnects, or the client will have rebooted or restarted in the meantime, so it won't be able to resume the session.

In that case, the client's session is a fresh one when it reconnects, and any packet or virtual file changed during the disconnected period is lost to the client, whatever the QoS level.
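For reference, here is a minimal sketch of where these timers sit in an MQTT v5 CONNECT packet. Keep Alive and the Session Expiry Interval are two separate fields, and the Clean Start flag decides whether the client asks to resume a lingering session. The function names, the client id `valve-1` and the chosen durations are our own assumptions; the sketch omits the will message, the user name and the password.

```python
def encode_varint(n: int) -> bytes:
    """Encode n as an MQTT Variable Byte Integer."""
    out = bytearray()
    while True:
        n, byte = divmod(n, 128)
        if n:
            byte |= 0x80
        out.append(byte)
        if not n:
            return bytes(out)


def utf8_string(text: str) -> bytes:
    """MQTT UTF-8 string: 2-byte big-endian length followed by the bytes."""
    data = text.encode("utf-8")
    return len(data).to_bytes(2, "big") + data


def encode_connect(client_id: str, clean_start: bool,
                   keep_alive_s: int, session_expiry_s: int) -> bytes:
    flags = 0x02 if clean_start else 0x00                        # bit 1: Clean Start
    properties = bytes([0x11]) + session_expiry_s.to_bytes(4, "big")  # 0x11: Session Expiry Interval
    body = (
        utf8_string("MQTT") + bytes([5, flags])                  # protocol name, version 5, flags
        + keep_alive_s.to_bytes(2, "big")                        # Keep Alive, in seconds
        + encode_varint(len(properties)) + properties
        + utf8_string(client_id)                                 # payload: client identifier
    )
    return bytes([0x10]) + encode_varint(len(body)) + body       # 0x10: CONNECT


# Ask to resume a previous session, and to keep it for one hour after a drop.
packet = encode_connect("valve-1", clean_start=False, keep_alive_s=60, session_expiry_s=3600)
print(packet.hex(" "))
```

The session can only be resumed if the client reconnects with the same client identifier, with Clean Start unset, before the expiry interval has elapsed.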

3. SOS mode

If you are lucky and the client happens to reconnect to the broker while its session is still lingering there, that reconnection is special. Any packet with a non-zero QoS will be exchanged immediately upon connection. This means that if, say, the client connection was dropped at 03:00 and resumed at 05:00, you might receive a packet that was sent at 03:01 or at 04:59 (you can't tell, since the protocol attaches no timestamp to the packet). Or, if ten versions of the virtual file were sent between those two times, you'll receive only the last (or first) one, or you might receive them all in a burst (depending on the version of MQTT and/or some of the CONNECT properties in MQTT v5).

This mode is the only one where a packet is resent in MQTT, and it's overly complex to get right.

As you can see, QoS adds little value here.

In fact, MQTT has another mechanism that was created to solve the issues above: the retain flag. If the packet carries a retain flag, then the client will receive the last version of the virtual file when it reconnects to the broker and re-subscribes to the file, just as if it had never disconnected in the first place. Retain is completely independent of QoS, so it works with any QoS level, including the lowest.

4. No data until it's modified

In (plain) MQTT, until a client modifies a virtual file (or creates one), the other clients won't be able to fetch the previous version of that file. This is really the major bottleneck of the protocol, in our opinion: if your device reboots or restarts for whatever reason, it won't be able to know the state of the monitored system (until all the required virtual files are changed).

For example, imagine a valve that must water your garden from 08:00 to 09:00. You might have a virtual file called valve/on_time containing "08:00" and another one called valve/off_time containing "09:00". If the valve is restarted, it won't work until the interface or another device changes their values. If there is no modification, nothing is propagated to the subscribed clients.

Fortunately, the retain flag in MQTT partially solves this. When the valve restarts, upon subscribing to valve/on_time, it'll receive the last value of that file. However, the retain flag only stores the last value and not any metadata about it (like when it was modified, by whom, ...).
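The valve scenario can be sketched with a toy in-memory broker. This is only a model of the retain behaviour, under strong assumptions: `ToyBroker` is a hypothetical class, there is no network, subscriptions match exact paths only, and callbacks stand in for clients.

```python
from typing import Callable

Callback = Callable[[str, bytes], None]


class ToyBroker:
    """In-memory model of a broker: live delivery plus retained values."""

    def __init__(self) -> None:
        self.retained: dict[str, bytes] = {}           # path -> last retained value
        self.subscribers: dict[str, list[Callback]] = {}

    def publish(self, topic: str, payload: bytes, retain: bool = False) -> None:
        if retain:
            if payload:
                self.retained[topic] = payload         # only the last value is kept
            else:
                self.retained.pop(topic, None)         # an empty retained payload clears it
        for callback in self.subscribers.get(topic, []):
            callback(topic, payload)                   # clients subscribed right now

    def subscribe(self, topic: str, callback: Callback) -> None:
        self.subscribers.setdefault(topic, []).append(callback)
        if topic in self.retained:
            callback(topic, self.retained[topic])      # late subscriber still gets the value


broker = ToyBroker()
broker.publish("valve/on_time", b"08:00")                # plain publish, nobody is listening
broker.publish("valve/off_time", b"09:00", retain=True)  # retained publish

# The valve restarts and subscribes afterwards.
seen: list[tuple[str, bytes]] = []
broker.subscribe("valve/on_time", lambda t, p: seen.append((t, p)))
broker.subscribe("valve/off_time", lambda t, p: seen.append((t, p)))
print(seen)
```

After its restart, the valve only learns valve/off_time: the value published without the retain flag is gone.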

5. QoS latency

Contrary to popular opinion, using QoS increases latency for your client. A packet with a higher QoS will be delivered later (at worst) or will delay any other packet (at best). That's because every QoS packet has to be acknowledged. Acknowledgment means sending one or three more packets per PUBLISH packet, depending on the QoS level.

Some clients are synchronous (that's a lot easier to write and to get right), so a QoS of 1 will divide the packet bandwidth by 2 (thus, at worst, doubling latency), and a QoS of 2 will divide it by 4. For re-entrancy reasons (think: can I publish while I'm receiving a packet?), many clients will simply perform the acknowledgement before calling the receive callback, which increases the visible packet latency. Asynchronous clients (those that store the pending acknowledgements in a buffer to process later) can probably deliver the first QoS packet with no additional latency, but the overhead is only deferred and spread over the following packet transmissions.
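The factors above follow from counting packets. Here is a back-of-the-envelope sketch, assuming a synchronous client where every packet of the exchange costs one equal time slot (a simplification: real packets differ in size and direction). The names `QOS_EXCHANGE` and `synchronous_rate` are ours.

```python
# Packets exchanged for one published message, per QoS level.
QOS_EXCHANGE = {
    0: ["PUBLISH"],
    1: ["PUBLISH", "PUBACK"],
    2: ["PUBLISH", "PUBREC", "PUBREL", "PUBCOMP"],
}


def synchronous_rate(qos0_messages_per_second: float, qos: int) -> float:
    """Message rate of a synchronous client, given its rate at QoS 0."""
    return qos0_messages_per_second / len(QOS_EXCHANGE[qos])


for qos in (0, 1, 2):
    extra = len(QOS_EXCHANGE[qos]) - 1
    print(f"QoS {qos}: {extra} extra packet(s), {synchronous_rate(100.0, qos):.0f} msg/s")
```

With a client able to push 100 messages per second at QoS 0, this model gives 50 at QoS 1 and 25 at QoS 2.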

Also, a QoS 2 packet is never sent ahead of a QoS 1 packet (unlike QoS for VoIP, for example), so using a higher QoS level as a priority for your packets is wrong.

6. Metadata handling

In MQTT v3.1.1, a packet (a virtual file) cannot contain metadata. So, if you need to know, for example, when this file was modified, you need to store the modification time in the file itself. Having to modify the data to work around a missing protocol feature is very inconvenient.

With MQTT v5, a packet can have properties and, most notably, user properties. So you can now store this information in the packet without modifying the payload, i.e. the virtual file.
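As a sketch of what this looks like on the wire, here is how the properties block of an MQTT v5 packet could be built from user properties. A user property is a key/value pair of UTF-8 strings introduced by the identifier 0x26. The function names, the `modified` key and the timestamp value are hypothetical examples.

```python
USER_PROPERTY = 0x26  # MQTT v5 property identifier for a User Property


def encode_varint(n: int) -> bytes:
    """Encode n as an MQTT Variable Byte Integer."""
    out = bytearray()
    while True:
        n, byte = divmod(n, 128)
        if n:
            byte |= 0x80
        out.append(byte)
        if not n:
            return bytes(out)


def utf8_string(text: str) -> bytes:
    """MQTT UTF-8 string: 2-byte big-endian length followed by the bytes."""
    data = text.encode("utf-8")
    return len(data).to_bytes(2, "big") + data


def encode_user_properties(pairs: list[tuple[str, str]]) -> bytes:
    """Build the properties block: total length, then one entry per pair."""
    entries = b"".join(
        bytes([USER_PROPERTY]) + utf8_string(key) + utf8_string(value)
        for key, value in pairs
    )
    return encode_varint(len(entries)) + entries


# Metadata travels next to the payload instead of inside it.
block = encode_user_properties([("modified", "2024-01-01T08:00:00Z")])
print(block.hex(" "))
```

In a PUBLISH packet, this block sits between the topic name (and packet identifier, if any) and the payload, so the payload itself stays untouched.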

However, on an embedded system with a sporadic connection, saving a timestamp in your communication requires a clock, and a way to synchronize it. This is done either with an RTC (but not all embedded systems have an RTC and a battery to maintain the clock), or with another connection that synchronizes the clock over the network.

MQTT does not provide any clock synchronization scheme. A few brokers implement a $SYS/broker/time virtual file that contains the current time, but most don't.

This means that you'll need either a dedicated client that updates such a file once a second (this solution keeps the dependencies on your embedded system to a minimum, but requires specific code on the embedded client to unsubscribe once it has received the first value).

Or you'll need to implement an NTP/SNTP client in your embedded system too (more dependencies, and you also need an NTP/SNTP server on your network infrastructure).

7. MQTT vs REST

Another method to synchronize data on a system is to use an HTTP client/server and push/pull data via a REST API.

Both MQTT and REST work over TCP, and the HTTP/REST version is easier to debug.

However, HTTP is mainly a client/server protocol and, unless you control the whole network, you can't expect your embedded system to be reachable over the internet easily. Yet REST implies that you run an HTTP server on the embedded system (one is likely required to bootstrap the system anyway). This also means that the embedded system must be running 24/7, since there is no MQTT broker to store data while the system is asleep.

You can reasonably expect your customer to have a web browser (an HTTP client, so everything required to connect to the system), but you can't expect them to have an MQTT client.
