Tinytinycompressor T2C

1st Jul 2021 software Compression

TinyTinyCompressor (T2C)

It's a very tiny compression and decompression library for embedded system.

It's very fast and does not consume too much resources.

It's using less than 100 lines of code, compiles to less than 820 bytes on ARM (less then 300 bytes for the decompressor only).

It's not as efficient as Heatshrink in terms of compression ratio, but it is much faster (8x faster on compression and 4.5x faster on decompression).

Source code

The GPLv3 licensed code is available here

Description of the algorithm

The algorithm is based on OpenZFS's LZJB compressor. It's a Lempel Ziv with Jeff Bonwick mods on it.

Like all other LZ algorithm, the compression works by finding identical substring in the source stream and replacing them with references (offset from the current position to the source text, and the matching substring's length). The compressor scan the source stream and compute a hash of the next 3 bytes. This hash is used to index a hash table storing the last position of the hashed value. Upon starting, no position are saved in the hash table. Then the first position is saved and the search continue.

Upon a identical substring prefix, the hash will match and the offset of the source stream will be retrieved from the hash table.

From this offset, the matching substring length will be computed.

Then the compressor will emit a match reference instead of the substring (the reference being much smaller than the substring here).

For the decompressor to be able to reconstruct the source stream, it must know when a sequence of code is a match and when it's plain source stream. This is done with a 8 bit bitmap that's storing if the next 8 bytes are source or reference (a 1 bit is used to mark a reference). So for every 8 bytes of compressed data, there's one additional byte for the type bitmap.

This makes the decompression very fast and simple (it's mainly doing a single bit test per output byte). The compression is not complex either and can be embedded in the software too.

The disavantage of this system is the relatively less efficient compression ratio compared to other compression algorithms.

Benchmark

The source file used for this benchmark is a G-Code file of 309 564 bytes

Algorithm	Compressor binary size	Decompressor binary size	Internal memory	Compressed data size
Heatshrink	1964 bytes (ARM)	1336 bytes (ARM)	< 40 bytes (decomp)	135 398 bytes
Heatshrink	2176 bytes (AVR)	1268 bytes (AVR)	< 40 bytes (decomp)
T2C	528 bytes (ARM)	292 bytes (ARM)	512/4 bytes (comp/decomp)	196 217 bytes
T2C	618 bytes (AVR)	300 bytes (AVR)	512/4 bytes (comp/decomp)
T2C(128)	528 bytes (ARM)	292 bytes (ARM)	256/4 bytes (comp/decomp)	199 021 bytes
T2C(128)	618 bytes (AVR)	300 bytes (AVR)	256/4 bytes (comp/decomp)

T2C can be build with a smaller hashtable (down to 128 buckets / 256 bytes) but it implies a smaller compression factor. Whatever the hash table size, the same decompression code will be able to decompress the stream.

Heatshrink is not designed for embedded compression. The compression is expected to be done by a powerful CPU and only the decompression is expected to run on the embedded CPU.

Previous Post Next Post