A profiler run showed that the main bottleneck was the naive decoding of
the Huffman codes, replacing it with a nice trick borrowed by Zlib gave
a substantial speedup.
Replacing a `%` with a `and (mask-1)` gave another significant
improvement (yay for low hanging fruits).
A few numbers obtained by decompressing a 22M file:
Before:
```
./decompress 2,39s user 0,00s system 99% cpu 2,400 total
```
After:
```
./decompress 0,79s user 0,00s system 99% cpu 0,798 total
````