KDE Developer's Journals

KCabinet - mostly working

brad hards's picture

I finally got the KCabinet class (in playground/libs/kcabinet) working.

To make any sense out of this, you need to know that the Microsoft Cabinet format is based on blocks of data (CFDATA blocks), each of which holds at most 32768 bytes of uncompressed data. The data can be packaged in four ways: uncompressed, MS-ZIP, Quantum and LZX. I'm mainly interested in MS-ZIP, which is basically the deflate algorithm with a little header (similar to how gzip works). With MS-ZIP, you get around 2K of compressed data in each block, which expands to 32768 bytes, except for the last block, which can be shorter.
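As a rough illustration of the block layout described above, here is a minimal Python sketch of pulling one CFDATA block apart. It assumes the layout given in the published cabinet format description (a 4-byte checksum, a 2-byte compressed size and a 2-byte uncompressed size, all little-endian, followed by the data) and assumes there is no per-block reserved area unless `reserve` says otherwise. The function names are hypothetical and are not part of KCabinet.

```python
import struct

# Assumed CFDATA header layout: csum (u32), cbData (u16), cbUncomp (u16),
# all little-endian. Field names follow the cabinet format description.
CFDATA_HEADER = struct.Struct("<IHH")


def parse_cfdata(buf, offset=0, reserve=0):
    """Return (checksum, uncompressed_size, payload, next_offset).

    `reserve` is the size of the optional per-block reserved area; it is
    assumed to be 0 here unless the caller knows otherwise.
    """
    csum, cb_data, cb_uncomp = CFDATA_HEADER.unpack_from(buf, offset)
    start = offset + CFDATA_HEADER.size + reserve
    payload = bytes(buf[start:start + cb_data])
    if len(payload) != cb_data:
        raise ValueError("truncated CFDATA block")
    return csum, cb_uncomp, payload, start + cb_data


def strip_mszip_signature(payload):
    """Drop the two-byte 'CK' prefix that precedes the raw deflate data."""
    if payload[:2] != b"CK":
        raise ValueError("missing MS-ZIP 'CK' signature")
    return payload[2:]
```

The checksum is returned but not verified here; a robust reader would probably want to check it before handing the payload to zlib.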

So what I was doing was using zlib to inflate() each block. The problem is that when a single file extends over more than one block, zlib wouldn't actually decode anything beyond the first block. It just returned Z_STREAM_END, which makes sense (since each block is complete), but it never wrote beyond the first block of uncompressed output (i.e. 32768 bytes of good output, followed by whatever I initialised the QByteArray to).
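The stop-at-end-of-stream behaviour can be reproduced in a few lines with Python's `zlib` module. This is only an analogy to the C++ code described above, not a reconstruction of it: the helper names are made up, and the two streams here are independent, with no "CK" prefix.

```python
import zlib

RAW = -zlib.MAX_WBITS  # negative window bits: raw deflate, no zlib header


def raw_deflate(data):
    """Compress data as one complete raw deflate stream."""
    c = zlib.compressobj(wbits=RAW)
    return c.compress(data) + c.flush()


def decode_without_reset(first, second):
    """Feed two complete streams to a single decompressor object."""
    d = zlib.decompressobj(wbits=RAW)
    out = d.decompress(first + second)
    # Once the first stream ends, zlib stops; the rest is left untouched.
    return out, d.eof, d.unused_data
```

Only the first stream's output comes back. The decompressor reports end of stream, and the second stream's bytes are left over in `unused_data` rather than being decoded.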

I tried lots of things. I switched to using zlib directly (instead of through KFilterDev) and tried various options to the zlib functions. I pulled what is left of my hair out. I considered importing the libmspack library, or a modified copy (e.g. from Samba or ClamAV).

In the end, I asked the zlib authors. Mark Adler came back overnight with the right answer (despite never having tried it): use the previous 32K block as a dictionary. You can use inflateSetDictionary() to set the dictionary for the next block before decompressing.

Magic!

Essentially the sequence that works for me is:

inflateInit2( streamState, -MAX_WBITS );
for each block:
        parse the header (including the CK prefix on the data);
        read the compressed data, and add that to the streamState;
        inflate( streamState, Z_SYNC_FLUSH );
        copy the decompressed data to the output buffer;
        inflateReset( streamState );
        inflateSetDictionary( streamState, decompressedData, decompressedDataSize );
inflateEnd( streamState );
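
The sequence above can be tried out without a cabinet file. The sketch below uses Python's `zlib` module rather than the C API: a fresh decompressor created with `zdict` stands in for the inflateReset() / inflateSetDictionary() pair. The `pack_blocks` helper is hypothetical and exists only to produce test input that resembles MS-ZIP blocks (a "CK" prefix, a complete raw deflate stream per block, and the previous 32K as history). It is an assumption about the format, not a verified cabinet writer.

```python
import zlib

BLOCK_SIZE = 32768
RAW = -zlib.MAX_WBITS  # raw deflate, matching inflateInit2(..., -MAX_WBITS)


def pack_blocks(data):
    """Hypothetical test helper: split data into MS-ZIP-like blocks."""
    blocks, prev = [], b""
    for i in range(0, len(data), BLOCK_SIZE):
        chunk = data[i:i + BLOCK_SIZE]
        if prev:
            c = zlib.compressobj(wbits=RAW, zdict=prev)
        else:
            c = zlib.compressobj(wbits=RAW)
        # flush() finishes the stream, so every block is complete by itself.
        blocks.append(b"CK" + c.compress(chunk) + c.flush())
        prev = chunk
    return blocks


def unpack_blocks(blocks):
    """Inflate each block, seeding it with the previous block's output."""
    out, prev = [], b""
    for block in blocks:
        if block[:2] != b"CK":
            raise ValueError("missing 'CK' prefix")
        if prev:
            d = zlib.decompressobj(wbits=RAW, zdict=prev)
        else:
            d = zlib.decompressobj(wbits=RAW)
        chunk = d.decompress(block[2:])
        if not d.eof:
            raise ValueError("block did not end cleanly")
        out.append(chunk)
        prev = chunk  # becomes the dictionary for the next block
    return b"".join(out)
```

Calling `unpack_blocks(pack_blocks(data))` should return `data` unchanged for input spanning several blocks. Decoding a later block without the dictionary is expected to fail, because its back-references point into the previous block.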

I never would have got that without the assist.

We still might need mspack (if we ever need LZX or Quantum), but zlib is already a dependency for KDE, and it works for me :-)

Next step is to follow Aaron's advice, and refactor the code to make sure it is easy to understand and robust. I am beyond the experimentation stage now, and I've got the unit tests to make sure it doesn't go bad.

carlo's picture

> Some code bases aren't big

> Some code bases aren't big on dependencies

And all these projects are a PITA for maintainers when it comes to security issues in the bundled code: it means a lot more work to patch N+1 packages, not to mention the chance of missing one.

If you ever choose to use libmspack, please depend on the shared library and commit improved code to it as you need to. Someone has to start, but it's not impossible to get at least a subset of projects back on track using a common code base. I'm sure distributors will follow, in their own interest.

vdboor's picture

Cool!

Cool stuff! I'd love to use your class when we've ported KMess to KDE 4. It removes another dependency for us :-)

rebel's picture

libmspack

I'm not very convinced about the usefulness of having yet another copy of the same code. What exactly is the reason not to use libmspack (distros should be fixed, I assume)? Do you know the reasons for the forks in Samba or ClamAV?

brad hards's picture

Why not libmspack?

Note that I didn't use libmspack code (I did take the unit test files), and I'm not sure why some projects chose to copy it. Samba and ClamAV aren't the only examples...

Some suggestions:

Some code bases aren't big on dependencies (OO.o is probably the most extreme example). That might have been part of the reason.

The Samba and ClamAV copies are pretty heavily modified to support their internal APIs. It is clearly the same code, but it arrived via quite different paths. I also found a reference to the KDE reaktivate project having this kind of code, but couldn't extract the code from the dead kdenonbeta module to look at it.

Not many distros package libmspack (I wanted it for Fedora, but it's not available), which is a bit of a shame since cabextract includes it anyway, and everyone has that. However, it does hurt the usability of my code if there is a hard dependency that a lot of people don't have.

libmspack doesn't appear to be actively developed - I may be wrong in my understanding, but the last release was Sep 2006 and the most recent commits I can see are 15 months ago.
