//////////////////////////////////////////////////////////////////////////// // **** WAVPACK **** // // Hybrid Lossless Wavefile Compressor // // Copyright (c) 1998 - 2005 Conifer Software. // // All Rights Reserved. // // Distributed under the BSD Software License (see license.txt) // //////////////////////////////////////////////////////////////////////////// WavPack 4.0 File / Block Format ------------------------------- A WavPack 4.0 file consists of a series of WavPack audio blocks. It may also contain tags and other information, but these must be outside the blocks (either before, in-between, or after) and are ignored for the purpose of unpacking audio data. The WavPack blocks are easy to identify by their unique header data, and by looking in the header it is very easy to determine the total size of the block, both in physical bytes and compressed samples. There are no seek tables. The blocks (or frames, if you prefer) are completely independent in that they can be decoded to mono or stereo audio all by themselves. A single function is provided to convert a whole block into its corresponding audio data. Similarly, a function is provided to convert a block of audio samples into a finished WavPack block. These all work in memory; disk I/O is handled outside. It is also possible to decode or encode blocks in smaller increments if it is important to distribute CPU load more evenly over time. The blocks may also be decoded without reading the whole block into memory, although this would only be important for hardware decoding. The blocks may contain any number of samples, either stereo or mono. Obviously, putting more samples in each block is more efficient, but they are reasonably efficient down to even a thousand samples. I have set the max size to 1 MB for the whole block, but this is arbitrary. The blocks may be lossless or lossy (currently the lossy modes are basically CBR, but I am planning a quality based VBR version also). For multichannel audio, the data is divided into some number of stereo and mono streams and multiplexed into separate blocks. Because blocks are independent there can be a mix of sampling rates, but all the streams must be sliced at the same point in time which is a multiple of all the sampling rates. The metadata contains source information (like front, center, rear, etc.). Correction files (.wvc) have an identical structure to the main file (.wv) and there is a one-to-one correspondence between main file blocks that contain audio and their correction file match (blocks that do not contain audio do not exist in the correction file). The only difference in the headers of main blocks and correction blocks is the CRC value, although it is easy to tell the blocks apart by looking at the metadata ids. Here is the 32-byte header at the front of every block: typedef struct { char ckID [4]; // "wvpk" long ckSize; // size of entire frame (minus 8, of course) short version; // 0x403 for now uchar track_no; // track number (0 if not used, like now) uchar index_no; // track sub-index (0 if not used, like now) ulong total_samples; // for entire file (-1 if unknown) ulong block_index; // index of first sample in block (to file begin) ulong block_samples; // # samples in this block ulong flags; // various flags for id and decoding ulong crc; // crc for actual decoded data } WavpackHeader; The "flags" field contains information for decoding the block along with some general information including sample size and format, hybrid/lossless, mono/stereo and sampling rate. This structure is stored "little-endian". Following the 32-byte header to the end of the block are a series of "metadata" sub-blocks. These may from 2 bytes long to the size of the entire block and are extremely easy to parse (even without knowing what they mean). Currently these mostly contain extra information needed to decode the audio, but may also contain user information. The only non-audio information I currently have implemented is a copy of the original wave RIFF header (or trailer if present), and the MD5 checksums, but there is plenty of flexibility here. For example, these metadata blocks could store cuesheets, artist/title information, replaygain values, even pictures or lyrics. The final metadata sub-blocks are the actual audio bitstreams, which have ids for standard audio (wvbits), correction data (wvcbits), and a special extension for large integer and floating-point data (wvxbits). The format of the metadata is: uchar id; // mask meaning // ---- ------- // 0x1f metadata function // 0x20 decoder need not understand metadata // 0x40 actual data byte length is 1 less // 0x80 large block (> 255 words) uchar word_size; // small block: data size in words (padded) or... uchar word_size [3]; // large block: data size in words (padded, little-endian) ushort data [word_size]; // data, padded to an even # of bytes