////////////////////////////////////////////////////////////////////////////
//                           **** WAVPACK ****                            //
//                  Hybrid Lossless Wavefile Compressor                   //
//              Copyright (c) 1998 - 2005 Conifer Software.               //
//                          All Rights Reserved.                          //
//      Distributed under the BSD Software License (see license.txt)      //
////////////////////////////////////////////////////////////////////////////

WavPack 4.0 File / Block Format
-------------------------------

A WavPack 4.0 file consists of a series of WavPack audio blocks. It may also
contain tags and other information, but these must be outside the blocks
(either before, in-between, or after) and are ignored for the purpose of
unpacking audio data. The WavPack blocks are easy to identify by their
unique header data, and by looking in the header it is very easy to
determine the total size of the block, both in physical bytes and compressed
samples. There are no seek tables.
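
As a rough sketch of how a reader might locate blocks and skip over tags or
other data between them, the C fragment below scans a buffer for the "wvpk"
id and uses the size field that follows it (see the header layout in this
document) to get the total block size. The name find_next_block and the
sanity checks are assumptions made for illustration; they are not part of
the WavPack library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WV_HEADER_SIZE 32   /* size of the block header in bytes */

/* Read a 32-bit little-endian value from a byte buffer. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: find the first block starting at or after `pos`.
 * Returns its offset, or -1 if no plausible block is found. On success,
 * *total_size is the full block size in bytes (ckSize + 8).
 */
static long find_next_block(const unsigned char *buf, size_t len,
                            size_t pos, size_t *total_size)
{
    assert(buf != NULL && total_size != NULL);

    for (; len >= WV_HEADER_SIZE && pos <= len - WV_HEADER_SIZE; pos++) {
        if (memcmp(buf + pos, "wvpk", 4) != 0)
            continue;

        uint32_t ck_size = read_le32(buf + pos + 4);

        /* ckSize excludes the 8 bytes of ckID and ckSize themselves, so it
           must cover the remaining 24 header bytes and fit in the buffer. */
        if (ck_size >= WV_HEADER_SIZE - 8 && ck_size <= len - pos - 8) {
            *total_size = (size_t)ck_size + 8;
            return (long)pos;
        }
    }
    return -1;
}
```

To walk a file, call this with pos = 0, handle the block, then call it again
with pos set to the returned offset plus the total size. A stricter reader
would probably also check the version field before trusting a match.
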

The blocks (or frames, if you prefer) are completely independent in that they
can be decoded to mono or stereo audio all by themselves. A single function
is provided to convert a whole block into its corresponding audio data.
Similarly, a function is provided to convert a block of audio samples into
a finished WavPack block. These all work in memory; disk I/O is handled
outside. It is also possible to decode or encode blocks in smaller increments
if it is important to distribute CPU load more evenly over time. The blocks may
also be decoded without reading the whole block into memory, although this
would only be important for hardware decoding.

The blocks may contain any number of samples, either stereo or mono. Obviously,
putting more samples in each block is more efficient, but they are reasonably
efficient down to even a thousand samples. I have set the max size to 1 MB for
the whole block, but this is arbitrary. The blocks may be lossless or lossy
(currently the lossy modes are basically CBR, but I am also planning a
quality-based VBR version).

For multichannel audio, the data is divided into some number of stereo and mono
streams and multiplexed into separate blocks. Because blocks are independent
there can be a mix of sampling rates, but all the streams must be sliced at
the same point in time which is a multiple of all the sampling rates. The
metadata contains source information (like front, center, rear, etc.).

Correction files (.wvc) have an identical structure to the main file (.wv) and
there is a one-to-one correspondence between main file blocks that contain
audio and their correction file match (blocks that do not contain audio do
not exist in the correction file). The only difference in the headers of
main blocks and correction blocks is the CRC value, although it is easy to
tell the blocks apart by looking at the metadata ids.
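
As a small illustration of the one-to-one correspondence, the sketch below
checks whether a main block header and a correction block header describe
the same span of samples. It reads block_index and block_samples at byte
offsets 16 and 20, which follow from the 32-byte header layout given in this
document. Pairing blocks on these two fields is an assumption made here, and
the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value from a byte buffer. */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical check: returns 1 if two raw 32-byte headers start at the
 * same sample index and hold the same number of samples, otherwise 0.
 */
static int headers_cover_same_samples(const unsigned char *main_hdr,
                                      const unsigned char *corr_hdr)
{
    assert(main_hdr != NULL && corr_hdr != NULL);

    return load_le32(main_hdr + 16) == load_le32(corr_hdr + 16) &&  /* block_index   */
           load_le32(main_hdr + 20) == load_le32(corr_hdr + 20);    /* block_samples */
}
```

A reader that decodes .wv and .wvc files side by side could use a check like
this to notice when the two files have drifted out of step.
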

Here is the 32-byte header at the front of every block:

typedef struct {
    char ckID [4];              // "wvpk"
    long ckSize;                // size of entire frame (minus 8, of course)
    short version;              // 0x403 for now
    uchar track_no;             // track number (0 if not used, like now)
    uchar index_no;             // track sub-index (0 if not used, like now)
    ulong total_samples;        // for entire file (-1 if unknown)
    ulong block_index;          // index of first sample in block (to file begin)
    ulong block_samples;        // # samples in this block
    ulong flags;                // various flags for id and decoding
    ulong crc;                  // crc for actual decoded data
} WavpackHeader;

The "flags" field contains information for decoding the block along with some
general information including sample size and format, hybrid/lossless,
mono/stereo and sampling rate. This structure is stored "little-endian".
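
Because the structure is stored little-endian and adds up to 32 bytes only
when long is 4 bytes and short is 2, a portable reader may prefer to decode
the fields byte by byte instead of casting the buffer to the struct. The
following is a minimal sketch of that idea; the WvHeaderFields type and the
decode_header function are hypothetical names, and the use of fixed-width
unsigned types is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same fields as WavpackHeader, with explicit widths (an assumption made
   here so the sizes do not depend on the platform's long and short). */
typedef struct {
    char     ckID[4];
    uint32_t ckSize;          /* declared as signed long in the original */
    uint16_t version;
    uint8_t  track_no;
    uint8_t  index_no;
    uint32_t total_samples;   /* 0xFFFFFFFF is the "-1 if unknown" case */
    uint32_t block_index;
    uint32_t block_samples;
    uint32_t flags;
    uint32_t crc;
} WvHeaderFields;

static uint16_t get_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: decode 32 raw header bytes into *h.
   Returns 1 on success, 0 if the id is not "wvpk". */
static int decode_header(const unsigned char *raw, WvHeaderFields *h)
{
    assert(raw != NULL && h != NULL);

    if (memcmp(raw, "wvpk", 4) != 0)
        return 0;

    memcpy(h->ckID, raw, 4);
    h->ckSize        = get_le32(raw + 4);
    h->version       = get_le16(raw + 8);
    h->track_no      = raw[10];
    h->index_no      = raw[11];
    h->total_samples = get_le32(raw + 12);
    h->block_index   = get_le32(raw + 16);
    h->block_samples = get_le32(raw + 20);
    h->flags         = get_le32(raw + 24);
    h->crc           = get_le32(raw + 28);
    return 1;
}
```

The total block size in bytes is then ckSize + 8, since ckSize does not count
the ckID and ckSize fields themselves.
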

Following the 32-byte header to the end of the block are a series of "metadata"
sub-blocks. These may be from 2 bytes long to the size of the entire block and are
extremely easy to parse (even without knowing what they mean). Currently these
mostly contain extra information needed to decode the audio, but may also
contain user information. The only non-audio information I currently have
implemented is a copy of the original wave RIFF header (or trailer if present),
and the MD5 checksums, but there is plenty of flexibility here. For example,
these metadata blocks could store cuesheets, artist/title information,
replaygain values, even pictures or lyrics. The final metadata sub-blocks are
the actual audio bitstreams, which have ids for standard audio (wvbits),
correction data (wvcbits), and a special extension for large integer and
floating-point data (wvxbits).

The format of the metadata is:

    uchar id;                   // mask  meaning
                                // ----  -------
                                // 0x1f  metadata function
                                // 0x20  decoder need not understand metadata
                                // 0x40  actual data byte length is 1 less
                                // 0x80  large block (> 255 words)

    uchar word_size;            // small block: data size in words (padded)
      or...
    uchar word_size [3];        // large block: data size in words (padded,
                                //   little-endian)

    ushort data [word_size];    // data, padded to an even # of bytes
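
To show how little is needed to step through the sub-blocks without knowing
what they mean, here is a minimal parsing sketch based only on the layout
above. The MetaSubBlock type and parse_sub_block function are hypothetical
names, and treating a sub-block with the 0x40 bit set and a size of zero
words as malformed is an assumption made here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t function;            /* id & 0x1f */
    int optional;                /* nonzero if the 0x20 bit is set */
    const unsigned char *data;   /* points into the caller's buffer */
    size_t byte_len;             /* actual data length, without padding */
} MetaSubBlock;

/*
 * Hypothetical helper: parse one sub-block from `p`, where `avail` bytes
 * remain in the block. Returns the number of bytes consumed (sub-block
 * header plus padded data), or 0 if the sub-block is truncated or malformed.
 */
static size_t parse_sub_block(const unsigned char *p, size_t avail,
                              MetaSubBlock *out)
{
    assert(p != NULL && out != NULL);

    if (avail < 2)
        return 0;

    uint8_t id = p[0];
    size_t hdr = 2;
    size_t words = p[1];

    if (id & 0x80) {             /* large block: 3-byte little-endian size */
        if (avail < 4)
            return 0;
        words |= ((size_t)p[2] << 8) | ((size_t)p[3] << 16);
        hdr = 4;
    }

    size_t padded = words * 2;   /* data is stored in 16-bit words */

    if (padded > avail - hdr)
        return 0;
    if ((id & 0x40) && padded == 0)
        return 0;                /* cannot be 1 byte shorter than nothing */

    out->function = id & 0x1f;
    out->optional = (id & 0x20) != 0;
    out->data = p + hdr;
    out->byte_len = padded - ((id & 0x40) ? 1 : 0);
    return hdr + padded;
}
```

Calling this in a loop, starting just past the 32-byte header and advancing by
the returned count until the end of the block, visits every sub-block in
order. A decoder that does not recognize a function value can skip the
sub-block if the optional bit is set.
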