////////////////////////////////////////////////////////////////////////////
//                           **** WAVPACK ****                            //
//                  Hybrid Lossless Wavefile Compressor                   //
//              Copyright (c) 1998 - 2005 Conifer Software.               //
//                          All Rights Reserved.                          //
//      Distributed under the BSD Software License (see license.txt)      //
////////////////////////////////////////////////////////////////////////////

WavPack 4.0 File / Block Format
-------------------------------

A WavPack 4.0 file consists of a series of WavPack audio blocks. It may also
contain tags and other information, but these must be outside the blocks
(either before, in-between, or after) and are ignored for the purpose of
unpacking audio data. The WavPack blocks are easy to identify by their
unique header data, and by looking in the header it is very easy to
determine the total size of the block, both in physical bytes and compressed
samples. There are no seek tables.

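To illustrate how blocks can be picked out of a file by their header, here is
a minimal sketch in C that scans a memory buffer for the "wvpk" id and uses
the ckSize field (see the header layout later in this document) to get the
physical size of the block. The names find_next_block and read_le32 are made
up for this example and are not part of the WavPack library; a real parser
would probably want to validate more of the header before trusting a match.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian value from a byte buffer. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: find the next WavPack block at or after `pos`.
 * Returns the offset of the block and stores its total size in bytes
 * (ckSize + 8) in *size, or returns (size_t)-1 if no block is found.
 */
static size_t find_next_block(const unsigned char *buf, size_t len,
                              size_t pos, size_t *size)
{
    while (len >= 32 && pos <= len - 32) {
        if (memcmp(buf + pos, "wvpk", 4) == 0) {
            uint32_t ck_size = read_le32(buf + pos + 4);

            /* Accept only blocks that hold a full 32-byte header
               and fit entirely inside the buffer. */
            if (ck_size >= 24 && ck_size <= len - pos - 8) {
                *size = (size_t)ck_size + 8;
                return pos;
            }
        }
        pos++;  /* not a block here (tag data, etc.): keep scanning */
    }
    return (size_t)-1;
}
```

Calling find_next_block again with pos set to the returned offset plus the
returned size steps to the following block, skipping any tag data that sits
in between.
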
The blocks (or frames, if you prefer) are completely independent in that they
can be decoded to mono or stereo audio all by themselves. A single function
is provided to convert a whole block into its corresponding audio data.
Similarly, a function is provided to convert a block of audio samples into
a finished WavPack block. These all work in memory; disk I/O is handled
outside. It is also possible to decode or encode blocks in smaller increments
if it is important to distribute CPU load more evenly over time. The blocks may
also be decoded without reading the whole block into memory, although this
would only be important for hardware decoding.

The blocks may contain any number of samples, either stereo or mono. Obviously,
putting more samples in each block is more efficient, but they are reasonably
efficient down to even a thousand samples. I have set the max size to 1 MB for
the whole block, but this is arbitrary. The blocks may be lossless or lossy
(currently the lossy modes are basically CBR, but I am planning a quality
based VBR version also).

For multichannel audio, the data is divided into some number of stereo and mono
streams and multiplexed into separate blocks. Because blocks are independent,
there can be a mix of sampling rates, but all the streams must be sliced at
the same point in time, which is a multiple of all the sampling rates. The
metadata contains source information (like front, center, rear, etc.).

Correction files (.wvc) have an identical structure to the main file (.wv) and
there is a one-to-one correspondence between main file blocks that contain
audio and their correction file match (blocks that do not contain audio do
not exist in the correction file). The only difference in the headers of
main blocks and correction blocks is the CRC value, although it is easy to
tell the blocks apart by looking at the metadata ids.

Here is the 32-byte header at the front of every block:

typedef struct {
    char ckID [4];              // "wvpk"
    long ckSize;                // size of entire frame (minus 8, of course)
    short version;              // 0x403 for now
    uchar track_no;             // track number (0 if not used, like now)
    uchar index_no;             // track sub-index (0 if not used, like now)
    ulong total_samples;        // for entire file (-1 if unknown)
    ulong block_index;          // index of first sample in block (to file begin)
    ulong block_samples;        // # samples in this block
    ulong flags;                // various flags for id and decoding
    ulong crc;                  // crc for actual decoded data
} WavpackHeader;

The "flags" field contains information for decoding the block along with some
general information including sample size and format, hybrid/lossless,
mono/stereo and sampling rate. This structure is stored "little-endian".

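Because the header is stored little-endian and is exactly 32 bytes, it is
safer to read it field by field than to cast a buffer to the struct. The
sketch below shows one way this might be done. It assumes that "long" and
"ulong" above mean 32-bit values and "short" a 16-bit value (which is what
the 32-byte total implies), and it reads ckSize as unsigned for simplicity.
The names WvHeader32, parse_wv_header, read_le16 and read_le32 are invented
for this example and do not come from the WavPack sources.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed-width mirror of WavpackHeader (hypothetical name). */
typedef struct {
    char     ckID[4];
    uint32_t ckSize;
    uint16_t version;
    uint8_t  track_no, index_no;
    uint32_t total_samples, block_index, block_samples, flags, crc;
} WvHeader32;

static uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 on success, 0 if the buffer is too short or lacks "wvpk". */
static int parse_wv_header(const unsigned char *buf, size_t len, WvHeader32 *h)
{
    if (len < 32 || memcmp(buf, "wvpk", 4) != 0)
        return 0;

    memcpy(h->ckID, buf, 4);
    h->ckSize        = read_le32(buf + 4);   /* block size minus 8 */
    h->version       = read_le16(buf + 8);
    h->track_no      = buf[10];
    h->index_no      = buf[11];
    h->total_samples = read_le32(buf + 12);  /* 0xFFFFFFFF (-1) if unknown */
    h->block_index   = read_le32(buf + 16);
    h->block_samples = read_le32(buf + 20);
    h->flags         = read_le32(buf + 24);
    h->crc           = read_le32(buf + 28);
    return 1;
}
```

Reading byte by byte like this gives the same result on big-endian and
little-endian machines and does not depend on how the compiler pads the
struct.
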
Following the 32-byte header to the end of the block are a series of "metadata"
sub-blocks. These may be from 2 bytes long to the size of the entire block and
are extremely easy to parse (even without knowing what they mean). Currently
these mostly contain extra information needed to decode the audio, but may also
contain user information. The only non-audio information I currently have
implemented is a copy of the original wave RIFF header (or trailer if present),
and the MD5 checksums, but there is plenty of flexibility here. For example,
these metadata blocks could store cuesheets, artist/title information,
replaygain values, even pictures or lyrics. The final metadata sub-blocks are
the actual audio bitstreams, which have ids for standard audio (wvbits),
correction data (wvcbits), and a special extension for large integer and
floating-point data (wvxbits).

The format of the metadata is:

    uchar id;                   // mask  meaning
                                // ----  -------
                                // 0x1f  metadata function
                                // 0x20  decoder need not understand metadata
                                // 0x40  actual data byte length is 1 less
                                // 0x80  large block (> 255 words)

    uchar word_size;            // small block: data size in words (padded)
      or...
    uchar word_size [3];        // large block: data size in words (padded,
                                   little-endian)

    ushort data [word_size];    // data, padded to an even # of bytes
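
As an illustration of how little is needed to walk the metadata, here is a
minimal sketch that parses one sub-block at a time using only the masks
listed above. It assumes a "word" is 2 bytes (as the ushort data array
suggests). The names WvSubBlock and next_sub_block are hypothetical and are
not taken from the WavPack sources.

```c
#include <stddef.h>
#include <stdint.h>

/* One parsed metadata sub-block (hypothetical type for this example). */
typedef struct {
    uint8_t function;           /* id & 0x1f */
    int optional;               /* id & 0x20: decoder may skip it */
    const unsigned char *data;  /* points into the caller's buffer */
    size_t length;              /* actual data length in bytes */
} WvSubBlock;

/*
 * Parse the sub-block starting at buf[*pos] and advance *pos past its
 * padded data. Returns 1 on success, 0 at the end of the buffer or if
 * the sub-block is truncated or malformed.
 */
static int next_sub_block(const unsigned char *buf, size_t len,
                          size_t *pos, WvSubBlock *sb)
{
    size_t p = *pos, words, bytes;
    uint8_t id;

    if (p > len || len - p < 2)
        return 0;

    id = buf[p++];
    words = buf[p++];

    if (id & 0x80) {            /* large block: two more size bytes */
        if (len - p < 2)
            return 0;
        words |= ((size_t)buf[p] << 8) | ((size_t)buf[p + 1] << 16);
        p += 2;
    }

    bytes = words * 2;          /* padded size in bytes */
    if (bytes > len - p || ((id & 0x40) && bytes == 0))
        return 0;

    sb->function = id & 0x1f;
    sb->optional = (id & 0x20) != 0;
    sb->data = buf + p;
    sb->length = bytes - ((id & 0x40) ? 1 : 0);
    *pos = p + bytes;
    return 1;
}
```

A caller would start with pos at 0 on the bytes that follow the 32-byte
header and call next_sub_block in a loop until it returns 0. Sub-blocks with
an unknown function can simply be skipped, since the size is known without
understanding the contents.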