////////////////////////////////////////////////////////////////////////////
//                           **** WAVPACK ****                            //
//                  Hybrid Lossless Wavefile Compressor                   //
//              Copyright (c) 1998 - 2006 Conifer Software.               //
//                          All Rights Reserved.                          //
//      Distributed under the BSD Software License (see license.txt)      //
////////////////////////////////////////////////////////////////////////////

                      WavPack 4.0 File / Block Format
                      -------------------------------

                              December 9, 2006
                                David Bryant

                          updated: April 29, 2007
                          updated: Sept 26, 2009

1.0 INTRODUCTION

A WavPack 4.0 file consists of a series of WavPack audio blocks. It may also
contain tags and other information, but these must be outside the blocks
(either before, in-between, or after) and are ignored for the purpose of
unpacking audio data. The WavPack blocks are easy to identify by their unique
header data, and the header alone is enough to determine the total size of the
block (both in physical bytes and compressed samples) and the audio format
stored. There are no specialized seek tables.

The blocks are completely independent in that they can be decoded to mono or
stereo audio all by themselves. A block may contain any number of samples up
to 131072, either stereo or mono. Putting more samples in each block is more
efficient because of reduced header overhead, but blocks remain reasonably
efficient down to even a thousand samples. I have set the max size to 1 MB for
the whole block, but this is arbitrary. The blocks may be lossless or lossy.
Currently the hybrid/lossy modes are basically CBR, but I am planning a
quality based VBR version also, and all the provisions exist for this in the
format.

For multichannel audio, the data is divided into some number of stereo and mono
streams and multiplexed into separate blocks which repeat in sequence. Two
flags in the header indicate whether a block is the first or the last in the
sequence (for simple mono or stereo files both of these would always be set).
The speaker assignments are in standard Microsoft order and the channel_mask is
transmitted in a separate piece of metadata. Channels that naturally belong
together (e.g. left and right pairs) are put into stereo blocks and other
channels are put into mono blocks. So, for example, a standard 5.1 audio stream
would have a channel_mask of 0x3F and be organized into 4 blocks in sequence:

    1. stereo block (front left + front right) (INITIAL_BLOCK)
    2. mono block   (front center)
    3. mono block   (low frequency effects)
    4. stereo block (back left + back right)   (FINAL_BLOCK)
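
As a rough illustration of the sequencing rule above, the following sketch
walks the "flags" words of consecutive blocks and counts the output channels in
one complete sequence. The bit positions come from the flags description in
section 2.0; the macro names and the function sequence_channels() are
illustrative assumptions, not part of the WavPack library.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit positions from the "flags" field (section 2.0); names are illustrative. */
#define MONO_FLAG      (1u << 2)
#define INITIAL_BLOCK  (1u << 11)
#define FINAL_BLOCK    (1u << 12)

/* Returns the number of output channels in one complete block sequence, or
   -1 if the sequence is malformed: it does not start with an initial block,
   a second initial block appears, or no final block is found in n entries. */
int sequence_channels(const uint32_t *flags, size_t n)
{
    int channels = 0;
    size_t i;

    if (n == 0 || !(flags[0] & INITIAL_BLOCK))
        return -1;

    for (i = 0; i < n; i++) {
        if (i > 0 && (flags[i] & INITIAL_BLOCK))
            return -1;
        channels += (flags[i] & MONO_FLAG) ? 1 : 2;   /* mono = 1, stereo = 2 */
        if (flags[i] & FINAL_BLOCK)
            return channels;
    }
    return -1;
}
```

For the 5.1 layout listed above (stereo, mono, mono, stereo) this yields 6
channels; a simple stereo file has both flags set on every block and yields 2.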

Correction files (.wvc) have an identical structure to the main file (.wv) and
there is a one-to-one correspondence between main file blocks that contain audio
and their correction file match (blocks that do not contain audio do not exist
in the correction file). The only difference in the headers of main blocks and
correction blocks is the size and the CRC value, although it is easy (if a
little ugly) to tell the blocks apart by looking at the metadata ids.

The format is designed with hardware decoding in mind, and so it is possible to
decode regular stereo (or mono) WavPack files without buffering an entire block,
which allows the memory requirements to be reduced to only a few kilobytes if
desired. This is not true of multichannel files, and this also restricts
playback of high-resolution files to 24 bits of precision (although neither of
these would be associated with low-cost playback equipment).

2.0 BLOCK HEADER

Here is the 32-byte little-endian header at the front of every WavPack block:

typedef struct {
    char ckID [4];              // "wvpk"
    uint32_t ckSize;            // size of entire block minus 8 (i.e. not
                                // counting the ckID and ckSize fields)
    uint16_t version;           // 0x402 to 0x410 are currently valid for decode
    uchar track_no;             // track number (0 if not used, like now)
    uchar index_no;             // track sub-index (0 if not used, like now)
    uint32_t total_samples;     // total samples for entire file, but this is
                                // only valid if block_index == 0 and a value of
                                // -1 indicates unknown length
    uint32_t block_index;       // index of first sample in block relative to
                                // beginning of file (normally this would start
                                // at 0 for the first block)
    uint32_t block_samples;     // number of samples in this block (0 = no audio)
    uint32_t flags;             // various flags for id and decoding
    uint32_t crc;               // crc for actual decoded data
} WavpackHeader;
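
Because the header is defined as little-endian, a portable reader would
normally assemble each field from bytes rather than cast the buffer to the
struct. The following is a minimal sketch under that assumption; ParsedHeader,
parse_header(), block_total_bytes() and total_samples_known() are hypothetical
names chosen for this example and are not the WavPack library API.

```c
#include <stdint.h>
#include <string.h>

/* Parsed copy of the fields in WavpackHeader (hypothetical type name). */
typedef struct {
    char     ckID[4];
    uint32_t ckSize;
    uint16_t version;
    uint8_t  track_no, index_no;
    uint32_t total_samples, block_index, block_samples, flags, crc;
} ParsedHeader;

static uint16_t rd_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if buf holds a plausible block header, 0 otherwise. */
int parse_header(const uint8_t buf[32], ParsedHeader *h)
{
    if (memcmp(buf, "wvpk", 4) != 0)
        return 0;

    memcpy(h->ckID, buf, 4);
    h->ckSize        = rd_le32(buf + 4);
    h->version       = rd_le16(buf + 8);
    h->track_no      = buf[10];
    h->index_no      = buf[11];
    h->total_samples = rd_le32(buf + 12);
    h->block_index   = rd_le32(buf + 16);
    h->block_samples = rd_le32(buf + 20);
    h->flags         = rd_le32(buf + 24);
    h->crc           = rd_le32(buf + 28);

    /* Version range stated in the struct comment above. */
    return h->version >= 0x402 && h->version <= 0x410;
}

/* ckSize excludes the 8 bytes of ckID and ckSize themselves. */
uint64_t block_total_bytes(const ParsedHeader *h)
{
    return (uint64_t)h->ckSize + 8;
}

/* total_samples is only meaningful when block_index == 0 and it is not -1. */
int total_samples_known(const ParsedHeader *h)
{
    return h->block_index == 0 && h->total_samples != 0xFFFFFFFFu;
}
```

A caller would read 32 bytes at a candidate block position, call
parse_header(), and on success skip block_total_bytes() bytes to reach the
next block.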

Note that in this context "samples" refers to a complete sample for all
channels (sometimes called a "frame"). Therefore, in a stereo or multichannel
file the actual number of numeric samples is this value multiplied by the
number of channels. This effectively limits the size of an on-disk WavPack
file to (2^32)-2 samples, although this should not be a big restriction for
most applications (that is about 27 hours at 44.1 kHz, no matter how many
channels).
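
The arithmetic behind that limit can be checked with a few lines of C. This is
only a worked example; the function names are made up for illustration.

```c
#include <stdint.h>

/* (2^32)-2, the largest usable total sample (frame) count. */
#define MAX_TOTAL_SAMPLES UINT32_C(4294967294)

/* Longest representable duration, in hours, at a given sampling rate. */
double max_duration_hours(uint32_t sample_rate)
{
    return (double)MAX_TOTAL_SAMPLES / (double)sample_rate / 3600.0;
}

/* Individual numeric sample values: frames multiplied by channels. */
uint64_t numeric_samples(uint32_t frames, unsigned channels)
{
    return (uint64_t)frames * channels;
}
```

At 44100 Hz, max_duration_hours() gives roughly 27.05 hours, and the figure
does not change with the channel count because the limit applies to frames.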

There is no limit to the size of the WavPack file itself, although the
library currently cannot seek in WavPack files over 4 GB. Also, the .wav
format itself has a 4 GB limit, so this limits the size of the source and
destination files (although this is planned to be resolved with the W64
and RIFF64 file formats).

Normally, the first block of a WavPack file that contains audio samples
(blocks may contain only metadata) would have "block_index" == 0 and
"total_samples" would be equal to the total number of samples in the
file. However, there are some possible exceptions to this rule. For example,
a file may be created such that its total length is unknown (e.g. with
pipes) and in this case total_samples == -1 (0xFFFFFFFF). For these files,
the WavPack decoder will attempt to seek to the end of the file to determine
the actual length, and if this is impossible then the length is simply unknown.

Another case is where a WavPack file is created by cutting a portion out of a
longer WavPack file (or from a WavPack stream). Since this file would start
with a block that didn't have "block_index" == 0, the length would be unknown
until a seek to end was performed. In fact, an on-disk file would still be
perfectly playable and seekable as long as there were fewer than (2^32)-2 total
samples (the "block_index" could even wrap).

It is also possible to have streamed WavPack data. In this case both the
"block_index" and "total_samples" fields are ignored for every block and the
decoder simply decodes every block encountered indefinitely.

The "flags" field contains information for decoding the block along with some
general information including sample size and format, hybrid/lossless,
mono/stereo and sampling rate (if one of 15 standard rates). Here are the
(little-endian) bit assignments:

    bits 1,0:           // 00 = 1 byte / sample (1-8 bits / sample)
                        // 01 = 2 bytes / sample (1-16 bits / sample)
                        // 10 = 3 bytes / sample (1-24 bits / sample)
                        // 11 = 4 bytes / sample (1-32 bits / sample)
    bit 2:              // 0 = stereo output; 1 = mono output
    bit 3:              // 0 = lossless mode; 1 = hybrid mode
    bit 4:              // 0 = true stereo; 1 = joint stereo (mid/side)
    bit 5:              // 0 = independent channels; 1 = cross-channel decorrelation
    bit 6:              // 0 = flat noise spectrum in hybrid; 1 = hybrid noise shaping
    bit 7:              // 0 = integer data; 1 = floating point data
    bit 8:              // 1 = extended size integers (> 24-bit) or shifted integers
    bit 9:              // 0 = hybrid mode parameters control noise level
                        // 1 = hybrid mode parameters control bitrate
    bit 10:             // 1 = hybrid noise balanced between channels
    bit 11:             // 1 = initial block in sequence (for multichannel)
    bit 12:             // 1 = final block in sequence (for multichannel)
    bits 17-13:         // amount of data left-shift after decode (0-31 places)
    bits 22-18:         // maximum magnitude of decoded data
                        //  (number of bits integers require minus 1)
    bits 26-23:         // sampling rate (1111 = unknown/custom)
    bits 28-27:         // reserved (but decoders should ignore if set)
    bit 29:             // 1 = use IIR for negative hybrid noise shaping
    bit 30:             // 1 = false stereo (data is mono but output is stereo)
    bit 31:             // reserved (decoders should refuse to decode if set)
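
The bit assignments above translate directly into shifts and masks. The sketch
below unpacks the more commonly needed fields; FlagInfo and decode_flags() are
hypothetical names for this example, and the table of standard sampling rates
is deliberately left out because it is not given in this document.

```c
#include <stdint.h>

typedef struct {
    unsigned bytes_per_sample;  /* bits 1,0 plus 1: 1..4 */
    int mono;                   /* bit 2 */
    int hybrid;                 /* bit 3 */
    int joint_stereo;           /* bit 4 */
    int cross_decorr;           /* bit 5 */
    int float_data;             /* bit 7 */
    int initial_block;          /* bit 11 */
    int final_block;            /* bit 12 */
    unsigned shift;             /* bits 17-13: left shift after decode */
    unsigned magnitude_bits;    /* bits 22-18 plus 1 */
    unsigned srate_index;       /* bits 26-23: 15 = unknown/custom */
    int false_stereo;           /* bit 30 */
    int decodable;              /* 0 when reserved bit 31 is set */
} FlagInfo;

FlagInfo decode_flags(uint32_t f)
{
    FlagInfo i;

    i.bytes_per_sample = (f & 0x3u) + 1;
    i.mono             = (f >> 2) & 1;
    i.hybrid           = (f >> 3) & 1;
    i.joint_stereo     = (f >> 4) & 1;
    i.cross_decorr     = (f >> 5) & 1;
    i.float_data       = (f >> 7) & 1;
    i.initial_block    = (f >> 11) & 1;
    i.final_block      = (f >> 12) & 1;
    i.shift            = (f >> 13) & 0x1f;
    i.magnitude_bits   = ((f >> 18) & 0x1f) + 1;  /* field stores bits - 1 */
    i.srate_index      = (f >> 23) & 0xf;
    i.false_stereo     = (f >> 30) & 1;
    i.decodable        = ((f >> 31) & 1) == 0;
    return i;
}
```

Bits 28-27 are intentionally not inspected, since decoders are asked to ignore
them, whereas bit 31 is surfaced so the caller can refuse the block.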


3.0 METADATA SUB-BLOCKS

Following the 32-byte header to the end of the block are a series of "metadata"
sub-blocks. These may be from 2 bytes long to the size of the entire block and
are extremely easy to parse (even without knowing what they mean). These mostly
contain extra information needed to decode the audio, but may also contain user
information that is not required for decoding and that could be used in the
future without breaking existing decoders. The final sub-block is usually the
compressed audio bitstream itself, although this is not a strict rule.

The format of the metadata is:

    uchar id;                   // mask  meaning
                                // ----  -------
                                // 0x1f  metadata function
                                // 0x20  decoder need not understand metadata
                                // 0x40  actual data byte length is 1 less
                                // 0x80  large block (> 255 words)

    uchar word_size;            // small block: data size in words (padded)
      or...
    uchar word_size [3];        // large block: data size in words (padded,
                                //  little-endian)

    uint16_t data [word_size];  // data, padded to an even # of bytes
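
To show how little is needed to step through sub-blocks without understanding
them, here is a minimal sketch of a sub-block header parser built only from the
layout above. SubBlock, parse_subblock() and the two small helpers are
hypothetical names, and the treatment of an "odd size" flag on an empty
sub-block as an error is an assumption of this example.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t        id;        /* raw id byte, flag bits included */
    const uint8_t *data;      /* payload start, inside the caller's buffer */
    size_t         data_len;  /* actual payload bytes, padding excluded */
} SubBlock;

/* Parses one sub-block at p with avail bytes remaining in the block.
   Returns the bytes consumed (header plus padded data), or 0 if the
   sub-block is truncated or inconsistent. */
size_t parse_subblock(const uint8_t *p, size_t avail, SubBlock *sb)
{
    size_t hdr_len, padded_len;

    if (avail < 2)
        return 0;

    if (p[0] & 0x80) {                      /* large block: 3-byte word count */
        if (avail < 4)
            return 0;
        padded_len = ((size_t)p[1] | ((size_t)p[2] << 8) |
                      ((size_t)p[3] << 16)) * 2;
        hdr_len = 4;
    } else {                                /* small block: 1-byte word count */
        padded_len = (size_t)p[1] * 2;
        hdr_len = 2;
    }

    if (avail - hdr_len < padded_len)
        return 0;
    if ((p[0] & 0x40) && padded_len == 0)   /* assumed invalid: nothing to trim */
        return 0;

    sb->id = p[0];
    sb->data = p + hdr_len;
    sb->data_len = padded_len - ((p[0] & 0x40) ? 1 : 0);
    return hdr_len + padded_len;
}

/* Low five bits select the metadata function. */
unsigned subblock_function(uint8_t id) { return id & 0x1f; }

/* Set when a decoder may skip a sub-block it does not understand. */
int subblock_is_optional(uint8_t id) { return (id & 0x20) != 0; }
```

A decoder would call parse_subblock() repeatedly, advancing by the returned
count, until the ckSize + 8 bytes of the block are exhausted.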

The currently assigned metadata ids are:

    ID_DUMMY                0x0     // could be used to pad WavPack blocks
    ID_DECORR_TERMS         0x2     // decorrelation terms & deltas (fixed)
    ID_DECORR_WEIGHTS       0x3     // initial decorrelation weights
    ID_DECORR_SAMPLES       0x4     // decorrelation sample history
    ID_ENTROPY_VARS         0x5     // initial entropy variables
    ID_HYBRID_PROFILE       0x6     // entropy variables specific to hybrid mode
    ID_SHAPING_WEIGHTS      0x7     // info needed for hybrid lossless (wvc) mode
    ID_FLOAT_INFO           0x8     // specific info for floating point decode
    ID_INT32_INFO           0x9     // specific info for decoding integers > 24
                                    // bits, or data requiring shift after decode
    ID_WV_BITSTREAM         0xa     // normal compressed audio bitstream (wv file)
    ID_WVC_BITSTREAM        0xb     // correction file bitstream (wvc file)
    ID_WVX_BITSTREAM        0xc     // special extended bitstream for floating
                                    // point data or integers > 24 bit (can be
                                    // in either wv or wvc file, depending...)
    ID_CHANNEL_INFO         0xd     // contains channel count and channel_mask

    ID_RIFF_HEADER          0x21    // RIFF header for .wav files (before audio)
    ID_RIFF_TRAILER         0x22    // RIFF trailer for .wav files (after audio)
    ID_CONFIG_BLOCK         0x25    // some encoding details for info purposes
    ID_MD5_CHECKSUM         0x26    // 16-byte MD5 sum of raw audio data
    ID_SAMPLE_RATE          0x27    // non-standard sampling rate info

Note: unlisted ids are reserved.

The RIFF header and trailer are optional for most playback purposes; however,
older decoders (< 4.40) will not decode to .wav files unless at least the
ID_RIFF_HEADER is present. In the future these could be used to encode other
uncompressed audio formats (like AIFF).

4.0 METADATA TAGS

These tags are not to be confused with the metadata sub-blocks described above
but are specialized tags for storing user data on many formats of audio files.
The tags recommended for use with WavPack files (and the ones that the WavPack
supplied plugins and programs will work with) are ID3v1 and APEv2. The ID3v1
tags are somewhat primitive and limited, but are supported for legacy purposes.
The recommended tagging format is APEv2 because of its rich functionality
and broad software support (it is also used on Monkey's Audio and Musepack
files). APEv2 and/or ID3v1 tags must come at the end of the WavPack file, with
the ID3v1 tag coming last if both are present.
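
The ordering rule above makes the tags straightforward to locate from the end
of the file. The sketch below is an assumption-laden illustration rather than
WavPack library code: it relies on the ID3v1 tag being a fixed 128 bytes
starting with "TAG", and on the APEv2 tag ending with a 32-byte footer that
starts with "APETAGEX" (the usual layout, although the APEv2 format also
permits other arrangements). TagInfo and detect_tags() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    int has_id3v1;   /* 128-byte "TAG" record at the very end */
    int has_apev2;   /* "APETAGEX" footer just before it (or at the end) */
} TagInfo;

/* Inspects the tail of an in-memory file image for trailing tags. */
TagInfo detect_tags(const uint8_t *file, size_t len)
{
    TagInfo t = { 0, 0 };
    size_t end = len;

    /* ID3v1 comes last if both tags are present. */
    if (end >= 128 && memcmp(file + end - 128, "TAG", 3) == 0) {
        t.has_id3v1 = 1;
        end -= 128;
    }

    /* An APEv2 footer, if any, ends where the ID3v1 tag begins. */
    if (end >= 32 && memcmp(file + end - 32, "APETAGEX", 8) == 0)
        t.has_apev2 = 1;

    return t;
}
```

A real reader would go on to read the tag size from the APEv2 footer and seek
back to the start of the tag; that step is omitted here.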

For the APEv2 tags, the following field names are officially supported and
recommended by WavPack (although there are no restrictions on what field names
may be used):

    Artist
    Title
    Album
    Track
    Year
    Genre
    Comment
    Cuesheet (note: may include replay gain info as remarks)
    Replaygain_Track_Gain
    Replaygain_Track_Peak
    Replaygain_Album_Gain
    Replaygain_Album_Peak
    Cover Art (Front)
    Cover Art (Back)
    Log