////////////////////////////////////////////////////////////////////////////
// **** WAVPACK **** //
// Hybrid Lossless Wavefile Compressor //
// Copyright (c) 1998 - 2006 Conifer Software. //
// All Rights Reserved. //
// Distributed under the BSD Software License (see license.txt) //
////////////////////////////////////////////////////////////////////////////
WavPack 4.0 File / Block Format
-------------------------------
December 9, 2006
David Bryant
updated: April 29, 2007
updated: Sept 26, 2009
1.0 INTRODUCTION
A WavPack 4.0 file consists of a series of WavPack audio blocks. It may also
contain tags and other information, but these must be outside the blocks
(either before, in-between, or after) and are ignored for the purpose of
unpacking audio data. The WavPack blocks are easy to identify by their unique
header data, and by looking in the header it is very easy to determine the total
size of the block (both in physical bytes and compressed samples) and the audio
format stored. There are no specialized seek tables.
The blocks are completely independent in that they can be decoded to mono or
stereo audio all by themselves. The blocks may contain any number of samples
(well, up to 131072), either stereo or mono. Obviously, putting more samples
in each block is more efficient because of reduced header overhead, but they are
reasonably efficient down to even a thousand samples. I have set the max size to
1 MB for the whole block, but this is arbitrary. The blocks may be lossless or
lossy. Currently the hybrid/lossy modes are basically CBR, but I am planning a
quality based VBR version also, and all the provisions exist for this in the
format.
For multichannel audio, the data is divided into some number of stereo and mono
streams and multiplexed into separate blocks which repeat in sequence. A flag
in the header indicates whether the block is the first or the last in the
sequence (for simple mono or stereo files both of these would always be set).
The speaker assignments are in standard Microsoft order and the channel_mask is
transmitted in a separate piece of metadata. Channels that naturally belong
together (i.e. left and right pairs) are put into stereo blocks and other
channels are put into mono blocks. So, for example, a standard 5.1 audio stream
would have a channel_mask of 0x3F and be organized into 4 blocks in sequence:
1. stereo block (front left + front right) (INITIAL_BLOCK)
2. mono block (front center)
3. mono block (low frequency effects)
4. stereo block (back left + back right) (FINAL_BLOCK)
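As an illustration of the 5.1 example above, the following C sketch walks a
channel_mask in Microsoft speaker order and groups the speakers into stereo and
mono blocks. The pairing rule used here (a left speaker whose right partner is
the next bit up and is also present forms a stereo block) and the names
plan_blocks and BlockPlan are assumptions made for this example only; an actual
encoder may group channels differently.

```c
#include <stdint.h>

/* Hypothetical description of one block in a multichannel sequence. */
typedef struct {
    uint32_t speakers;   /* channel_mask bits carried by this block */
    int      stereo;     /* 1 = stereo block, 0 = mono block */
    int      initial;    /* 1 = first block in the sequence */
    int      final;      /* 1 = last block in the sequence */
} BlockPlan;

/* Left members of adjacent left/right pairs in Microsoft speaker order
   (assumed pairing rule): front L, back L, front-left-of-center, side L. */
static const uint32_t pair_left_bits[] = { 0x1, 0x10, 0x40, 0x200 };

static int is_pair_left(uint32_t bit)
{
    unsigned i;

    for (i = 0; i < sizeof pair_left_bits / sizeof pair_left_bits[0]; i++)
        if (pair_left_bits[i] == bit)
            return 1;
    return 0;
}

/* Fills plan[] (capacity max_blocks) and returns the number of blocks,
   or -1 if the capacity is too small. */
static int plan_blocks(uint32_t channel_mask, BlockPlan *plan, int max_blocks)
{
    int count = 0, b;

    for (b = 0; b < 32; b++) {
        uint32_t bit = (uint32_t)1 << b;

        if (!(channel_mask & bit))
            continue;
        if (count == max_blocks)
            return -1;

        plan[count].speakers = bit;
        plan[count].stereo = 0;
        plan[count].initial = 0;
        plan[count].final = 0;

        /* The right partner, if any, is the next bit up. */
        if (is_pair_left(bit) && (channel_mask & (bit << 1))) {
            plan[count].speakers |= bit << 1;
            plan[count].stereo = 1;
            b++;                       /* partner has been consumed */
        }
        count++;
    }

    if (count > 0) {
        plan[0].initial = 1;           /* INITIAL_BLOCK */
        plan[count - 1].final = 1;     /* FINAL_BLOCK */
    }
    return count;
}
```

With a channel_mask of 0x3F this yields the four blocks listed above, and for a
plain stereo mask of 0x3 it yields a single block with both the initial and
final markers set.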
Correction files (.wvc) have an identical structure to the main file (.wv) and
there is a one-to-one correspondence between main file blocks that contain audio
and their correction file match (blocks that do not contain audio do not exist
in the correction file). The only difference in the headers of main blocks and
correction blocks is the size and the CRC value, although it is easy (if a
little ugly) to tell the blocks apart by looking at the metadata ids.
The format is designed with hardware decoding in mind, and so it is possible to
decode regular stereo (or mono) WavPack files without buffering an entire block,
which allows the memory requirements to be reduced to only a few kilobytes if
desired. This is not true of multichannel files, and this also restricts
playback of high-resolution files to 24 bits of precision (although neither of
these would be associated with low-cost playback equipment).
2.0 BLOCK HEADER
Here is the 32-byte little-endian header at the front of every WavPack block:
typedef struct {
    char ckID [4];              // "wvpk"
    uint32_t ckSize;            // size of entire block (minus 8, of course)
    uint16_t version;           // 0x402 to 0x410 are currently valid for decode
    uchar track_no;             // track number (0 if not used, like now)
    uchar index_no;             // track sub-index (0 if not used, like now)
    uint32_t total_samples;     // total samples for entire file, but this is
                                // only valid if block_index == 0 and a value of
                                // -1 indicates unknown length
    uint32_t block_index;       // index of first sample in block relative to
                                // beginning of file (normally this would start
                                // at 0 for the first block)
    uint32_t block_samples;     // number of samples in this block (0 = no audio)
    uint32_t flags;             // various flags for id and decoding
    uint32_t crc;               // crc for actual decoded data
} WavpackHeader;
Note that in this context "samples" refers to a complete sample for all
channels (sometimes called a "frame"). Therefore, in a stereo
or multichannel file the actual number of numeric samples is this value
multiplied by the number of channels. This effectively limits the size of an
on-disk WavPack file to (2^32)-2 samples, although this should not be a big
restriction for most applications (that is over 24 hours at 44.1 kHz, no
matter how many channels).
There is no limit to the size of the WavPack file itself, although the
library currently cannot seek in WavPack files over 4 gig. Also, the .wav
format itself has a 4 gig limit, so this limits the size of the source and
destination files (although this is planned to be resolved with the W64
and RIFF64 file formats).
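Because the header is defined as little-endian, a portable reader should
assemble each field from bytes rather than cast the buffer to the struct. The
following is a minimal C sketch of that, assuming the field offsets implied by
the struct above with no padding. The names ParsedHeader, parse_wavpack_header
and block_total_bytes are made up for this example and are not part of the
WavPack library.

```c
#include <stdint.h>
#include <string.h>

/* Host-side copy of the 32-byte block header (hypothetical type name). */
typedef struct {
    char     ckID[4];
    uint32_t ckSize;
    uint16_t version;
    uint8_t  track_no;
    uint8_t  index_no;
    uint32_t total_samples;
    uint32_t block_index;
    uint32_t block_samples;
    uint32_t flags;
    uint32_t crc;
} ParsedHeader;

static uint16_t rd_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 and fills *h if buf starts with a plausible block header,
   otherwise returns 0. */
static int parse_wavpack_header(const uint8_t *buf, size_t len, ParsedHeader *h)
{
    if (len < 32 || memcmp(buf, "wvpk", 4) != 0)
        return 0;

    memcpy(h->ckID, buf, 4);
    h->ckSize        = rd_le32(buf + 4);
    h->version       = rd_le16(buf + 8);
    h->track_no      = buf[10];
    h->index_no      = buf[11];
    h->total_samples = rd_le32(buf + 12);
    h->block_index   = rd_le32(buf + 16);
    h->block_samples = rd_le32(buf + 20);
    h->flags         = rd_le32(buf + 24);
    h->crc           = rd_le32(buf + 28);

    /* ckSize counts everything after the first 8 bytes, so a complete
       header alone already requires ckSize >= 24. */
    if (h->ckSize < 24)
        return 0;
    /* Version range stated in the header comments above. */
    if (h->version < 0x402 || h->version > 0x410)
        return 0;
    return 1;
}

/* Physical size of the whole block in bytes, including the first 8 bytes. */
static uint64_t block_total_bytes(const ParsedHeader *h)
{
    return (uint64_t)h->ckSize + 8;
}
```

A reader can use block_total_bytes to step from one block to the next, and can
treat a failed parse as non-block data (such as a tag) to be skipped.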
Normally, the first block of a WavPack file that contains audio samples
(blocks may contain only metadata) would have "block_index" == 0 and
"total_samples" would be equal to the total number of samples in the
file. However, there are some possible exceptions to this rule. For example,
a file may be created such that its total length is unknown (e.g. with
pipes) and in this case total_samples == -1. For these files, the WavPack
decoder will attempt to seek to the end of the file to determine the actual
length, and if this is impossible then the length is simply unknown.
Another case is where a WavPack file is created by cutting a portion out of a
longer WavPack file (or from a WavPack stream). Since this file would start
with a block that didn't have "block_index" == 0, the length would be unknown
until a seek to end was performed. In fact, an on-disk file would still be
perfectly playable and seekable as long as there were fewer than (2^32)-2 total
samples (the "block_index" could even wrap).
It is also possible to have streamed WavPack data. In this case both the
"block_index" and "total_samples" fields are ignored for every block and the
decoder simply decodes every block encountered indefinitely.
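The length rules above can be sketched in C as follows. The first function
applies the stated conditions for trusting "total_samples". The second is only
one plausible way to derive a length from the last block found by seeking to
the end, assuming unsigned 32-bit arithmetic so that a wrapped "block_index"
still works; both function names are hypothetical.

```c
#include <stdint.h>

#define WV_UNKNOWN_LENGTH 0xFFFFFFFFu   /* total_samples == -1 as a uint32_t */

/* Returns 1 and stores the file length in samples if this block's header
   states it; returns 0 if the length has to be found some other way. */
static int length_from_header(uint32_t block_index, uint32_t total_samples,
                              uint32_t *length)
{
    if (block_index != 0)                    /* field only valid in block 0 */
        return 0;
    if (total_samples == WV_UNKNOWN_LENGTH)  /* e.g. file written to a pipe */
        return 0;
    *length = total_samples;
    return 1;
}

/* Assumed fallback: length spanned from the first block's index to the end
   of the last block. The subtraction is modulo 2^32, so it tolerates a
   block_index that wraps within the file. */
static uint32_t length_from_last_block(uint32_t first_block_index,
                                       uint32_t last_block_index,
                                       uint32_t last_block_samples)
{
    return (uint32_t)(last_block_index + last_block_samples - first_block_index);
}
```

For streamed data neither function applies, since both fields are ignored and
blocks are simply decoded as they arrive.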
The "flags" field contains information for decoding the block along with some
general information including sample size and format, hybrid/lossless,
mono/stereo and sampling rate (if one of 15 standard rates). Here are the
(little-endian) bit assignments:
bits 1,0: // 00 = 1 byte / sample (1-8 bits / sample)
// 01 = 2 bytes / sample (1-16 bits / sample)
// 10 = 3 bytes / sample (1-24 bits / sample)
// 11 = 4 bytes / sample (1-32 bits / sample)
bit 2: // 0 = stereo output; 1 = mono output
bit 3: // 0 = lossless mode; 1 = hybrid mode
bit 4: // 0 = true stereo; 1 = joint stereo (mid/side)
bit 5: // 0 = independent channels; 1 = cross-channel decorrelation
bit 6: // 0 = flat noise spectrum in hybrid; 1 = hybrid noise shaping
bit 7: // 0 = integer data; 1 = floating point data
bit 8: // 1 = extended size integers (> 24-bit) or shifted integers
bit 9: // 0 = hybrid mode parameters control noise level
// 1 = hybrid mode parameters control bitrate
bit 10: // 1 = hybrid noise balanced between channels
bit 11: // 1 = initial block in sequence (for multichannel)
bit 12: // 1 = final block in sequence (for multichannel)
bits 17-13: // amount of data left-shift after decode (0-31 places)
bits 22-18: // maximum magnitude of decoded data
// (number of bits integers require minus 1)
bits 26-23: // sampling rate (1111 = unknown/custom)
bits 28-27: // reserved (but decoders should ignore if set)
bit 29: // 1 = use IIR for negative hybrid noise shaping
bit 30: // 1 = false stereo (data is mono but output is stereo)
bit 31: // reserved (decoders should refuse to decode if set)
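As a rough illustration of the bit assignments above, the following C sketch
unpacks the more commonly needed fields of "flags" into a struct. The type and
function names are invented for this example. The sampling rate is returned
only as its 4-bit index, because the table of standard rates is not listed in
this document.

```c
#include <stdint.h>

/* Hypothetical unpacked view of selected "flags" fields. */
typedef struct {
    int bytes_per_sample;  /* bits 1,0 plus 1 (1 to 4) */
    int mono;              /* bit 2 */
    int hybrid;            /* bit 3 */
    int joint_stereo;      /* bit 4 */
    int float_data;        /* bit 7 */
    int initial_block;     /* bit 11 */
    int final_block;       /* bit 12 */
    int shift;             /* bits 17-13, left-shift after decode */
    int max_magnitude;     /* bits 22-18, bits required minus 1 */
    int rate_index;        /* bits 26-23, 15 = unknown/custom */
    int false_stereo;      /* bit 30 */
} DecodedFlags;

/* Returns 1 on success, or 0 if reserved bit 31 is set, in which case the
   block should not be decoded. Reserved bits 28-27 are ignored. */
static int decode_flags(uint32_t flags, DecodedFlags *d)
{
    if (flags & 0x80000000u)
        return 0;

    d->bytes_per_sample = (int)(flags & 0x3u) + 1;
    d->mono             = (int)((flags >> 2) & 1u);
    d->hybrid           = (int)((flags >> 3) & 1u);
    d->joint_stereo     = (int)((flags >> 4) & 1u);
    d->float_data       = (int)((flags >> 7) & 1u);
    d->initial_block    = (int)((flags >> 11) & 1u);
    d->final_block      = (int)((flags >> 12) & 1u);
    d->shift            = (int)((flags >> 13) & 0x1Fu);
    d->max_magnitude    = (int)((flags >> 18) & 0x1Fu);
    d->rate_index       = (int)((flags >> 23) & 0xFu);
    d->false_stereo     = (int)((flags >> 30) & 1u);
    return 1;
}
```

For a simple mono or stereo file, both initial_block and final_block would be
set in every block.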
3.0 METADATA SUB-BLOCKS
Following the 32-byte header to the end of the block are a series of "metadata"
sub-blocks. These may be from 2 bytes long to the size of the entire block and are
extremely easy to parse (even without knowing what they mean). These mostly
contain extra information needed to decode the audio, but may also contain user
information that is not required for decoding and that could be used in the
future without breaking existing decoders. The final sub-block is usually the
compressed audio bitstream itself, although this is not a strict rule.
The format of the metadata is:
    uchar id;                   // mask meaning
                                // ---- -------
                                // 0x1f metadata function
                                // 0x20 decoder need not understand metadata
                                // 0x40 actual data byte length is 1 less
                                // 0x80 large block (> 255 words)
    uchar word_size;            // small block: data size in words (padded)
        or...
    uchar word_size [3];        // large block: data size in words (padded,
                                // little-endian)
    uint16_t data [word_size];  // data, padded to an even # of bytes
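To show how little is needed to walk the sub-blocks without understanding
them, here is a minimal C sketch based on the layout above. The names SubBlock,
parse_sub_block and count_sub_blocks are hypothetical, and the sketch assumes
the caller passes only the bytes that follow the 32-byte block header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one metadata sub-block inside a block body. */
typedef struct {
    uint8_t        id;         /* raw id byte, flag bits included */
    const uint8_t *data;       /* first data byte */
    size_t         data_len;   /* actual length (0x40 flag applied) */
    size_t         total_len;  /* sub-block header plus padded data */
} SubBlock;

/* Parses the sub-block at p, of which avail bytes are readable.
   Returns 1 on success, 0 if the sub-block is truncated or malformed. */
static int parse_sub_block(const uint8_t *p, size_t avail, SubBlock *sb)
{
    size_t hdr_len, words, padded;

    if (avail < 2)
        return 0;

    if (p[0] & 0x80) {                 /* large block: 3-byte word count */
        if (avail < 4)
            return 0;
        hdr_len = 4;
        words = (size_t)p[1] | ((size_t)p[2] << 8) | ((size_t)p[3] << 16);
    } else {                           /* small block: 1-byte word count */
        hdr_len = 2;
        words = p[1];
    }

    padded = words * 2;                /* stored size is always even */
    if (padded > avail - hdr_len)
        return 0;
    if ((p[0] & 0x40) && padded == 0)  /* "1 less" needs at least one word */
        return 0;

    sb->id = p[0];
    sb->data = p + hdr_len;
    sb->data_len = padded - ((p[0] & 0x40) ? 1 : 0);
    sb->total_len = hdr_len + padded;
    return 1;
}

/* Walks every sub-block in a block body. Returns the number found,
   or -1 if the body does not divide cleanly into sub-blocks. */
static int count_sub_blocks(const uint8_t *body, size_t len)
{
    int n = 0;

    while (len > 0) {
        SubBlock sb;

        if (!parse_sub_block(body, len, &sb))
            return -1;
        body += sb.total_len;
        len -= sb.total_len;
        n++;
    }
    return n;
}
```

A decoder that meets an id it does not recognize can test (id & 0x20) to
decide whether skipping the sub-block is safe, and (id & 0x1f) gives the
metadata function.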
The currently assigned metadata ids are:
ID_DUMMY 0x0 // could be used to pad WavPack blocks
ID_DECORR_TERMS 0x2 // decorrelation terms & deltas (fixed)
ID_DECORR_WEIGHTS 0x3 // initial decorrelation weights
ID_DECORR_SAMPLES 0x4 // decorrelation sample history
ID_ENTROPY_VARS 0x5 // initial entropy variables
ID_HYBRID_PROFILE 0x6 // entropy variables specific to hybrid mode
ID_SHAPING_WEIGHTS 0x7 // info needed for hybrid lossless (wvc) mode
ID_FLOAT_INFO 0x8 // specific info for floating point decode
ID_INT32_INFO 0x9 // specific info for decoding integers > 24
// bits, or data requiring shift after decode
ID_WV_BITSTREAM 0xa // normal compressed audio bitstream (wv file)
ID_WVC_BITSTREAM 0xb // correction file bitstream (wvc file)
ID_WVX_BITSTREAM 0xc // special extended bitstream for floating
// point data or integers > 24 bit (can be
// in either wv or wvc file, depending...)
ID_CHANNEL_INFO 0xd // contains channel count and channel_mask
ID_RIFF_HEADER 0x21 // RIFF header for .wav files (before audio)
ID_RIFF_TRAILER 0x22 // RIFF trailer for .wav files (after audio)
ID_CONFIG_BLOCK 0x25 // some encoding details for info purposes
ID_MD5_CHECKSUM 0x26 // 16-byte MD5 sum of raw audio data
ID_SAMPLE_RATE 0x27 // non-standard sampling rate info
Note: unlisted ids are reserved.
The RIFF header and trailer are optional for most playback purposes; however,
older decoders (< 4.40) will not decode to .wav files unless at least the
ID_RIFF_HEADER is present. In the future these could be used to encode other
uncompressed audio formats (like AIFF).
4.0 METADATA TAGS
These tags are not to be confused with the metadata sub-blocks described above
but are specialized tags for storing user data on many formats of audio files.
The tags recommended for use with WavPack files (and the ones that the WavPack
supplied plugins and programs will work with) are ID3v1 and APEv2. The ID3v1
tags are somewhat primitive and limited, but are supported for legacy purposes.
The more recommended tagging format is APEv2 because of its rich functionality
and broad software support (it is also used on Monkey's Audio and Musepack
files). The APEv2 and/or ID3v1 tags must come at the end of the
WavPack file, with the ID3v1 tag coming last if both are present.
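The tag ordering above suggests a simple way to detect trailing tags from the
last bytes of a file. The C sketch below is an assumption-laden illustration,
not WavPack library code: it relies on details taken from the ID3v1 and APEv2
tag formats themselves rather than from this document (a 128-byte ID3v1 tag
beginning with "TAG", and a 32-byte APEv2 footer beginning with "APETAGEX"
followed by a little-endian version of 2000 and a tag size field). The names
TrailingTags and find_trailing_tags are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result of looking for tags at the end of a file. */
typedef struct {
    int      has_id3v1;
    int      has_apev2;
    uint32_t ape_size;   /* size field from the APEv2 footer, if present */
} TrailingTags;

static uint32_t tag_rd_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* tail points at the final tail_len bytes of the file (160 bytes is enough
   to cover an ID3v1 tag plus an APEv2 footer). */
static TrailingTags find_trailing_tags(const uint8_t *tail, size_t tail_len)
{
    TrailingTags t = { 0, 0, 0 };
    size_t end = tail_len;

    /* ID3v1 comes last if present: 128 bytes starting with "TAG". */
    if (end >= 128 && memcmp(tail + end - 128, "TAG", 3) == 0) {
        t.has_id3v1 = 1;
        end -= 128;
    }

    /* The APEv2 footer, if present, sits directly before the ID3v1 tag
       or at the very end of the file. */
    if (end >= 32 && memcmp(tail + end - 32, "APETAGEX", 8) == 0) {
        const uint8_t *footer = tail + end - 32;

        if (tag_rd_le32(footer + 8) == 2000) {
            t.has_apev2 = 1;
            t.ape_size = tag_rd_le32(footer + 12);
        }
    }
    return t;
}
```

Since tags lie outside the WavPack blocks, a reader that finds them this way
can exclude those bytes when scanning backwards for the last audio block.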
For the APEv2 tags, the following field names are officially supported and
recommended by WavPack (although there are no restrictions on what field names
may be used):
Artist
Title
Album
Track
Year
Genre
Comment
Cuesheet (note: may include replay gain info as remarks)
Replaygain_Track_Gain
Replaygain_Track_Peak
Replaygain_Album_Gain
Replaygain_Album_Peak
Cover Art (Front)
Cover Art (Back)
Log