FRAMINGHAM (05/01/2000) - Having spent the past four weeks discussing a multimedia operating system (Be Inc.'s BeOS), media has been on Gearhead's mind. Moreover, because our evil twin in the "Backspin" column has been opining about Napster Inc., Gearhead thinks it's time to take a look at the MP3 audio encoding system.
MP3 is an abbreviation for MPEG-1 Audio Layer 3, an open standard for audio compression defined in 1991 by the Moving Picture Experts Group, a working group of the International Organization for Standardization (www.iso.ch/) and the International Electrotechnical Commission (www.iec.ch/).
MPEG-1 is actually a collection of complex standards. According to the MPEG site (www.cselt.it/mpeg/), MPEG-1 is for "coding of moving pictures and associated audio for digital storage media at up to about 1.5M-bps."
MP3 compresses music data by a factor of up to 15-to-1, while attempting to minimize perceived (not actual) quality loss. In other words, despite the radical compression ratio and bits being tossed out, Metallica still sounds like Metallica (Gearhead suspects the loss of most bits would leave a Metallica ditty intact as far as the ear could tell; preserving the nuances of Brahms or Bach is an entirely different matter).
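To put the 15-to-1 figure in perspective, here is a rough back-of-the-envelope sketch. It assumes the source is CD-style audio (44.1 kHz, 16 bits per sample, two channels), which the column does not state; the function names are made up for illustration.

```python
def pcm_bitrate_kbps(sample_rate_hz=44_100, bits_per_sample=16, channels=2):
    """Bit rate of uncompressed PCM audio in kbit/s (CD-style defaults assumed)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def compressed_bitrate_kbps(ratio, source_kbps=None):
    """Bit rate left after compressing the source by ratio-to-1."""
    if source_kbps is None:
        source_kbps = pcm_bitrate_kbps()
    return source_kbps / ratio


if __name__ == "__main__":
    print(pcm_bitrate_kbps())                      # 1411.2
    print(round(compressed_bitrate_kbps(15), 1))   # 94.1
```

Under those assumptions, a 15-to-1 ratio squeezes roughly 1,411K bit/sec of raw audio into about 94K bit/sec.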
MP3 uses a number of techniques, commonly referred to as perceptual coding, that let us decide what data we can throw out by analyzing the frequency and energy distribution of the music in various ways and relating that to a psycho-acoustical model of human hearing.
One technique, the minimal audition threshold (also known as the absolute threshold of hearing), relies on the fact that the ear's sensitivity varies with frequency, as described by the equal-loudness curves of Fletcher and Munson (http://web.wt.net/~rg21518/fletcher.html).
In the context of MP3, this means the encoder can discard any sound that falls below the loudness threshold for its frequency, because it will not be heard.
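The threshold idea can be sketched in a few lines. The curve below is Terhardt's widely quoted approximation of the threshold of hearing in quiet, used here as a stand-in; it is not the table a real MP3 encoder uses, and treating levels as absolute dB SPL is an assumption (an encoder does not know how loud you will play the file).

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Terhardt's approximation of the hearing threshold in quiet, in dB SPL."""
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


def audible(components):
    """Keep only (freq_hz, level_db) pairs at or above the threshold."""
    return [(f, lvl) for f, lvl in components
            if lvl >= threshold_in_quiet_db(f)]


if __name__ == "__main__":
    # Hypothetical spectral components: (frequency in Hz, level in dB SPL).
    tones = [(50, 30), (1000, 30), (3300, 0), (16000, 30)]
    print(audible(tones))
```

With these made-up tones, the 50 Hz and 16 kHz components fall below the curve and are dropped, while the quiet 3.3 kHz tone survives because the ear is most sensitive in that region.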
Another technique is the masking effect, which basically involves determining which sounds are blocked by other sounds as far as the human ear is concerned. Thus, a trumpet playing a soft note will be masked, at least intermittently, if played along with, say, a French horn section.
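A toy version of the masking test might look like the following. The Bark-scale conversion is Zwicker's standard approximation, but the offset and slope values are illustrative round numbers chosen for this sketch, not figures from the MPEG psycho-acoustical model, and the function names are hypothetical.

```python
import math


def hz_to_bark(freq_hz):
    """Zwicker's approximation of the Bark (critical-band) scale."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


def masked_threshold_db(masker_hz, masker_db, probe_hz,
                        offset_db=10.0, lower_slope=27.0, upper_slope=12.0):
    """Level below which a probe tone is hidden by the masker (toy model).

    The threshold falls off linearly with distance in Bark, more steeply
    below the masker than above it. All three constants are assumptions.
    """
    dz = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    slope = upper_slope if dz >= 0 else lower_slope
    return masker_db - offset_db - slope * abs(dz)


def is_masked(masker, probe):
    """masker and probe are (freq_hz, level_db) pairs."""
    return probe[1] < masked_threshold_db(masker[0], masker[1], probe[0])


if __name__ == "__main__":
    loud_horn = (440, 80)
    print(is_masked(loud_horn, (466, 50)))    # True: close in frequency
    print(is_masked(loud_horn, (4000, 50)))   # False: far away in frequency
```

The point of the sketch is only that a soft sound near a loud one in frequency can be dropped, while the same soft sound far away in frequency cannot.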
Another MP3 technique is known as the reservoir of bytes (more commonly called the bit reservoir), which lets different segments of an MP3 file be encoded with different numbers of bits: passages that are easy to encode leave spare capacity that more demanding passages can borrow. This means encoding accuracy can be reduced where the psycho-acoustical model, which describes how the ear receives sound and the brain perceives it, says the loss will not be noticed.
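The reservoir idea can be sketched as simple bookkeeping: each segment gets a fixed budget, and whatever it does not use is saved (up to a cap) for later segments. The budget, cap and demand figures below are invented for illustration and do not come from the MP3 specification.

```python
def allocate_bits(demands, budget_per_frame, reservoir_cap):
    """Grant bits to each frame, letting frames borrow earlier leftovers.

    demands: bits each frame would like, as judged by some (assumed)
    psycho-acoustical analysis. Returns the bits actually granted.
    """
    reservoir = 0
    granted = []
    for demand in demands:
        available = budget_per_frame + reservoir
        used = min(demand, available)
        granted.append(used)
        # Unused bits carry over, but the reservoir cannot grow without limit.
        reservoir = min(available - used, reservoir_cap)
    return granted


if __name__ == "__main__":
    # Two easy frames followed by a demanding one.
    print(allocate_bits([2000, 2500, 4500, 3000],
                        budget_per_frame=3000, reservoir_cap=2000))
    # [2000, 2500, 4500, 3000]
```

Here the third frame gets 4,500 bits even though the per-frame budget is 3,000, because the first two frames left 1,500 bits in the reservoir.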
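MP3 also uses something called Joint Stereo that reduces stereo data. This can be done because much of what the left and right channels carry is the same, and because the ear cannot pinpoint the direction of every frequency equally well. Where the ear is a poor judge of direction (in MP3's intensity stereo mode, this is applied to the upper frequency bands), the two channels can be combined into a single signal plus a little level information; where the channels are nearly identical, they can be stored as a sum and a difference. Either way, it is another way of tossing out data.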
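One simple way to picture reducing stereo data is sum-and-difference (mid/side) coding, which is one of the joint stereo modes. The sketch below assumes plain averaging for readability; the scaling used by the actual standard differs, and the sample values are made up.

```python
def to_mid_side(left, right):
    """Convert left/right samples to mid (average) and side (half difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def to_left_right(mid, side):
    """Invert to_mid_side: left = mid + side, right = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


if __name__ == "__main__":
    left = [100, 102, 98, 50]
    right = [100, 100, 98, 54]
    mid, side = to_mid_side(left, right)
    print(mid)    # [100.0, 101.0, 98.0, 52.0]
    print(side)   # [0.0, 1.0, 0.0, -2.0]
```

When the two channels are nearly the same, the side signal hovers around zero and needs very few bits, which is where the saving comes from.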
Finally, Huffman coding is performed on the modified data. This step throws away nothing further; it is a lossless technique that stores the remaining data more compactly by giving the most common values the shortest codes.
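For readers who have not met Huffman coding, here is a minimal sketch of the general technique. It builds a code table from the data itself, which is an assumption made for illustration: MP3 encoders choose from tables fixed in the standard. The sample values are invented.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Return a dict mapping each symbol to a prefix-free bit string."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    # Heap entries: (count, tie_breaker, {symbol: code_so_far}).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, codes1 = heapq.heappop(heap)
        c2, _, codes2 = heapq.heappop(heap)
        # Merge the two rarest groups, prefixing a bit to tell them apart.
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


if __name__ == "__main__":
    # Hypothetical quantized values: mostly zeros, a few larger ones.
    data = [0] * 12 + [1] * 5 + [-1] * 4 + [2] * 2 + [7]
    codes = huffman_code(data)
    total_bits = sum(len(codes[v]) for v in data)
    print(total_bits)   # 46, versus 72 bits at a fixed 3 bits per value
```

Because the zeros dominate, they get a one-bit code, and the whole sequence fits in 46 bits with no information lost.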
Doing all this encoding has its complexities, which we will discuss next week, along with encoding and decoding software.
Sound off to firstname.lastname@example.org.