Bob Currier, Synthetic Aperture
Audio has long fought for equal billing with video. With the acceptance of stereo sound, home theaters and surround sound, audio has made great strides in the traditional video world. But the battle is being fought all over again in the multimedia world of QuickTime and Video for Windows. We worry about the smallest detail of video compression methods, data rates, and color palettes, but all too often handle the audio as an afterthought. While there are some limitations to what we can achieve with audio in multimedia applications, proper care can yield far better results than the default case we often settle for.
The familiar audio CD is composed of 16-bit samples at a 44.1 KHz sample rate. While this rate is supported by the latest computer sound cards, handling that much audio data can tax even high-end systems and reduce video performance. Remember that when establishing the target data rate for compressed video, the audio data rate must be subtracted from the total, with whatever is left over available for video. Every bit of extra quality we give to the audio side comes straight out of the video side. Simply throwing more data at the audio to improve its quality is not a good solution. That stereo CD-quality audio we all want needs 176.4 K bytes/second, more than we usually allocate to the combination of audio and video together!
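The arithmetic behind these figures is easy to check. The short Python sketch below computes uncompressed audio data rates and what remains for video. The function names and the 150,000 bytes/second total budget are hypothetical, chosen only for illustration.

```python
def audio_bytes_per_second(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed audio data rate in bytes per second."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


def video_budget(total_bytes_per_second, audio_rate):
    """Whatever the audio does not use is left over for video."""
    return total_bytes_per_second - audio_rate


cd_stereo = audio_bytes_per_second(44100, 16, 2)     # CD quality, stereo
mono_22k_8bit = audio_bytes_per_second(22050, 8, 1)  # 22.050 KHz, 8-bit, mono
mono_11k_8bit = audio_bytes_per_second(11025, 8, 1)  # 11.025 KHz, 8-bit, mono

# Hypothetical total budget for audio plus video (an assumption, not from the text).
TOTAL_BUDGET = 150_000

for name, rate in [("CD stereo", cd_stereo),
                   ("22 KHz 8-bit mono", mono_22k_8bit),
                   ("11 KHz 8-bit mono", mono_11k_8bit)]:
    print(name, rate, "bytes/s audio,", video_budget(TOTAL_BUDGET, rate), "left for video")
```

With that assumed budget, the CD-quality case comes out negative: the audio alone exceeds the total, which is exactly the problem described above.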
The most common sample rates used for audio in the multimedia environment are 22.050 KHz and 11.025 KHz, both submultiples of the 44.1 KHz CD rate. Lower rates are also available, but are really only useful for low-quality voice. The sample rate we choose determines the maximum frequency that can be reproduced. Sampling theory tells us that the maximum accurately reproducible frequency can be no more than half the sample rate. This "half the sample rate" frequency is known as the Nyquist limit. Keep in mind, however, that this is the theoretical maximum; in the real world many factors conspire to keep you from actually getting to that limit.
The sample size will be either 8 or 16 bits. The most universal is 8-bit, but most sound cards sold today are 16-bit, slowly pushing the 8-bit cards out of the installed base. The sample size determines both the maximum dynamic range and the signal-to-noise ratio (SNR). While 16-bit audio has a respectable 98 dB theoretical SNR, 8-bit audio yields less than 50 dB.
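These SNR figures are consistent with the standard textbook result for an ideal quantizer driven by a full-scale sine wave, roughly 6.02 dB per bit plus 1.76 dB. The sketch below assumes that model; the function name is hypothetical, and real converters will do worse than the theoretical figure.

```python
import math


def theoretical_snr_db(bits):
    """Theoretical SNR of an ideal quantizer for a full-scale sine wave.

    Equivalent to the familiar approximation 6.02 * bits + 1.76 dB.
    """
    return 20 * math.log10((2 ** bits) * math.sqrt(1.5))


for bits in (8, 16):
    print(bits, "bits:", round(theoretical_snr_db(bits), 1), "dB")
```

Each bit dropped from the sample size costs about 6 dB, which is why the step from 16 bits to 8 bits is so audible.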
The quality of your audio digitizing card is also important. Many sound cards add the audio input as an afterthought and have serious distortion in their input stages. Also, placing audio gear into a computer box filled with digital signals invites all sorts of interference and noise problems, particularly if working with microphone level inputs. Choose a digitizing card that has proper shielding and a good audio input section, or you will limit your results before you even get something digitized. If you are digitizing from a microphone, you will be better off using an external preamp to boost the signal to line level before feeding the digitizer card.
So what can be done when we are forced to use mono, 11 KHz, 8-bit samples because of data rate limitations, when we know that we will be limited to a 5.5 KHz frequency range and a tiny dynamic range? Not to worry. With proper care in producing the audio, we can get surprisingly good results. And if we can allow ourselves to move to a 22 KHz sample rate, we can get something darn good.
The first thing to do is to make sure that no frequencies above the Nyquist limit are ever sampled. This means inserting a low-pass filter into the audio before we ever get to the digitizing card. Adjust it so that nothing above the Nyquist limit will be passed through. Remember that audio filters are analog devices, so set the cutoff frequency somewhat below the Nyquist limit to allow for the slope of the filter.
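To see why the filter matters, consider what happens to a pure tone above the Nyquist limit when it is sampled with no filtering: it does not disappear, it "folds" back to a lower frequency that was never in the original. The sketch below computes where such a tone lands. The function name and the 8 KHz test tone are hypothetical examples.

```python
def alias_frequency(tone_hz, sample_rate_hz):
    """Frequency at which a pure tone shows up after sampling with no low-pass filter."""
    nyquist = sample_rate_hz / 2
    folded = tone_hz % sample_rate_hz
    if folded > nyquist:
        # Anything above the Nyquist limit is mirrored back below it.
        folded = sample_rate_hz - folded
    return folded


# A 3 KHz tone is below the 5512.5 Hz Nyquist limit and is captured as-is.
print(alias_frequency(3000, 11025))
# An 8 KHz tone is above the limit and comes back as a spurious lower tone.
print(alias_frequency(8000, 11025))
```

Once that spurious tone is in the sampled data it cannot be separated from the real audio, which is why the filtering has to happen before the digitizing card.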
Next we need to work on the dynamic range of the material. Remember that digital audio has no "headroom". Once you hit 0 VU there is no more room to encode audio. If you think tape saturation sounds bad, try listening to digital clipping! To avoid clipping, use an audio compressor/limiter to compress the audio signal, reducing the dynamic range, and to limit it, making sure we never exceed the maximum signal level. Keep in mind that this is analog "compression" and is not related to the digital "compression" that we do on digitized video and audio data.
It is easiest to compress and limit the audio while it is in the analog domain. Compression is possible in the digital domain, but it requires that your digitizer have the ability to capture the full dynamic range so that it can pass the data to the digital compressor. This means having 20-bit capture capability to get the original signal digitized; not something that most multimedia producers will have access to.
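For readers who have not used a compressor/limiter, the sketch below models only its static input/output curve, with levels in dB. It is a simplified illustration and not a model of any particular unit: the threshold, ratio, and ceiling values are hypothetical, and the attack and release behavior of real hardware is ignored.

```python
def compress_and_limit_db(level_db, threshold_db=-20.0, ratio=4.0, ceiling_db=-1.0):
    """Output level in dB for a given input level in dB.

    Below the threshold the signal passes unchanged. Above it, every
    `ratio` dB of input yields 1 dB of output (compression). The result
    is then capped at the ceiling (limiting).
    """
    if level_db > threshold_db:
        level_db = threshold_db + (level_db - threshold_db) / ratio
    return min(level_db, ceiling_db)


for level in (-40.0, -20.0, 0.0, 80.0):
    print(level, "dB in ->", compress_and_limit_db(level), "dB out")
```

Quiet passages are untouched, loud passages are squeezed into a narrower range, and nothing gets past the ceiling, so the digitizer never sees a level it cannot encode.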
So now the audio has been filtered, compressed, and limited, and we are finally ready to do some digitizing! Unlike video, where we want to capture at the maximum resolution and then reduce it later, audio does not always work best that way. One might think that if we were to capture 16-bit samples at 44.1 KHz, and we then wanted 8-bit 11.025 KHz audio as the final product, all we would have to do is take the high-order 8-bits of every fourth sample. You can do this, but it will sound terrible! Resampling is a complex task. Even when going between rates that are even multiples of each other, there is a lot of work that must be done to achieve acceptable results. And when the rates are not even multiples, things really get to be a mess.
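A small experiment shows one reason the shortcut fails. The sketch below builds a test tone at 11.025 KHz sampled at 44.1 KHz, well above the 5.5 KHz limit of the target rate, and converts it two ways. The function names are hypothetical, samples are treated as signed integers, and the four-sample average is only a crude stand-in for the filtering a real resampler performs.

```python
import math


def naive_downconvert(samples_16bit):
    """Keep every fourth sample and only its high-order 8 bits."""
    return [s >> 8 for s in samples_16bit[::4]]


def averaged_downconvert(samples_16bit):
    """Average each group of four samples (a crude low-pass), then scale to 8 bits."""
    out = []
    for i in range(0, len(samples_16bit) - 3, 4):
        mean = sum(samples_16bit[i:i + 4]) / 4
        value = int(round(mean / 256))
        out.append(max(-128, min(127, value)))  # clamp to the signed 8-bit range
    return out


# 11.025 KHz tone at a 44.1 KHz sample rate: exactly four samples per cycle.
tone = [round(20000 * math.sin(math.pi / 4 + n * math.pi / 2)) for n in range(64)]

naive = naive_downconvert(tone)
averaged = averaged_downconvert(tone)
print("naive:   ", naive[:8])
print("averaged:", averaged[:8])
```

The naive version turns an inaudible-at-the-target-rate tone into a constant offset in the output, which is aliasing in its simplest form. The averaged version removes the tone instead. Good resampling tools use far better filters than a four-sample average, and usually add dither when reducing the sample size.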
Good resampling algorithms do exist, and the best digital audio processing programs can change sample frequencies and size while maintaining good audio quality. If you do not have such a program, however, you may be better off sampling your audio at the final sample rate, eliminating all resampling. For resampling on the Macintosh, Macromedia SoundEdit 16 and Adobe Premiere both have resample capabilities that produce good results. (Some versions of Premiere hide the resample filter in a separate folder, so you may need to go hunting for it.) A truly excellent tool for resampling is the L1 Ultramaximizer from Waves, a plug-in for Sound Designer II. Waves also offers a standalone program, WaveConvert, specifically for resampling sound. This tool incorporates the most commonly used features of the more expensive Ultramaximizer, and is available in Mac and PC versions. You can also use the shareware program SoundHack. On the PC, SoundForge from Sonic Foundry does a good job.
Another problem to overcome is the listening environment. Most multimedia applications will be heard through a 3-inch speaker located next to a power supply fan, in a room filled with humming equipment. Not the ideal way to listen to anyone's sonic masterpiece. Remember to audition your final audio through such a "typical" system. What may sound great in a studio environment can completely fall apart when heard in its ultimate destination.
Unlike video, compression of the digital audio data is not common. This is partly because the payoff for compression is not as great, what with audio having less data to compress, and also because most compression algorithms (A-law, mu-law, etc.) came from the telephony industry and were designed for voice. Most high-fidelity compression algorithms have been proprietary.
QuickTime does support MACE compression at 3:1 and 6:1 ratios, but the quality is, again, best suited to voice. QuickTime 2.0 added support for IMA compression, which can yield good sounding results at a 4:1 compression ratio. It is still relatively new, so not all tools support it yet. Also new and showing great promise is MPEG/audio compression. An international standard aimed at high-fidelity audio, it is based on a perceptual model of the human auditory system, and can yield significant compression ratios regardless of the audio source. However, like its MPEG/video counterpart, it requires dedicated hardware or significant computational power to implement. At this point, audio data compression is something that is best to experiment with to see if you like the results, as no clear standard has yet emerged.
If you use the methods described above, you can get surprisingly good sounding audio into 11 KHz, 8-bit samples. Splurge on a 22 KHz sample rate and things will really take off. However, all of the preceding assumes that computers are all built according to straightforward, rational rules. Unfortunately, when we want to use the same audio on different computer systems, there are other things we need to be aware of.
The standard audio sample rates on the PC are 22.050 and 11.025 KHz. On most Macintoshes, however, the sample rates are 22.254 and 11.127 KHz. It doesn't sound like much of a difference, but it can have a big effect on both the quality of the audio and the audio/video playback performance. When the sample rate of the audio data and the hardware don't match, software is used to resample to the hardware rate. On the Macintosh, this is done quickly, maintaining good audio quality. On the PC side, it is left up to the sound board driver. Sometimes this is done well, sometimes not so well. Not only can audio quality suffer, but if the resampling is done inefficiently, it can cause serious lip sync problems during playback. For this reason it is best to use the PC standards of 11.025 and 22.050 KHz in cross-platform applications, letting the Mac do the extra work of resampling. The latest Macintoshes support both traditional Mac rates and PC-standard rates, eliminating resampling.
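To put numbers on that "not much of a difference", the sketch below uses the rounded 22.050 and 22.254 KHz figures from the text. It shows that the two rates are not related by a simple ratio, and estimates how far the audio would drift if the samples were simply played at the wrong rate with no resampling at all. The function name is hypothetical, and the no-resampling case is purely illustrative, since in practice software does resample.

```python
from fractions import Fraction

PC_RATE_HZ = 22050
MAC_RATE_HZ = 22254  # rounded figure from the text

# The conversion ratio in lowest terms; far from a simple 2:1 or 4:1.
ratio = Fraction(MAC_RATE_HZ, PC_RATE_HZ)


def drift_seconds(clip_seconds, data_rate_hz, hardware_rate_hz):
    """Timing error if audio were played at the hardware rate with no resampling."""
    played_seconds = clip_seconds * data_rate_hz / hardware_rate_hz
    return clip_seconds - played_seconds


print("Mac/PC rate ratio:", ratio)
print("Drift over one minute:", round(drift_seconds(60, PC_RATE_HZ, MAC_RATE_HZ), 2), "seconds")
```

A difference of less than one percent still adds up to roughly half a second per minute, more than enough to ruin lip sync, so the resampling step cannot be skipped and its quality and efficiency matter.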
With some extra care devoted to the audio side of things, your multimedia extravaganza can sound as good as it looks.
Bob Currier is President of Synthetic Aperture, a multimedia production company specializing in digital video and QuickTime VR. He also serves as Sysop of the Macintosh Multimedia Forum on CompuServe.
He can be reached at email@example.com. Be sure to visit the Synthetic Aperture web site at <http://www.synthetic-ap.com/> for more tutorial information, sample content, and information on new media services.
This article originally appeared in a slightly different form in Computer Video Production magazine.