Monday, November 5, 2007

Perceptual coding: The miracle of acoustic masking

All of the MPEG perceptual codecs rely upon the celebrated acoustic masking principle – an amazing property of the human ear/brain aural perception system. When audio is present at a particular frequency, you cannot hear audio at nearby frequencies that is sufficiently low in volume. The inaudible components are masked owing to properties of the human ear that occur at a very low ‘hardware’ level – researchers say the information is dropped straightaway within the ear and is not passed to the brain. This appears to be a kind of ‘natural rate reduction’ that helps to keep the brain from being overloaded with unnecessary information. A similar effect works in the time domain: a quiet signal that arrives shortly after a louder one has stopped is also inaudible.

As a result, it is not necessary to use precious bits to encode these masked frequencies. In perceptual coders, a filter bank divides the audio into multiple bands. When audio in a particular band falls below the masking threshold, few or no bits are devoted to encoding that signal, resulting in a conservation of bits that can then be used where they are needed.
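
As a rough illustration of that bit-saving rule, here is a minimal sketch in Python. It assumes the level of each band and its masking threshold are already known in dB (the numbers below are made up), and it leans on the common rule of thumb that each quantizer bit buys about 6.02 dB of signal-to-noise ratio. The name `allocate_bits` is hypothetical and is not taken from any MPEG codec.

```python
import math

DB_PER_BIT = 6.02  # rule of thumb: each quantizer bit adds about 6.02 dB of SNR


def allocate_bits(levels_db, mask_db):
    """Return a bit count per band so quantization noise stays under the mask.

    Bands whose level is at or below the masking threshold get zero bits.
    """
    bits = []
    for level, mask in zip(levels_db, mask_db):
        smr = level - mask  # signal-to-mask ratio in dB
        bits.append(math.ceil(smr / DB_PER_BIT) if smr > 0 else 0)
    return bits


# Made-up example: four bands, the second and fourth fall below the mask.
levels = [60.0, 20.0, 45.0, 10.0]
masks = [30.0, 35.0, 33.0, 25.0]
print(allocate_bits(levels, masks))
```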

While various codecs use different techniques in the details, the principle is the same for all, and the implementation follows a common plan. There are four major subsections, which work together to generate the coded bitstream:
  • The analysis filter bank divides the audio into spectral components. At minimum, the frequency resolution must be at least as fine as the ear's critical bands, which are about 100 Hz wide below 500 Hz and roughly 20% of the center frequency at higher frequencies. Finer resolution can help a coder make better decisions.
  • The masked-threshold estimation section is where the human ear/brain system is modeled. It determines the masking curve, under which the quantization noise must fall.
  • The audio is reduced to a lower bitrate in the quantization and coding section. On the one hand, the quantization must be sufficiently coarse to stay within the target bitrate. On the other hand, the quantization error must be shaped to stay under the limits set by the masking curve.
  • The quantized values are joined in the bitstream multiplex, along with any side information.
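
The critical-band rule of thumb from the first item can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the 48 kHz sampling rate and 32 equal-width bands are example values chosen here, not figures from the text.

```python
def critical_bandwidth_hz(center_hz):
    """Approximate critical bandwidth using the rule of thumb from the text."""
    if center_hz < 500.0:
        return 100.0
    return 0.2 * center_hz


def uniform_band_width_hz(sample_rate_hz, num_bands):
    """Width of each band when 0..Nyquist is split into equal-width bands."""
    return (sample_rate_hz / 2.0) / num_bands


def resolves_critical_band(band_width_hz, center_hz):
    """True if a filter band is no wider than the critical band at center_hz."""
    return band_width_hz <= critical_bandwidth_hz(center_hz)


# Illustrative assumption: 48 kHz sampling, 32 equal-width bands.
width = uniform_band_width_hz(48000.0, 32)
for center in (375.0, 1125.0, 4125.0):
    print(center, critical_bandwidth_hz(center), resolves_critical_band(width, center))
```

With these example numbers the equal-width bands are wider than the critical bands at low frequencies and only become narrow enough higher up, which is the kind of case where finer resolution would help.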

Source: telos-systems
