As a result, it is not necessary to use precious bits to encode these masked frequencies. In perceptual coders, a filter bank divides the audio into multiple bands. When audio in a particular band falls below the masking threshold, few or no bits are devoted to encoding that signal, resulting in a conservation of bits that can then be used where they are needed.
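The bit-saving idea can be sketched in a few lines. This is a minimal illustration, not the method of any particular codec: the function name `allocate_bits`, the sample levels and thresholds, and the rule of thumb of about 6 dB of signal-to-noise ratio per quantizer bit are all assumptions introduced for the example.

```python
import math

# Assumption: each bit of uniform quantization buys roughly 6.02 dB of SNR.
DB_PER_BIT = 6.02


def allocate_bits(levels_db, thresholds_db):
    """Return a bit count per band.

    Bands at or below their masking threshold get 0 bits. Audible bands get
    enough bits to push quantization noise below the threshold.
    """
    bits = []
    for level, threshold in zip(levels_db, thresholds_db):
        smr = level - threshold  # signal-to-mask ratio in dB
        bits.append(0 if smr <= 0 else math.ceil(smr / DB_PER_BIT))
    return bits


# Hypothetical band levels and masking thresholds, in dB.
levels = [60.0, 35.0, 48.0, 20.0]
thresholds = [40.0, 45.0, 42.0, 30.0]
print(allocate_bits(levels, thresholds))  # [4, 0, 1, 0]
```

The second and fourth bands sit below their thresholds, so they cost nothing, and the bits they would have used remain available for the bands that are audible.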
While various codecs use different techniques in the details, the principle is the same for all, and the implementation follows a common plan. There are four major subsections, which work together to generate the coded bitstream:
- The analysis filter bank divides the audio into spectral components. At minimum, the frequency resolution must be at least as fine as the ear's critical bands, which are about 100 Hz wide below 500 Hz and roughly 20% of the center frequency above that. Finer resolution can help a coder make better decisions.
- The masked-threshold estimation section models the human ear/brain system. It determines the masking curve, under which the quantization noise must fall.
- The audio is reduced to a lower bitrate in the quantization and coding section. On the one hand, the quantization must be sufficiently coarse in order not to exceed the target bitrate. On the other hand, the error must be shaped to be under the limits set by the masking curve.
- The quantized values are joined in the bitstream multiplex, along with any side information.
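Two of the numbers in the list above lend themselves to a short sketch: the critical-band rule of thumb and the trade-off between quantizer step size and error. The function names and sample values below are hypothetical, and real coders use far more elaborate filter banks and quantizers than this.

```python
def critical_bandwidth_hz(center_hz):
    """Approximate critical bandwidth using the rule of thumb in the text:
    100 Hz below 500 Hz, about 20% of the center frequency above that."""
    return 100.0 if center_hz < 500.0 else 0.2 * center_hz


def quantize(values, step):
    """Uniform quantizer: map each value to the index of its nearest step."""
    return [round(v / step) for v in values]


def dequantize(indices, step):
    """Reconstruct approximate values from quantizer indices."""
    return [i * step for i in indices]


def max_error(values, step):
    """Largest reconstruction error for a given step size."""
    restored = dequantize(quantize(values, step), step)
    return max(abs(v - r) for v, r in zip(values, restored))


# Hypothetical spectral values for one band.
band = [0.83, -0.41, 0.12, 0.57]

for step in (0.5, 0.05):
    # A coarser step needs fewer distinct indices (fewer bits) but
    # produces a larger error, which is bounded by step / 2.
    print(step, quantize(band, step), max_error(band, step))
```

A coder would pick the coarsest step whose error still stays under the masking curve for that band, which is the balance described in the quantization and coding item.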
Source: telos-systems