
ISO/IEC 11172-3 PDF

Monday, August 5, 2019


Information technology -- Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s -- Part 3: Audio. ISO/IEC 11172-3, first edition. Requirements have also been defined for extending the standard to multichannel audio, and the ISO/IEC audio coding standard can be used together with the other parts of ISO/IEC 11172.



Author: ABDUL BEDDARD
Language: English, Spanish, Indonesian
Country: Singapore
Genre: Religion
Pages: 322
Published (Last): 18.03.2016
ISBN: 743-9-28867-170-8
ePub File Size: 22.76 MB
PDF File Size: 11.15 MB
Distribution: Free* [*Registration Required]
Downloads: 35850
Uploaded by: CAMILLE

Annex G (informative): Joint Stereo Coding. FOREWORD. This standard is a committee draft that was submitted for approval to ISO/IEC JTC1 SC29. It specifies the compression of audio (including the Layer III format commonly known as MP3) and its synchronization with the other parts of the standard, and was adopted by the International Organization for Standardization and the International Electrotechnical Commission (ISO/IEC).

Decoding I-frames instead of D-frames for fast previews gives higher quality, since I-frames contain AC coefficients as well as DC coefficients. If the encoder can assume that rapid I-frame decoding capability is available in decoders, it can save bits by not sending D-frames, thus improving compression of the video content.

For this reason, D-frames are seldom actually used in MPEG-1 video encoding, and the D-frame feature has not been included in any later video coding standards.

MPEG-1 operates on video in a series of 8x8 blocks for quantization. However, because chroma (color) is subsampled by a factor of 4, each pair of red and blue chroma blocks corresponds to 4 different luma blocks. This set of 6 blocks, covering a 16x16 pixel area, is called a macroblock. A macroblock is the smallest independent unit of color video.

Motion vectors (see below) operate solely at the macroblock level. If the height or width of the video is not an exact multiple of 16, full rows and full columns of macroblocks must still be encoded and decoded to fill out the picture (though the extra decoded pixels are not displayed).
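
A minimal sketch of this padding rule (illustrative Python, not encoder code): given a frame's width and height, the number of macroblock columns and rows is simply rounded up so that whole macroblocks cover the picture.

```python
def macroblock_grid(width: int, height: int) -> tuple[int, int]:
    """Return the number of macroblock columns and rows needed to cover a frame."""
    mb_cols = (width + 15) // 16   # round up to whole 16-pixel columns
    mb_rows = (height + 15) // 16  # round up to whole 16-pixel rows
    return mb_cols, mb_rows

cols, rows = macroblock_grid(352, 240)        # e.g. a 352x240 frame
print(cols, rows)                              # 22 x 15 macroblocks
print(cols * 16, rows * 16)                    # already a multiple of 16: no padding

cols, rows = macroblock_grid(350, 242)
print(cols * 16 - 350, rows * 16 - 242)        # 2 and 14 padded (undisplayed) pixels
```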

To decrease the amount of temporal redundancy in a video, only blocks that change are updated, up to the maximum GOP size. This is known as conditional replenishment. However, this is not very effective by itself: movement of objects or of the camera can cause large portions of the frame to change even though the underlying content has merely shifted position. Through motion estimation the encoder can compensate for this movement and remove a large amount of redundant information. The encoder compares the current frame with adjacent parts of the video from the anchor frame (the previous I- or P-frame) in a diamond pattern, up to an encoder-specific predefined radius limit from the area of the current macroblock.

If a match is found, only the direction and distance (i.e. the motion vector) from the previous video area to the current macroblock need to be encoded. The reverse of this process, performed by the decoder to reconstruct the picture, is called motion compensation. A predicted macroblock rarely matches the current picture perfectly, however, so the remaining difference (the prediction error) must also be coded. The larger the error, the more data must be additionally encoded in the frame.
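
The following is a rough sketch of block-matching motion estimation, using a brute-force search and the sum of absolute differences (SAD) as the error metric. Real encoders use faster search patterns (such as the diamond pattern mentioned above) and sub-pixel refinement; the function name and parameters here are illustrative assumptions.

```python
import numpy as np

def best_motion_vector(cur, ref, bx, by, radius=7, block=16):
    """Find the (dx, dy) within +/- radius that best predicts the block at (bx, by)."""
    target = cur[by:by + block, bx:bx + block].astype(np.int32)
    best, best_sad = (0, 0), None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + block > ref.shape[1] or y + block > ref.shape[0]:
                continue  # candidate block falls outside the reference frame
            cand = ref[y:y + block, x:x + block].astype(np.int32)
            sad = np.abs(target - cand).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best = sad, (dx, dy)
    return best, best_sad

# The residual (prediction error) is what still has to be transform-coded:
#   residual = target - ref[by+dy : by+dy+16, bx+dx : bx+dx+16]
```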

For efficient video compression, it is very important that the encoder is capable of effectively and precisely performing motion estimation. Motion vectors record the distance between two areas on screen, measured in pixels (also called pels).

The finer the precision of the MVs, the more accurate the match is likely to be, and the more efficient the compression. There are trade-offs to higher precision, however. Finer MVs result in larger data size, as larger numbers must be stored in the frame for every single MV; increased coding complexity, as increasing levels of interpolation on the macroblock are required for both the encoder and decoder; and diminishing returns (minimal gains) with higher-precision MVs.

Half-pel precision was chosen as the ideal trade-off. Because neighboring macroblocks are likely to have very similar motion vectors, this redundant information can be compressed quite effectively by being stored DPCM-encoded. Only the (smaller) difference between the MVs for each macroblock needs to be stored in the final bitstream. P-frames have one motion vector per macroblock, relative to the previous anchor frame. B-frames, however, can use two motion vectors: one from the previous anchor frame and one from the future anchor frame.

An even more serious problem exists with macroblocks that contain significant, random edge noise, where the picture transitions to (typically) black. All the above problems also apply to edge noise.

In addition, the added randomness is simply impossible to compress significantly. All of these effects will lower the quality or increase the bitrate of the video substantially. Each 8x8 block is encoded by first applying a forward discrete cosine transform (FDCT) and then a quantization process. In reality, there are some (sometimes large) rounding errors introduced both by quantization in the encoder (as described in the next section) and by IDCT approximation error in the decoder.

The minimum allowed accuracy of a decoder's IDCT approximation was originally specified by an IEEE standard. The FDCT process converts the 8x8 block of uncompressed pixel values (brightness or color-difference values) into an 8x8 indexed array of frequency-coefficient values.

One of these is the (statistically high-variance) DC coefficient, which represents the average value of the entire 8x8 block. The other 63 coefficients are the statistically smaller AC coefficients, positive or negative values each representing sinusoidal deviations from the flat block value represented by the DC coefficient.
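
As an illustration of what the FDCT produces, the sketch below uses a slow, textbook type-II DCT (not the optimized transform an actual codec would use, and the level shift by 128 is purely for illustration). A perfectly flat block yields a single non-zero DC coefficient and essentially zero AC coefficients.

```python
import numpy as np

def fdct_8x8(block):
    """2-D type-II DCT of an 8x8 block of (level-shifted) pixel values."""
    N = 8
    out = np.zeros((N, N))
    for u in range(N):
        for v in range(N):
            cu = 1 / np.sqrt(2) if u == 0 else 1.0
            cv = 1 / np.sqrt(2) if v == 0 else 1.0
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += block[x, y] * np.cos((2 * x + 1) * u * np.pi / 16) \
                                     * np.cos((2 * y + 1) * v * np.pi / 16)
            out[u, v] = 0.25 * cu * cv * s
    return out

flat = np.full((8, 8), 100.0)        # a completely flat block...
coeffs = fdct_8x8(flat - 128)        # ...shifted around zero for this example
print(round(coeffs[0, 0], 1))        # only the DC coefficient is non-zero (-224.0)
ac = coeffs.copy(); ac[0, 0] = 0.0
print(np.abs(ac).max())              # all 63 AC coefficients are ~0 for a flat block
```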


Since the DC coefficient value is statistically correlated from one block to the next, it is compressed using DPCM encoding. Only the (smaller) difference between each DC value and the value of the DC coefficient in the block to its left needs to be represented in the final bitstream. Additionally, the frequency conversion performed by applying the DCT provides a statistical decorrelation function, concentrating the signal into fewer high-amplitude values prior to quantization (see below).
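
The DPCM idea itself is simple. The sketch below uses hypothetical helper functions and a simplified predictor (MPEG-1 resets its DC predictor at defined points; a fixed starting value of zero is used here only for illustration) to show how a row of DC values collapses into mostly small differences.

```python
def dpcm_encode(dc_values):
    prev, diffs = 0, []                # simplified predictor start value
    for dc in dc_values:
        diffs.append(dc - prev)        # transmit only the difference
        prev = dc
    return diffs

def dpcm_decode(diffs):
    prev, out = 0, []
    for d in diffs:
        prev += d                      # accumulate differences back into DC values
        out.append(prev)
    return out

dcs = [224, 226, 225, 230, 231]
print(dpcm_encode(dcs))                # [224, 2, -1, 5, 1] -- mostly small numbers
assert dpcm_decode(dpcm_encode(dcs)) == dcs
```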

Quantization of digital data is, essentially, the process of reducing the accuracy of a signal by dividing it by some larger step size and keeping only the nearest whole multiple (i.e. discarding the remainder). The frame-level quantizer is either dynamically selected by the encoder to maintain a certain user-specified bitrate, or (much less commonly) directly specified by the user. Contrary to popular belief, a fixed frame-level quantizer set by the user does not deliver a constant level of quality.

Instead, it is an arbitrary metric that will provide a somewhat varying level of quality, depending on the contents of each frame. Given two files of identical sizes, the one encoded at an average bitrate should look better than the one encoded with a fixed quantizer (variable bitrate). Constant-quantizer encoding can be used, however, to accurately determine the minimum and maximum bitrates possible for encoding a given video.

A quantization matrix is a string of numbers which tells the encoder how relatively important or unimportant each piece of visual information is. Each number in the matrix corresponds to a certain frequency component of the video image. Quantization is performed by taking each of the 64 frequency values of the DCT block, dividing them by the frame-level quantizer, then dividing them by their corresponding values in the quantization matrix.

Finally, the result is rounded down. This significantly reduces, or completely eliminates, the information in some frequency components of the picture. Typically, high-frequency information is less visually important, so high frequencies are much more strongly quantized (drastically reduced).
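
A schematic version of this quantization step is sketched below. The matrix here is a made-up toy matrix (not the default MPEG-1 intra matrix), and the exact MPEG-1 rounding and DC handling differ, but the effect is the same: most high-frequency coefficients become zero.

```python
import numpy as np

# Toy quantization matrix: larger step sizes (coarser quantization) at higher frequencies.
u = np.arange(8)
toy_matrix = 8 + 2 * (u[:, None] + u[None, :])

def quantize(dct_block, frame_quantizer):
    # Divide by the frame-level quantizer and the matrix entry, then round.
    return np.round(dct_block / (frame_quantizer * toy_matrix)).astype(int)

def dequantize(levels, frame_quantizer):
    return levels * frame_quantizer * toy_matrix   # lossy reconstruction

# Example: a block that is mostly low-frequency energy.
block = np.zeros((8, 8))
block[0, 0], block[0, 1], block[7, 7] = -224.0, 30.0, 12.0
print(quantize(block, frame_quantizer=2))   # only the strongest low-frequency terms survive
```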

MPEG-1 actually uses two separate quantization matrices, one for intra blocks (I-blocks) and one for inter blocks (P- and B-blocks), so quantization of different block types can be done independently and therefore more effectively. This quantization process usually reduces a significant number of the AC coefficients to zero (known as sparse data), which can then be compressed more efficiently by entropy coding (lossless compression) in the next step.

Quantization eliminates a large amount of data, and is the main lossy processing step in MPEG-1 video encoding. It is also the primary source of most MPEG-1 video compression artifacts, like blockiness, color banding, noise, ringing, and discoloration. These appear when video is encoded with an insufficient bitrate, and the encoder is therefore forced to use high frame-level quantizers (strong quantization) through much of the video. Several steps in the encoding of MPEG-1 video are lossless, meaning they will be reversed upon decoding to produce exactly the same original values.

Since these lossless data compression steps don't add noise into, or otherwise change, the contents (unlike quantization), this is sometimes referred to as noiseless coding. The coefficients of quantized DCT blocks tend toward zero in the bottom-right.

Maximum compression can be achieved by zig-zag scanning of the DCT block, starting from the top left, and using run-length encoding techniques.

Run-length encoding (RLE) is a very simple method of compressing repetition. A sequential string of characters, no matter how long, can be replaced with a few bytes, noting the value that repeats and how many times.

For example, if someone were to say "five nines", you would know they mean the number 99999. RLE is particularly effective after quantization, as a significant number of the AC coefficients are now zero (called sparse data) and can be represented with just a couple of bytes.
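
A small sketch of zig-zag scanning plus run-length coding of the AC coefficients follows. The (run, level) pairs and the 'EOB' marker are schematic stand-ins, not the actual MPEG-1 variable-length codes.

```python
import numpy as np

def zigzag_order(n=8):
    """Visit the (row, col) positions of an n x n block in zig-zag order."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],                         # anti-diagonal index
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def rle_ac(block):
    """Return (zero_run, level) pairs for the 63 AC coefficients of one block."""
    scan = [block[r, c] for r, c in zigzag_order()][1:]   # skip the DC coefficient
    pairs, run = [], 0
    for level in scan:
        if level == 0:
            run += 1
        else:
            pairs.append((run, int(level)))
            run = 0
    pairs.append(('EOB',))       # end-of-block: the rest of the scan is all zeros
    return pairs

q = np.zeros((8, 8), dtype=int)
q[0, 0], q[0, 1], q[1, 0], q[0, 2] = 28, 3, -2, 5
print(rle_ac(q))                 # [(0, 3), (0, -2), (2, 5), ('EOB',)]
```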

This is stored in a special two-dimensional Huffman table that codes the run-length and the run-ending character. Huffman coding is a very popular method of entropy coding, and is used in MPEG-1 video to reduce the data size.

The data is analyzed to find strings that repeat often. Those strings are then put into a special table, with the most frequently repeating data assigned the shortest code. This keeps the data as small as possible with this form of compression. The decoder simply reverses this process to produce the original data.
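
The sketch below builds Huffman codes from symbol frequencies to illustrate the principle; note that MPEG-1 itself uses fixed, pre-defined code tables rather than tables built per video, so this is only a demonstration of the idea.

```python
import heapq
from collections import Counter

def huffman_codes(symbols):
    """Return a {symbol: bit-string} map; frequent symbols get the shortest codes."""
    freq = Counter(symbols)
    if len(freq) == 1:                              # degenerate case: a single symbol
        return {next(iter(freq)): '0'}
    # Each heap entry: (weight, tie_breaker, {symbol: code_so_far})
    heap = [(w, i, {s: ''}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)             # two least-frequent groups...
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: '0' + code for s, code in c1.items()}
        merged.update({s: '1' + code for s, code in c2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))  # ...are merged into one node
        tie += 1
    return heap[0][2]

codes = huffman_codes("aaaaabbbccd")
print(codes)   # 'a' gets the shortest code, 'c' and 'd' the longest
```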

This is the final step in the video encoding process, so the result of Huffman coding is known as the MPEG-1 video "bitstream". I-frames store complete frame information within the frame and are therefore suited for random access. P-frames provide compression using motion vectors relative to the previous frame (I or P).

B-frames provide maximum compression but require both the previous and the next frame for computation. Therefore, processing of B-frames requires more buffering on the decoder side. Configurations with many B-frames are therefore not well suited for video-telephony or video-conferencing applications.

The typical data rate of an I-frame is 1 bit per pixel, while that of a P-frame is substantially lower. MPEG-1 Audio utilizes psychoacoustics to significantly reduce the data rate required by an audio stream. It reduces or completely discards certain parts of the audio that it deduces the human ear can't hear, either because they are in frequencies where the ear has limited sensitivity, or because they are masked by other (typically louder) sounds.

MPEG-1 Audio is divided into 3 layers. Each higher layer is more computationally complex, and generally more efficient at lower bitrates, than the previous. Layer I has lower complexity than Layer II, to facilitate real-time encoding on the hardware available at the time. It uses a low-delay, 32 sub-band polyphase filter bank for time-frequency mapping, with overlapping sub-band frequency ranges. The size of a Layer II frame is fixed at a set number of samples (coefficients).

This offers low delay, as only a small number of samples are analyzed before encoding, as opposed to frequency-domain encoding (like MP3), which must analyze many times more samples before it can decide how to transform and output encoded audio. It also offers higher performance on complex, random, and transient impulses (such as percussive instruments and applause), helping to avoid artifacts like pre-echo.

The encoder then utilizes the psychoacoustic model to determine which sub-bands contain audio information that is less important, and so, where quantization will be inaudible, or at least much less noticeable.

Of the samples in each frame, 64 samples at the top and bottom of the frequency range are ignored for this analysis; they are presumably not significant enough to change the result. The psychoacoustic model uses an empirically determined masking model to determine which sub-bands contribute more to the masking threshold, and how much quantization noise each can contain without being perceived.

Any sounds below the absolute threshold of hearing (ATH) are completely discarded. The available bits are then assigned to each sub-band accordingly. Typically, sub-bands are less important if they contain quieter sounds (smaller coefficients) than a neighboring (i.e. similar-frequency) sub-band with louder sounds.

Also, "noise" components typically have a more significant masking effect than "tonal" components.

Less significant sub-bands are reduced in accuracy by quantization. This basically involves compressing the frequency range (the amplitude of the coefficients), then computing an amplification factor for the decoder to use to re-expand each sub-band to the proper range. Layer II can also optionally use intensity stereo coding, a form of joint stereo, in which the higher frequencies of the two channels are combined into a single channel along with per-channel intensity information. On playback, the single channel is played through the left and right speakers, with the intensity information applied to each channel to give the illusion of stereo sound.
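
A toy version of this per-sub-band scale-factor quantization is sketched below. The fixed scale-factor tables and quantizer classes of real Layer I/II are not modeled; only the normalize, quantize, and re-expand idea is, and the function names and bit counts are illustrative assumptions.

```python
import numpy as np

def encode_subband(samples, bits):
    """Normalize a sub-band by a scale factor and quantize it to 'bits' bits."""
    scale = np.max(np.abs(samples)) or 1.0            # amplification factor sent to the decoder
    levels = 2 ** bits - 1
    q = np.round((samples / scale) * (levels / 2))    # integers in roughly [-levels/2, +levels/2]
    return q.astype(int), scale

def decode_subband(q, scale, bits):
    levels = 2 ** bits - 1
    return (q / (levels / 2)) * scale                 # re-expand to the proper range

band = np.array([0.10, -0.40, 0.32, 0.05])
q, scale = encode_subband(band, bits=4)
print(decode_subband(q, scale, bits=4))               # close to the original, within quantization noise
```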

This can allow further reduction of the audio bitrate without much perceivable loss of fidelity, but it is generally not used at higher bitrates, as it does not provide very high quality (transparent) audio.

MP2 remains a favoured lossy audio coding standard due to its particularly high audio coding performance on important audio material such as castanets, symphonic orchestra, male and female voices, and particularly complex, high-energy transient impulses like percussive sounds.

MP3 is a frequency-domain audio transform encoder. MP3 works on the same number of samples per frame as Layer II, but needs to take multiple frames for analysis before frequency-domain (MDCT) processing and quantization can be effective. It outputs a variable number of samples, using a bit buffer to enable this variable-bitrate (VBR) encoding while maintaining fixed-size output frames. This causes a significantly longer delay before output, which has caused MP3 to be considered unsuitable for studio applications where editing or other processing needs to take place.

MP3 does not benefit from the 32 sub-band polyphase filter bank, instead just using an MDCT transformation on each output to split the data into finer frequency components, and processing it in the frequency domain.
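
For illustration, a direct (slow) MDCT over one overlapped window might look like the sketch below. The window shape, block switching, and the exact transform sizes MP3 uses are deliberately left out; this only shows the transform itself, and the example sizes are arbitrary.

```python
import numpy as np

def mdct(x):
    """MDCT of 2N time samples into N frequency coefficients (50% window overlap)."""
    n2 = len(x)          # 2N input samples (one overlapped window)
    n = n2 // 2          # N output coefficients
    ns = np.arange(n2)
    return np.array([
        np.sum(x * np.cos(np.pi / n * (ns + 0.5 + n / 2) * (k + 0.5)))
        for k in range(n)
    ])

window = np.sin(np.pi * (np.arange(36) + 0.5) / 36)   # a simple sine window
samples = np.random.randn(36)                          # 2N = 36 samples in this example
coeffs = mdct(samples * window)
print(coeffs.shape)                                    # (18,) frequency coefficients
```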

This causes quantization artifacts, due to transient sounds like percussive events and other high-frequency events that spread over a larger window. This results in audible smearing and pre-echo.

MP3 has an aliasing cancellation stage specifically to mask this problem, but which instead produces frequency domain energy which must be encoded in the audio. This is pushed to the top of the frequency range, where most people have limited hearing, in hopes the distortion it causes will be less audible.

Layer II's FFT analysis doesn't entirely cover all samples, and would omit several entire MP3 sub-bands where quantization factors must be determined.

MP3 instead uses two passes of FFT analysis for spectral estimation, to calculate the global and individual masking thresholds. This allows it to cover all samples. Of the two, it utilizes the global masking threshold level from the more critical pass, with the most difficult audio. In addition to intensity stereo, MP3 can use mid/side (matrixed) joint stereo, which encodes the sum and the difference of the two channels rather than the channels themselves. Unlike intensity stereo, this process does not discard any audio information. When combined with quantization, however, it can exaggerate artifacts.
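
A minimal sketch of mid/side encoding and decoding, assuming floating-point samples, shows why nothing is lost: the left/right channels are recovered exactly from the sum and difference (up to rounding).

```python
import numpy as np

def ms_encode(left, right):
    mid = (left + right) / 2.0     # "middle": what the channels have in common
    side = (left - right) / 2.0    # "side": the difference, usually much smaller
    return mid, side

def ms_decode(mid, side):
    return mid + side, mid - side  # left, right

L = np.array([0.50, 0.52, 0.48])
R = np.array([0.50, 0.49, 0.51])
M, S = ms_encode(L, R)
print(S)                           # small values, cheaper to code after quantization
l, r = ms_decode(M, S)
assert np.allclose(l, L) and np.allclose(r, R)
```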

These technical limitations inherently prevent MP3 from providing critically transparent quality at any bitrate. This makes Layer II sound quality actually superior to MP3 audio, when it is used at a high enough bitrate to avoid noticeable artifacts.

The term "transparent" often gets misused, however. MP3 is also regarded as exhibiting artifacts that are less annoying than Layer II, when both are used at bitrates that are too low to possibly provide faithful reproduction. They were introduced to maintain higher quality sound when encoding audio at lower-bitrates.

The conformance part of the standard provides two sets of guidelines and reference bitstreams for testing the conformance of MPEG-1 audio and video decoders, as well as of the bitstreams produced by an encoder. The reference software part provides C reference code for encoding and decoding of audio and video, as well as for multiplexing and demultiplexing.



An MP3 file is typically an uncontained stream of raw audio; the conventional way to tag MP3 files is by writing data to "garbage" segments of each frame, which preserve the media information but are discarded by the player. This is similar in many respects to how raw .AAC files are tagged, although that is less supported nowadays.

MPEG-1's parameters were selected to provide a good balance between quality and performance, allowing the use of reasonably inexpensive hardware of the time. MPEG-1 uses several frame/picture types that serve different purposes.

The most important, yet simplest, is the I-frame. I-frame is an abbreviation for intra-frame, so called because I-frames can be decoded independently of any other frames.

They may also be known as I-pictures, or keyframes due to their somewhat similar function to the key frames used in animation. I-frames can be considered effectively identical to baseline JPEG images. When cutting a video, it is not possible to start playback of a segment before the first I-frame in the segment (at least not without computationally intensive re-encoding).

I-frame-only compression is very fast but produces very large file sizes. I-frame-only MPEG-1 video is very similar to MJPEG video, so much so that very high-speed, and theoretically lossless (in reality, there are rounding errors), conversion can be made from one format to the other, provided a couple of restrictions (color space and quantization matrix) are followed in the creation of the bitstream.

The length between I-frames is known as the group of pictures (GOP) size.
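
As a simple illustration, a fixed GOP pattern in display order can be generated as below; the GOP length n and anchor spacing m are arbitrary example parameters, not values mandated by the standard.

```python
def gop_pattern(n=12, m=3):
    """Frame types in display order for a GOP of n frames with anchors every m frames."""
    frames = []
    for i in range(n):
        if i == 0:
            frames.append('I')        # the GOP starts with an independently decodable I-frame
        elif i % m == 0:
            frames.append('P')        # forward-predicted anchor
        else:
            frames.append('B')        # bidirectionally predicted frames between anchors
    return ''.join(frames)

print(gop_pattern())                  # IBBPBBPBBPBB (a commonly used pattern)
print(gop_pattern(n=15, m=3))         # IBBPBBPBBPBBPBB
```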

With more intelligent encoders, GOP size is dynamically chosen, up to some pre-selected maximum limit. Limits are placed on the maximum number of frames between I-frames due to decoding complexity, decoder buffer size, recovery time after data errors, seeking ability, and accumulation of IDCT errors in low-precision implementations (most common in hardware decoders). P-frame is an abbreviation for predicted-frame. P-frames may also be called forward-predicted frames or inter-frames (B-frames are also inter-frames).

P-frames exist to improve compression by exploiting the temporal (over time) redundancy in a video. P-frames store only the difference in image from the frame (either an I-frame or P-frame) immediately preceding it (this reference frame is also called the anchor frame).

The difference between a P-frame and its anchor frame is calculated using motion vectors on each macroblock of the frame (see above). Such motion vector data will be embedded in the P-frame for use by the decoder. A P-frame can contain any number of intra-coded blocks, in addition to any forward-predicted blocks. If a video drastically changes from one frame to the next (such as a cut), it is more efficient to encode it as an I-frame. B-frame stands for bidirectional-frame. B-frames may also be known as backwards-predicted frames or B-pictures.

B-frames are quite similar to P-frames, except they can make predictions using both the previous and the future frame (i.e. two anchor frames). It is therefore necessary for the player to first decode the next I- or P- anchor frame sequentially after the B-frame, before the B-frame can be decoded and displayed.

This means decoding B-frames requires larger data buffers and causes an increased delay both during decoding and during encoding. As such, B-frames have long been the subject of much controversy; they are often avoided in videos, and are sometimes not fully supported by hardware decoders.


No other frames are predicted from a B-frame. Because of this, a very low bitrate B-frame can be inserted, where needed, to help control the bitrate. If this were done with a P-frame, future P-frames would be predicted from it and would lower the quality of the entire sequence.

However, similarly, the future P-frame must still encode all the changes between it and the previous I- or P- anchor frame.

B-frames can also be beneficial in videos where the background behind an object is being revealed over several frames, or in fading transitions, such as scene changes.

A B-frame can contain any number of intra-coded blocks and forward-predicted blocks, in addition to backwards-predicted or bidirectionally predicted blocks. MPEG-1 has a unique frame type not found in later video standards. D-frames, or DC-pictures, are independent images (intra-frames) that have been encoded using DC transform coefficients only (AC coefficients are removed when encoding D-frames; see the DCT discussion above) and hence are very low quality.

D-frames are never referenced by I-, P- or B- frames. D-frames are only used for fast previews of video, for instance when seeking through a video at high speed.

Given moderately higher-performance decoding equipment, fast preview can be accomplished by decoding I-frames instead of D-frames.
