The temporal prediction technique used in MPEG video is based on motion estimation. The
basic premise of motion estimation is that in most cases, consecutive video frames will
be similar except for changes induced by objects moving within the frames. In the
trivial case of zero motion between frames (and no other differences caused by noise,
etc.), the encoder can efficiently predict the current frame as a
duplicate of the reference frame. In that case, the only information that must be
transmitted to the decoder is the syntactic overhead needed to reconstruct the
picture from the reference frame. When there is motion in the images, the
situation is not as simple.
Figure 7.17 shows an example of a frame with two stick figures and a tree. The
second half of the figure shows a possible next frame, in which panning has
moved the tree down and to the right, and the figures have moved farther to
the right because of their own motion in addition to the panning. The problem for motion
estimation to solve is how to adequately represent the changes, or differences, between
these two video frames.
Figure 7.17: Motion Estimation Example
Motion estimation solves this problem by performing a comprehensive
2-dimensional spatial search for each luminance macroblock. Motion
estimation is not applied directly to chrominance in MPEG video, as it is assumed that
the color motion can be adequately represented with the same motion information as the
luminance. Note that MPEG does not define how this search
should be performed; the system designer is free to implement it
in one of many possible ways. As with the bit-rate control algorithms
discussed previously, the trade-off between complexity and quality must be
weighed for the individual application. It is well known that a full,
exhaustive search over a wide 2-dimensional area yields the best matching results in
most cases, but this performance comes at an extreme computational cost to the encoder.
Because motion estimation is usually the most computationally expensive part of the video
encoder, some lower-cost encoders limit the pixel search range or use
other techniques such as telescopic searches, usually at some cost to video quality.
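The exhaustive search described above can be sketched as follows. This is only an illustration, not an MPEG-defined procedure: the function names `sad` and `full_search`, the use of the sum of absolute differences (SAD) as the matching cost, the default search range, and the sign convention for the vector (offset from the macroblock position in the current frame to the matching position in the reference frame) are all assumptions made for this example. Frames are represented as plain lists of rows of luminance values.

```python
def sad(ref, cur, rx, ry, cx, cy, size):
    """Sum of absolute differences between the size x size block of `cur`
    at (cx, cy) and the block of `ref` at (rx, ry)."""
    total = 0
    for j in range(size):
        for i in range(size):
            total += abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
    return total


def full_search(ref, cur, cx, cy, size=16, search_range=7):
    """Try every displacement within +/- search_range pixels (assumed range)
    and return (dx, dy, cost) for the lowest-cost candidate.
    (dx, dy) points from the current block position into the reference frame."""
    height, width = len(ref), len(ref[0])
    # Start from the zero vector so it wins any tie.
    best = (0, 0, sad(ref, cur, cx, cy, cx, cy, size))
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(ref, cur, rx, ry, cx, cy, size)
            if cost < best[2]:
                best = (dx, dy, cost)
    return best
```

The cost of this approach is visible in the loop structure: each macroblock requires up to (2 * search_range + 1)^2 block comparisons, which is why a lower-cost encoder might shrink `search_range` and accept poorer matches.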
Figure 7.18 shows an example of a particular macroblock from Frame 2 of
Figure 7.17, relative to various macroblocks of Frame 1. As can be seen,
the top frame has a bad match with the macroblock to be coded. The middle frame has a
fair match, as there is some commonality between the 2 macroblocks. The bottom frame has
the best match, with only a slight error between the 2 macroblocks. Because a relatively
good match has been found, the encoder assigns motion vectors to the macroblock, which
indicate how far horizontally and vertically the macroblock must be moved so that a
match is made. Each forward or backward predicted macroblock may contain 2
motion vectors, so a true bidirectionally predicted macroblock may use up to 4 motion
vectors.
Figure 7.18: Motion Estimation Macroblock Example
Figure 7.19 shows how a potential predicted Frame 2 can be generated from
Frame 1 by using motion estimation. In this figure, the predicted frame is subtracted
from the desired frame, leaving a (hopefully) less complicated residual error frame that
can then be encoded much more efficiently than before motion estimation. The more
accurately the motion is estimated and matched, the closer the residual error will be
to zero, and the higher the coding efficiency.
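The prediction and subtraction steps can be illustrated with a short sketch. The helper names `predict_block`, `residual_block`, and `reconstruct_block` are hypothetical, the motion vector is assumed to be an offset from the current block position into the reference frame, and the transform and quantization that would normally be applied to the residual are left out, so the reconstruction here is exact.

```python
def predict_block(ref, cx, cy, mv, size=16):
    """Copy the block of the reference frame that the motion vector points to."""
    dx, dy = mv
    return [[ref[cy + dy + j][cx + dx + i] for i in range(size)]
            for j in range(size)]


def residual_block(cur, pred, cx, cy, size=16):
    """Residual error: desired (current) block minus predicted block."""
    return [[cur[cy + j][cx + i] - pred[j][i] for i in range(size)]
            for j in range(size)]


def reconstruct_block(pred, residual):
    """Decoder side: add the residual back onto the prediction."""
    return [[p + r for p, r in zip(pred_row, res_row)]
            for pred_row, res_row in zip(pred, residual)]
```

With a perfect match the residual is all zeros, and only the motion vector itself carries information; a poor match leaves large residual values that cost more bits to encode.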
Further coding efficiency is accomplished by taking advantage of the fact that motion
vectors tend to be highly correlated between macroblocks. Because of this, the horizontal
component is compared to the previously valid horizontal motion vector and only the
difference is coded. The same difference is calculated for the vertical component
before coding. These differences are then coded with a variable-length code for
maximum compression efficiency.
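A minimal sketch of this differential coding, applied to a sequence of macroblock motion vectors, is shown below. The function names are hypothetical, the predictor is assumed to start at (0, 0), and the rules MPEG uses to reset the predictor, as well as the variable-length code tables themselves, are deliberately omitted.

```python
def encode_mv_differences(vectors):
    """Replace each (dx, dy) with its difference from the previous vector.
    The predictor is assumed to start at (0, 0)."""
    prev_dx, prev_dy = 0, 0
    diffs = []
    for dx, dy in vectors:
        diffs.append((dx - prev_dx, dy - prev_dy))
        prev_dx, prev_dy = dx, dy
    return diffs


def decode_mv_differences(diffs):
    """Rebuild the original vectors by accumulating the differences."""
    prev_dx, prev_dy = 0, 0
    vectors = []
    for ddx, ddy in diffs:
        prev_dx, prev_dy = prev_dx + ddx, prev_dy + ddy
        vectors.append((prev_dx, prev_dy))
    return vectors
```

For example, the vectors (3, -1), (3, -1), (4, -1), (4, 0) become the differences (3, -1), (0, 0), (1, 0), (0, 1). Because neighbouring vectors are similar, most differences are small, which is what makes a variable-length code effective.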
Figure 7.19: Final Motion Estimation Prediction
Of course, not every macroblock search will result in an acceptable match. If the encoder
decides that no acceptable match exists (again, the "acceptable" criterion is not defined
by MPEG and is up to the system designer), it has the option of coding that
particular macroblock as an intra macroblock, even though it may be in a P or B frame.
In this manner, high quality video is maintained at a slight cost to coding efficiency.
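One possible form of this decision is sketched below. Since the criterion is left to the designer, everything here is an assumption for illustration: the function name `choose_macroblock_mode`, the use of the best match's sum of absolute differences as input, and the threshold value.

```python
def choose_macroblock_mode(best_sad, size=16, max_mean_abs_error=8.0):
    """Return "inter" if the best match is good enough, else "intra".
    The threshold is a hypothetical, designer-chosen value."""
    mean_abs_error = best_sad / (size * size)
    return "inter" if mean_abs_error <= max_mean_abs_error else "intra"
```

A practical encoder might instead compare the estimated bit cost of the two modes; the simple threshold only shows where the decision sits in the process.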
Dave Marshall
10/4/2001