[M4IF Technotes] Re: GMC concept
Kris Huber
khuber sorenson.com
Wed Jun 12 12:24:49 EDT 2002
Hello Kasturi,
>Please help..
>
>>
>> Hi,
>>
>> From what I understand, for a zoomed reference, the motion vector
>> is of the form X=Ai, Y=Ej. The warped reference for each macroblock
>> is exactly the same shape and next to each other (no overlap). But,
>> concept-wise only the zoom-center is scaled exactly the same as the
>> picture and rest of them are distorted (perspective distortion). So how
>> exactly the global compensation helps in this scenario while using the
>> same parameters A and E for all the macroblocks.
>>
>> Thanks,
>>
>> Kasturi
The description above is not in the language of the visual standard. The
description is similar to global motion models 0 and 1, but neither of these
allows for any zooming at all; one of the affine models
(no_of_sprite_warping_points=2 or 3) is required for zooming. Model 2 can
handle translation, rotation, and zooming, but the geometry of the reference
frame is preserved. Model 3 allows, in addition, the reference frame to be
"skewed", i.e., geometry of reference frame modified in such a way that
rectangles turn into parallelograms. Model 4, the perspective transform, is
not part of the GMC tool.
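
To make the difference between models 2 and 3 concrete, here is a minimal
sketch in Python. It uses plain real-valued transforms, not the fixed-point
warping-point arithmetic of the visual standard, and the function names
(similarity_warp, affine_warp, pure_zoom_displacement) are made up for this
illustration. It also speaks to the original question: with one set of zoom
parameters for the whole picture, the resulting per-pixel displacement still
differs from macroblock to macroblock, since it grows with distance from the
zoom center.

```python
import math

def similarity_warp(x, y, zoom, angle_rad, tx, ty):
    """Model-2-style warp: translation, rotation, and uniform zoom.

    Right angles are preserved, so rectangles stay rectangles.
    """
    a = zoom * math.cos(angle_rad)
    b = zoom * math.sin(angle_rad)
    return (a * x - b * y + tx, b * x + a * y + ty)

def affine_warp(x, y, a, b, c, d, e, f):
    """Model-3-style warp: general affine, six free parameters.

    Rectangles may become parallelograms (skew).
    """
    return (a * x + b * y + c, d * x + e * y + f)

def pure_zoom_displacement(x, y, zoom):
    """Displacement of a pixel under a pure zoom about the origin."""
    wx, wy = similarity_warp(x, y, zoom, 0.0, 0.0, 0.0)
    return (wx - x, wy - y)

if __name__ == "__main__":
    # Same zoom parameter everywhere, yet the displacement depends on
    # how far the pixel is from the zoom center (here the origin).
    for px in (0, 16, 160):
        print(px, pure_zoom_displacement(px, 0, 1.05))

    # A skew: the point (0, 1) on the left edge of a unit square is
    # shifted sideways, so the square becomes a parallelogram.
    print(affine_warp(0, 1, 1, 0.5, 0, 0, 1, 0))
```

Coordinates here are measured from the zoom center; a real encoder would
express the same transform through the warping points defined in the
standard.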
I think the author of the comment above is correct about the zoom-center
being scaled exactly as we would want, with a poorer match possible further
out from the center, if any of the following are true:
- The optical system generating the sequence has significant distortion at
the focal length and iris setting in use. Usually for distant scenes there
is less distortion than close-ups. If the camera iris is almost closed, so
that only the center of the lens is effective, then I think distortion tends
to be less as well--so bright scenes may have less distortion and low-light
scenes more distortion (I could be wrong on this point though--I think it's
true based on the fact that a pin-hole lens is ideal and distortion-free).
- The optical system generating the sequence has significant distortion and
there is a large translational motion of the scene.
- The optical system generating the sequence has significant distortion and
the shape of the distortion changes as the scene is zoomed (I am not
familiar with the multiple lens systems normally used for zooming and do not
know how common this is, or if it occurs at all).
That said, distortion is a figure of merit that is minimized during the
design of most cameras. Often it is minimized to the point that it is
imperceptible. I suspect that the effectiveness of GMC is more sensitive to
optical distortion than the human eye, but I don't think this matters much
except maybe at very low frame rates, when we can expect, from the video
compression algorithm's perspective, very large global translations and very
fast zooms. My guess is that for relatively high frame rates (maybe 15 fps
and above) the effectiveness of GMC prediction should not be affected
noticeably by optical distortion.
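
The frame-rate argument can be illustrated with a rough sketch. The
assumptions are not from the standard or from any particular camera: a
single-coefficient radial distortion r' = r * (1 + k1 * r^2) that stays
fixed while the scene zooms, radii normalized so that 1.0 is roughly the
picture edge, and a GMC zoom factor that matches the true zoom at the
center. The helper names (radial_distort, zoom_mismatch) are hypothetical.
Under those assumptions the prediction error at radius r works out to
k1 * zoom * r^3 * (zoom^2 - 1), which vanishes at the center and shrinks as
the per-frame zoom step approaches 1.

```python
def radial_distort(r, k1):
    """Assumed one-term radial distortion of an ideal image radius r."""
    return r * (1.0 + k1 * r * r)

def zoom_mismatch(r, zoom, k1):
    """Radial error of a pure-zoom GMC prediction for a distorted image.

    A scene point at ideal radius r appears at radial_distort(r) in the
    reference frame. After the scene zooms by `zoom`, it really appears at
    radial_distort(zoom * r), while pure-zoom GMC predicts
    zoom * radial_distort(r).
    """
    observed = radial_distort(zoom * r, k1)
    predicted = zoom * radial_distort(r, k1)
    return observed - predicted

if __name__ == "__main__":
    k1 = -0.1  # illustrative value only, not a measured lens parameter
    for zoom in (1.02, 1.2):        # small vs. large per-frame zoom step
        for r in (0.0, 0.5, 1.0):   # center, mid-field, near the edge
            print(zoom, r, zoom_mismatch(r, zoom, k1))
```

A small zoom step per frame corresponds to a high frame rate, and a large
one to a low frame rate, for the same zoom speed.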
Optical distortion is something we do live with, and can be expected to be
present in quite a few cameras, I think. How about applying some GMC
hardware to compensate for it ;-)? You'd have to generate the
distortion-correcting coordinates using appropriate distortion-correcting
equations rather than those defined in the MPEG-4 visual standard. Such
equations might be similar to the perspective transform's, but probably
would differ from design to design and according to camera settings. For
360-degree cameras, I'm sure digital distortion correction is already done.
Best regards,
Kris
P.S. Note that "distortion" in this discussion is optical distortion, which
is not the same as the distortion used to characterize image compression
performance. There are two types of optical distortion - fish-eye or barrel
distortion (in which a square viewed through it tends toward a circle) and
pin-cushion distortion (which is the opposite, with the sides of the square
appearing curved toward the center and the corners appearing more pointed).
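
As a rough illustration of the two types, the sketch below applies a
commonly used one-term radial model to points on a square. The model, the
coefficient values, and the function name radial_distort_xy are assumptions
for illustration only. With this mapping (ideal to distorted coordinates) a
negative k1 gives barrel distortion and a positive k1 gives pin-cushion;
the sign flips if the mapping is defined in the other direction.

```python
def radial_distort_xy(x, y, k1):
    """Map an ideal point (x, y) to its distorted position.

    Coordinates are measured from the optical center. The scale factor
    depends only on the squared distance from the center.
    """
    r2 = x * x + y * y
    s = 1.0 + k1 * r2
    return (x * s, y * s)

if __name__ == "__main__":
    corner = (1.0, 1.0)         # corner of a square centered on the axis
    edge_midpoint = (1.0, 0.0)  # middle of one side of the same square

    for name, k1 in (("barrel", -0.1), ("pin-cushion", 0.1)):
        print(name,
              radial_distort_xy(*corner, k1),
              radial_distort_xy(*edge_midpoint, k1))
```

With the barrel setting the corner is pulled toward the center more
strongly than the edge midpoint, so the square rounds off; with the
pin-cushion setting the corner is pushed outward more, so the corners
appear more pointed.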