[Mp4-tech] [H.264] [Systems] Picture timing in absence of SEI messages

Ben Avison ben.avison tematic.com
Mon Aug 15 19:54:53 ESTEDT 2005


Thanks, John. I suspected that it was possible to construct such a decoder
(after all, in the long term, the number of frames decoded and output has to
even out, if we discount the no_output_of_prior_pics flag).
However, this wasn't really the problem I was outlining. It's best
considered from the viewpoint of the author of a multiplexer program:
suppose you are presented with an H.264 bitstream with no picture timing SEI
messages, and it's your job to ensure that the CPB neither overflows nor
underflows. You need a model of an H.264 decoder to target - do you assume
a decoder that emits one frame for each frame decoded and expect that any
decoders that don't will compensate for the differences, or vice versa?
If you assume a decoder that can emit multiple frames between decoding two
consecutive ones and decode multiple frames between emitting consecutive
ones, then there is a second problem of defining when pictures are removed
from the CPB at those points in the bitstream where multiple frames are
decoded in the time it takes to emit a single one. Perhaps a sensible
algorithm would be for the last picture to be removed from the CPB at a
time before the end of the output period of the current frame which is
derived from the size of the coded picture in bits and MaxBR for the
profile and level, and the previous picture at an equivalent time before that,
and so on. But this needs to be explicitly standardised to ensure
interoperability between arbitrary multiplexers and decoders. It should even
go to the lengths of describing how rounding, if any, should be applied to
the above calculation.
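As a rough sketch of that backward-scheduling rule, under one reading of it: each picture in a burst is removed size/bitrate seconds before the removal of the picture that follows it, and the last picture is removed that long before the end of the current output period. The function names, the assumption that MaxBR has already been converted to bits per second, and the floor-to-90-kHz rounding are all illustrative choices, not anything the H.264 or MPEG-2 systems specs define. Exact fractions are used so that the only rounding is the one made explicit.

```python
import math
from fractions import Fraction


def burst_removal_times(output_end, sizes_bits, bitrate_bps):
    """Hypothetical backward schedule for pictures decoded in one output period.

    output_end: end of the current frame's output period, in seconds.
    sizes_bits: coded sizes of the burst's pictures, in decode order.
    bitrate_bps: MaxBR for the profile and level, assumed already in bits/s.
    """
    times = []
    t = Fraction(output_end)
    for size in reversed(sizes_bits):
        # The last picture is removed size/bitrate before output_end; each
        # earlier one size/bitrate before the removal of the picture after it.
        t -= Fraction(size, bitrate_bps)
        times.append(t)
    times.reverse()  # back into decode order
    return times


def to_90khz_ticks(t):
    # Rounding down to the 90 kHz systems clock is an assumption; this is
    # exactly the choice that would need standardising.
    return math.floor(t * 90000)
```

For example, a 100 kbit picture followed by a 200 kbit picture, at 10 Mbit/s and with the output period ending at t = 1 s, gives removal times of 0.97 s and 0.98 s, i.e. 87300 and 88200 ticks.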
The beauty of assuming that decode and display occur in lock step is that
it's really simple to calculate CPB removal times (aka the DTS), because
they are tied to the field_pic_flag and pic_struct of the frame that is
being output at the same time.
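A minimal sketch of the lock-step case, assuming each coded picture's removal time advances by the output duration of the picture displayed alongside it. The field counts per pic_struct value are, as I understand it, the DeltaTfiDivisor values from Annex E of H.264; tc is the clock tick num_units_in_tick / time_scale. The helper names are hypothetical, and pic_struct may be None when no picture timing SEI is present.

```python
from fractions import Fraction

# Field periods occupied on output for each pic_struct value
# (assumed to match the DeltaTfiDivisor table in H.264 Annex E).
FIELDS_PER_PIC_STRUCT = {0: 2, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3, 7: 4, 8: 6}


def output_fields(field_pic_flag, pic_struct=None):
    """Field periods for which a picture is displayed."""
    if pic_struct is not None:
        return FIELDS_PER_PIC_STRUCT[pic_struct]
    # Without pic_struct, a field picture lasts one field, a frame two.
    return 1 if field_pic_flag else 2


def lockstep_dts(first_dts, outputs, tc):
    """Hypothetical lock-step DTS schedule.

    outputs[k] is (field_pic_flag, pic_struct) for the picture being output
    while coded picture k is decoded; tc is the clock tick in seconds.
    Returns one DTS per coded picture.
    """
    dts = [Fraction(first_dts)]
    for field_pic_flag, pic_struct in outputs[:-1]:
        dts.append(dts[-1] + output_fields(field_pic_flag, pic_struct) * tc)
    return dts
```

With tc = 1/50 s, outputting a plain frame, then a frame-doubled frame (pic_struct 7), then an unpaired field gives DTS values of 0, 1/25 s and 3/25 s.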
(By the way, I'm not sure where the problem with unpaired reference fields
lies: surely it's just the case that at worst a complete frame buffer
consisting of either a frame picture or a complementary field pair needs to
be decoded in the time it takes to display an unpaired field. This is only
a factor of 2 different, which is far less severe than the alternative,
where you theoretically might need to decode anything up to an entire DPB's
worth of frames in the time it takes to display one unpaired field.)
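To put numbers on that comparison, here is a small worked calculation of the peak decode rate each model demands. It assumes two field periods per frame period; the 25 fps frame rate and the 16-frame DPB are purely illustrative values, not taken from any particular stream or level.

```python
from fractions import Fraction


def peak_decode_rate(frame_rate, frames_to_decode, display_fields):
    """Frames per second needed to decode frames_to_decode frames while
    display_fields field periods are shown (two fields per frame period)."""
    field_period = Fraction(1, 2) / Fraction(frame_rate)
    return Fraction(frames_to_decode) / (display_fields * field_period)


# Lock step: one frame decoded while one unpaired field is displayed.
lockstep = peak_decode_rate(25, 1, 1)
# Lumpy worst case: a whole (illustrative) 16-frame DPB in one field period.
lumpy = peak_decode_rate(25, 16, 1)
```

At 25 fps the lock-step worst case is 50 frames/s (twice the nominal rate), whereas the lumpy worst case is 800 frames/s.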
Ben
In message <gmq0g1t9qvo51aqcdqo71rm47u7do0rh96 4ax.com>
          John Cox <jc sj.co.uk> wrote:
> Hi
> 
> It is certainly possible to write a decoder that emits one frame for
> every frame decoded (maybe with some exceptions if you have unpaired
> reference fields).  The only "problem" is that it has a latency that is
> the size of the DPB.  It is true that many frames may become eligible
> for display at once, but that is no reason to do so - they can be held
> in their DPB slot until another frame has been received - if you work
> through the bumping process you will find that you never need another
> DPB-sized frame store.  I reckon you need a DPB-sized store + 1 frame for
> the current decode + 1 frame being displayed, the last not being
> strictly necessary.
> 
> John Cox
> SJ Consulting Ltd
> 
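A much-simplified sketch of the decoder John describes: it holds frames until the DPB is full and thereafter emits exactly one frame, the one with the lowest picture order count, per frame decoded. It deliberately ignores reference marking, IDR and MMCO resets of POC, and no_output_of_prior_pics_flag, and assumes the stream never reorders by more than dpb_frames; the function name is hypothetical. The heap briefly holds dpb_frames + 1 entries, matching the "+ 1 frame for the current decode".

```python
import heapq


def one_out_per_decode(decode_order_pocs, dpb_frames):
    """Hypothetical fixed-latency output: one frame out per frame decoded
    once dpb_frames frames are held. Returns (per-decode outputs, final flush)."""
    held = []
    outputs = []
    for poc in decode_order_pocs:
        heapq.heappush(held, poc)  # the frame just decoded
        if len(held) > dpb_frames:
            outputs.append([heapq.heappop(held)])  # lowest POC goes out
        else:
            outputs.append([])  # still priming: nothing emitted yet
    # At end of stream, flush everything left in output order.
    flushed = [heapq.heappop(held) for _ in range(len(held))]
    return outputs, flushed
```

For a hierarchical-B style decode order of POCs 0, 8, 4, 2, 6, 16, 12, 10, 14 and a 2-frame DPB, nothing is emitted for the first two decodes, then one frame per decode in POC order, with 14 and 16 flushed at the end.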
> On Thu, 11 Aug 2005 15:43:03 +0100, you wrote:
> 
> >
> >Is there anywhere that defines in temporal terms the behaviour of the HRD
> >in the absence of picture timing SEI messages in the bitstream? I wonder
> >if this issue has been lost in the crack between the H.264 spec and the
> >MPEG-2 systems spec - it could have important consequences for
> >interoperatability of H.264 streams encapsulated in program streams or
> >transport streams.
> >
> >To elaborate: version 2 of the MPEG-2 systems spec defines DTS and PTS in
> >terms of parameters derived from picture timing SEI messages. This mechanism
> >allows the H.264 encoder to unambiguously inform the multiplexer of all the
> >information it needs to be able to schedule the bitstream within the
> >multiplex. However, this does not help when the H.264 stream does not
> >include picture timing SEI messages - and the majority of current H.264
> >encoders do not seem to do so.
> >
> >In the absence of picture timing SEI messages, the only constraints upon
> >H.264 bitstreams appear to be that they be decodable according to the
> >bumping process. But compared to traditional codecs, this process can be
> >"lumpy": there can be times when decode cannot proceed (for example when
> >a frame is output but it is still marked as used for reference, and the
> >DPB is full but all other frames in the DPB are either also marked as used
> >for reference, or have a higher picture order count). And there are times
> >when multiple frames need to be decoded between the output of two frames
> >that are consecutive in output order (for example when a frame that follows
> >an IDR frame in decode order precedes it in output order).
> >
> >I can see at least two ways that this "lumpiness" can be dealt with. One is
> >to assume that the decoder has about twice as many frame stores available as
> >is specified by the profile and level; this would allow decoding to proceed
> >when the DPB would otherwise have been full, and assuming that you had
> >reached the nominal DPB fullness level before starting output, should also
> >prevent the need ever to decode more than one frame during the output period
> >of one frame.
> >
> >The other approach is to accept that the decode frame rate will be lumpy.
> >But this leaves an unanswered question of how far apart the DTS values of
> >the pictures should be when multiple frames need to be decoded within the
> >output period of one frame.
> >
> >My gut feeling is that it would be nice to be able to assume the former
> >scenario, for the sake of smoothing out the data rates, for evening out
> >the processing load on decoders, and to make the calculation of DTS values
> >easier and less ambiguous. However, I suspect that this is unlikely to be
> >supported by the H.264 spec.
> >
> >The decision about which behaviour the HRD is assumed to have impacts very
> >much on the scheduling of the bitstream within a multiplex, because the
> >multiplexer has to ensure that the CPB neither overflows nor underflows, and
> >that depends upon the time of removal of coded pictures from the CPB (which
> >is defined to be equivalent to the DTS). This is where the interoperability
> >issue I mentioned comes into play.
> >
> >Can anyone offer me any advice on this issue?
> >
> >Thanks,
> >Ben Avison
> 
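Whichever decoder model is chosen, the multiplexer's check ends up looking something like the sketch below: given removal times (DTS) and coded picture sizes, test a delivery schedule for CPB overflow and underflow. This assumes a deliberately crude model of constant-rate, back-to-back delivery in decode order with instantaneous removal; it ignores initial_cpb_removal_delay, VBR stalling and transport packet overhead, and the function name is hypothetical.

```python
from fractions import Fraction


def check_cpb(sizes_bits, removal_times, rate_bps, start_time, cpb_size_bits):
    """Check a constant-rate delivery schedule against the CPB.

    Pictures are delivered back to back in decode order at rate_bps from
    start_time; picture i is removed instantaneously at removal_times[i].
    Returns a list of (picture index, "underflow" or "overflow").
    """
    problems = []
    total = sum(sizes_bits)
    removed = 0  # bits already taken out of the CPB
    for i, (size, t) in enumerate(zip(sizes_bits, removal_times)):
        elapsed = Fraction(t) - Fraction(start_time)
        arrived = min(total, max(0, rate_bps * elapsed))
        # Picture i must have fully arrived by its removal time.
        if arrived < removed + size:
            problems.append((i, "underflow"))
        # Occupancy only rises between removals, so it peaks just before one.
        if arrived - removed > cpb_size_bits:
            problems.append((i, "overflow"))
        removed += size
    return problems
```

The point of the exercise is that the result depends entirely on removal_times, which is why the missing definition of the DTS in the absence of picture timing SEI matters.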

-- 
Ben Avison
Tematic                                       Tel: +44 (0) 1728 727437
3 Signet Court                                Fax: +44 (0) 1728 727430
Cambridge, CB5 8LA, United Kingdom            WWW: http://www.tematic.com/

