[Mp4-tech] [H.264] [Systems] Picture timing in absence of SEI messages

Mon Aug 15 16:28:28 ESTEDT 2005

Hi
It is certainly possible to write a decoder that emits one frame for
every frame decoded (maybe with some exceptions if you have unpaired
reference fields).  The only "problem" is that it has a latency that is
the size of the DPB.  It is true that many frames may become eligible
for display at once, but that is no reason to output them all at once -
they can be held in their DPB slots until another frame has been
received.  If you work through the bumping process you will find that
you never need another DPB-sized frame store.  I reckon you need a
DPB-sized store + 1 frame for the current decode + 1 frame being
displayed, the last not being strictly necessary.
John Cox
SJ Consulting Ltd
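
A minimal sketch of the behaviour described in the reply above: frames
are held in the DPB and exactly one is bumped for each frame decoded once
the DPB is full. It is an illustration under simplifying assumptions, not
the normative bumping process: the function name one_in_one_out is
hypothetical, reference marking is ignored, and POC values are assumed to
increase across the whole sequence (no reset at an IDR frame).

```python
import heapq

def one_in_one_out(decode_order_pocs, dpb_frames):
    """Return (steps, peak, flushed).

    steps[i] is (decoded POC, emitted POC or None) for the i-th decode.
    peak is the largest number of frames held at any moment.
    flushed is what remains to be output at end of stream, in POC order.
    """
    waiting = []   # min-heap of POCs decoded but not yet output
    steps = []
    peak = 0
    for poc in decode_order_pocs:
        # The frame just decoded occupies the "+1" store on top of the DPB.
        heapq.heappush(waiting, poc)
        peak = max(peak, len(waiting))
        emitted = None
        if len(waiting) > dpb_frames:
            # DPB is full: bump exactly one frame, the one with lowest POC.
            emitted = heapq.heappop(waiting)
        steps.append((poc, emitted))
    flushed = [heapq.heappop(waiting) for _ in range(len(waiting))]
    return steps, peak, flushed

# Hypothetical IBBP-style decode order, identified by POC.
steps, peak, flushed = one_in_one_out([0, 6, 2, 4, 12, 8, 10], dpb_frames=2)
for decoded, emitted in steps:
    print("decoded", decoded, "emitted", emitted)
print("peak frames held:", peak, "flushed at end:", flushed)
```

In this model the number of frames held never exceeds dpb_frames + 1, and
the cost is a start-up latency of dpb_frames decodes before the first
frame is emitted.
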
On Thu, 11 Aug 2005 15:43:03 +0100, you wrote:
>
>Is there anywhere that defines in temporal terms the behaviour of the HRD
>in the absence of picture timing SEI messages in the bitstream? I wonder
>if this issue has been lost in the crack between the H.264 spec and the
>MPEG-2 systems spec - it could have important consequences for
>interoperability of H.264 streams encapsulated in program streams or
>transport streams.
>
>To elaborate: version 2 of the MPEG-2 systems spec defines DTS and PTS in
>terms of parameters derived from picture timing SEI messages. This mechanism
>allows the H.264 encoder to unambiguously inform the multiplexer of all the
>information it needs to be able to schedule the bitstream within the
>multiplex. However, this does not help when the H.264 stream does not
>include picture timing SEI messages - and the majority of current H.264
>encoders do not seem to do so.
>
>In the absence of picture timing SEI messages, the only constraints upon
>H.264 bitstreams appear to be that they be decodable according to the
>bumping process. But compared to traditional codecs, this process can be
>"lumpy": there can be times when decode cannot proceed (for example when
>a frame is output but it is still marked as used for reference, and the
>DPB is full but all other frames in the DPB are either also marked as used
>for reference, or have a higher picture order count). And there are times
>when multiple frames need to be decoded between the output of two frames
>that are consecutive in output order (for example when a frame that follows
>an IDR frame in decode order precedes it in output order).
>
>I can see at least two ways that this "lumpiness" can be dealt with. One is
>to assume that the decoder has about twice as many frame stores available as
>is specified by the profile and level; this would allow decoding to proceed
>when the DPB would otherwise have been full, and assuming that you had
>reached the nominal DPB fullness level before starting output, should also
>prevent the need ever to decode more than one frame during the output period
>of one frame.
>
>The other approach is to accept that the decode frame rate will be lumpy.
>But this leaves an unanswered question of how far apart the DTS values of
>the pictures should be when multiple frames need to be decoded within the
>output period of one frame.
>
>My gut feeling is that it would be nice to be able to assume the former
>scenario, for the sake of smoothing out the data rates, for evening out
>the processing load on decoders, and to make the calculation of DTS values
>easier and less ambiguous. However, I suspect that this is unlikely to be
>supported by the H.264 spec.
>
>The decision about which behaviour the HRD is assumed to have impacts very
>much on the scheduling of the bitstream within a multiplex, because the
>multiplexer has to ensure that the CPB neither overflows nor underflows, and
>that depends upon the time of removal of coded pictures from the CPB (which
>is defined to be equivalent to the DTS). This is where the interoperability
>issue I mentioned comes into play.
>
>Can anyone offer me any advice on this issue?
>
>Thanks,
>Ben Avison
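
The first approach in the quoted message (decode exactly one frame per
frame period and delay output by a fixed number of frames) makes the
timestamp arithmetic simple. The sketch below is illustrative only and is
not taken from the H.264 or MPEG-2 systems specifications: the function
name assign_timestamps and the rule of picking the smallest delay that
keeps PTS >= DTS are assumptions, and POC resets at IDR frames are
ignored. Timestamps are in 90 kHz ticks, the unit used for PTS and DTS.

```python
def assign_timestamps(decode_order_pocs, frame_ticks, delay_frames=None):
    """Return [(dts, pts)] in decode order, one decode per frame period."""
    count = len(decode_order_pocs)
    # Output rank of each frame: its position when sorted by POC.
    by_poc = sorted(range(count), key=lambda n: decode_order_pocs[n])
    rank = {n: k for k, n in enumerate(by_poc)}
    if delay_frames is None:
        # Smallest delay for which no frame is presented before it is decoded.
        delay_frames = max((n - rank[n] for n in rank), default=0)
    stamps = []
    for n in range(count):
        dts = n * frame_ticks                         # evenly spaced decodes
        pts = (rank[n] + delay_frames) * frame_ticks  # evenly spaced output
        stamps.append((dts, pts))
    return stamps

# 25 frames/s at 90 kHz gives 3600 ticks per frame.
for dts, pts in assign_timestamps([0, 6, 2, 4, 12, 8, 10], 3600):
    print(dts, pts)
```

With the second ("lumpy") approach the DTS spacing is no longer a single
constant, which is exactly the open question raised in the message.
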