[Mp4-tech] [H.264] [Systems] Picture timing in absence of SEI messages

John Cox jc sj.co.uk
Tue Aug 16 11:26:03 ESTEDT 2005


Hi
You are of course correct that it is a pig to reconstruct stream timing
without SEI timing messages - indeed a raw ES need not contain enough
info to know at what rate pictures should be displayed so the thing may
be impossible.  I think if you want to build a good mux then you will
just have to insist that the incoming stream contains useful info!
John Cox
SJ Consulting
On Mon, 15 Aug 2005 19:54:53 +0100, you wrote:
>Thanks, John. I suspected that it was possible to construct such a decoder
>(after all, in the long term, the number of frames decoded and output has to
>even out, if we discount the no_output_of_prior_pics flag).
>
>However, this wasn't really the problem I was outlining. It's best
>considered from the viewpoint of the author of a multiplexer program:
>suppose you are presented with an H.264 bitstream with no picture timing SEI
>messages, and it's your job to ensure that the CPB neither overflows nor
>underflows. You need a model of an H.264 decoder to target - do you assume
>a decoder that emits one frame for each frame decoded and expect that any
>decoders that don't will compensate for the differences, or vice versa?
>
>If you assume a decoder that can emit multiple frames between decoding two
>consecutive ones and decode multiple frames between emitting consecutive
>ones, then there is a second problem of defining when pictures are removed
>from the CPB at those points in the bitstream where multiple frames are
>decoded in the time it takes to emit a single one. Perhaps a sensible
>algorithm would be for the last picture to be removed from the CPB at a
>time before the end of the output period of the current frame which is
>derived from the size of the coded picture in bits and MaxBR for the
>profile level, and the previous picture at an equivalent time before that,
>and so on. But this needs to be explicitly standardised to ensure
>interoperability between arbitrary multiplexers and decoders. It should even
>go to the lengths of describing how rounding, if any, should be applied to
>the above calculation.
>
>The beauty of assuming that decode and display occur in lock step is that
>it's really simple to calculate CPB removal times (aka the DTS), because
>they are tied to the field_pic_flag and pic_struct of the frame that is
>being output at the same time.
>
>(By the way, I'm not sure where the problem with unpaired reference fields
>lies: surely it's just a case that at worst a complete frame buffer
>consisting of either a frame picture or a complementary field pair needs to
>be decoded in the time it takes to display an unpaired field. This is only
>a factor of 2 different, which is far less severe than the alternative,
>where you theoretically might need to decode anything up to an entire DPB's
>worth of frames in the time it takes to display one unpaired field.)
>
>Ben
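
[Illustration] The two CPB removal models Ben contrasts above can be
sketched roughly as follows. This is only a sketch under stated
assumptions, not anything taken from the H.264 specification: the function
names are hypothetical, max_br is taken to be in plain bits per second,
the lock-step model is assumed to count one field period for a field
picture and two for a frame (ignoring pic_struct), and exact fractions are
used so that the rounding question Ben raises is left open.

```python
from fractions import Fraction


def lockstep_removal_times(field_pic_flags, field_period, start=Fraction(0)):
    """Lock-step model: one picture is removed from the CPB per picture output.

    Assumption (not from the spec text): a field picture occupies one field
    period and a frame picture occupies two; pic_struct is ignored.
    """
    times = []
    t = Fraction(start)
    for is_field in field_pic_flags:
        times.append(t)
        t += field_period * (1 if is_field else 2)
    return times


def back_to_back_removal_times(sizes_bits, period_end, max_br):
    """Ben's suggested scheme for several pictures decoded in one output period.

    The last picture is removed size/max_br seconds before the end of the
    output period, the one before it an equivalent time earlier, and so on.
    sizes_bits is in decode order; max_br is assumed to be in bits per second.
    """
    times = []
    t = Fraction(period_end)
    for size in reversed(sizes_bits):
        t -= Fraction(size, max_br)  # exact arithmetic, no rounding applied
        times.append(t)
    times.reverse()  # return removal times in decode order
    return times
```

For example, with two coded pictures of 400000 and 200000 bits, an output
period ending at t = 1 s and max_br = 10000000 bit/s, the second function
places the removal times at 47/50 s and 49/50 s.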


