Flushing the output queue WITHOUT invalidating the decode pipeline #698

@RogerSanders

Description

First up, let me say thanks to everyone who's worked on this very important spec. I appreciate the time and effort that's gone into this, and I've found it absolutely invaluable in my work. That said, I have hit a snag, which I consider to be a significant issue in the standard.

Some background first so you can understand what I'm trying to do.

  • I'm doing 3D rendering on a cloud server in realtime, using hardware H265 encoding to compress frames, and sending those frames down over a websocket to the client browser, where I use the WebCodecs api (specifically VideoDecoder) to decode those frames and display them to the user.
  • My goal is low latency, which is done by balancing rendering time, encoding time, compressed frame size, and decoding time.
  • My application is an event driven, CAD-style application. Unlike video encoding from a real camera, or 3D games, where a constant or target framerate exists, my application has no such concept. The last image generated remains valid for an indefinite period of time. The next frame I send over the wire may be displayed for milliseconds, or for an hour, it all depends on what state the program is in and what the user does next in terms of input.

So then, based on the above, hopefully my requirements become clearer. When I get a video frame at the client side, I need to ensure that frame is displayed to the user, without any subsequent frame being required to arrive in order to "flush" it out of the pipeline. My frames come down at an irregular and unpredictable rate. If the pipeline has a frame or two of latency, I need to be able to manually flush that pipeline on demand, to ensure that all frames which have been sent to the codec for decoding are made visible.

At this point, I find myself fighting a few parts of the WebCodecs spec. As stated in the spec, when I call the decode method (https://www.w3.org/TR/webcodecs/#dom-videodecoder-decode), it "enqueues a control message to decode the given chunk". That decode request ends up on the "codec work queue" (https://www.w3.org/TR/webcodecs/#dom-videodecoder-codec-work-queue-slot), and its output may ultimately be held in the "internal pending output" queue (https://www.w3.org/TR/webcodecs/#internal-pending-output). As stated in the spec: "Codec outputs such as VideoFrames that currently reside in the internal pipeline of the underlying codec implementation. The underlying codec implementation may emit new outputs only when new inputs are provided. The underlying codec implementation must emit all outputs in response to a flush." So frames may be held up in this queue, as per the spec, but all pending outputs MUST be emitted in response to a flush. There is, of course, an explicit flush method: https://www.w3.org/TR/webcodecs/#dom-videodecoder-flush
The spec states that this method "Completes all control messages in the control message queue and emits all outputs". Fantastic, that's what I need. Unfortunately, the spec also specifically states that in response to a flush call, the implementation MUST "Set [[key chunk required]] to true." This means that after a flush, I can only provide a key frame. Not so good. In my scenario, without knowing when, or if, a subsequent frame is going to arrive, I end up having to flush after every frame, and now due to this requirement that a key frame must follow a flush, every frame must be a keyframe. This increases my payload size significantly, and can cause noticeable delays and stuttering on slower connections.
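To make the constraint concrete, here is a minimal model of the `[[key chunk required]]` state transitions as I read them from the spec. This is a sketch, not the real `VideoDecoder`: `DecoderStateModel` is a hypothetical name, and the only behavior modeled is that configuring and flushing both set the slot, after which a delta chunk is rejected with a `DataError`:

```javascript
// Sketch of the spec's [[key chunk required]] slot, to illustrate why
// flush() forces the next chunk to be a key frame. Not a real decoder.
class DecoderStateModel {
  constructor() {
    // The slot starts true after configure().
    this.keyChunkRequired = true;
  }
  decode(chunkType) {
    if (this.keyChunkRequired) {
      if (chunkType !== "key") {
        // Spec behavior: a delta chunk here closes the decoder with DataError.
        throw new DOMException("A key frame is required", "DataError");
      }
      this.keyChunkRequired = false;
    }
    // ...actual decoding would happen here...
  }
  flush() {
    // Spec step: "Set [[key chunk required]] to true."
    this.keyChunkRequired = true;
  }
}

const model = new DecoderStateModel();
model.decode("key");   // OK: first chunk is a key frame
model.decode("delta"); // OK: P frame mid-stream
model.flush();
// model.decode("delta") would now throw DataError: the flush has
// invalidated the pipeline, so the next chunk must again be a key frame.
```

This is exactly the "IPPPP with a flush after each frame" sequence that works on a standalone NVDEC client but is disallowed here: the first flush resets the model, so the second P frame is rejected.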

When I use a dedicated desktop client, and have full control of the decoding hardware, I can perform a "flush" without invalidating the pipeline state, so I can, for example, process a sequence of five frames such as "IPPPP", flushing the pipeline after each one, and this works without issue. I'd like to be able to achieve the same thing under the WebCodecs API. Currently, this seems impossible, as following a call to flush, the subsequent frame must be an I frame, not a P frame.

My question now is, how can this be overcome? It seems to me, I'd need one of two things:

  1. The ability to guarantee no frames will be held up in the internal pending output queue waiting for another input, OR:
  2. The ability to flush the decoder, or the codec internal pending output queue, without invalidating the decoder pipeline.

At the hardware encoding/decoding API level, my only experience is with NVENC/NVDEC, but I know what I want is possible under this implementation at least. Are there known hardware implementations where what I'm asking for isn't possible? Can anyone see a possible way around this situation?

I can tell you right now, I have two workarounds. One is to encode every frame as a keyframe. This is clearly sub-optimal for bandwidth, and not required for a standalone client. The second workaround is ugly, but it works: I can measure the "queue depth" of the output queue and send the same frame for decoding multiple times. This works with I or P frames. With a queue depth of 1, for example, which is what I see on Google Chrome, for each frame I receive at the client end I send it for decoding twice. The second copy of the frame "pushes" the first one out of the pipeline. A hack, for sure, and a sub-optimal use of the decoding hardware, but it keeps my bandwidth at the expected level, and I can implement it on the client side alone.
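The second workaround can be sketched as follows. This is illustrative code, not my production client: `makePushThroughDecoder` and `makeMockDecoder` are hypothetical names, and the mock's one-frame internal pipeline stands in for a real `VideoDecoder` with a measured queue depth of 1. Each chunk is submitted `queueDepth + 1` times, and duplicate outputs are dropped by timestamp so each frame is displayed exactly once:

```javascript
// Wrap a decoder so every submitted frame is pushed all the way through
// a pipeline that holds back `queueDepth` outputs waiting for more input.
function makePushThroughDecoder(decoder, queueDepth, onFrame) {
  const seen = new Set(); // timestamps already delivered to the app
  decoder.onOutput = (frame) => {
    if (seen.has(frame.timestamp)) return; // drop the duplicate outputs
    seen.add(frame.timestamp);
    onFrame(frame);
  };
  return {
    decode(chunk) {
      // Submit once for real, plus queueDepth extra copies to force it out.
      for (let i = 0; i <= queueDepth; i++) decoder.decode(chunk);
    },
  };
}

// Mock decoder with a one-frame internal pipeline, for illustration only:
// it emits frame N only when frame N+1 arrives, like the behavior I see.
function makeMockDecoder() {
  let pending = null;
  return {
    onOutput: null,
    decode(chunk) {
      if (pending !== null) this.onOutput(pending);
      pending = { timestamp: chunk.timestamp };
    },
  };
}

const displayed = [];
const wrapper = makePushThroughDecoder(makeMockDecoder(), 1, (f) =>
  displayed.push(f.timestamp)
);
wrapper.decode({ timestamp: 0 }); // emitted immediately despite the 1-deep queue
```

With a real `VideoDecoder`, the duplicated `EncodedVideoChunk` submissions cost decode work but no extra bandwidth, which is the whole point of the hack.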

What I would like to see, ideally, is some extra control in the WebCodecs API. Perhaps a boolean flag in the codec configuration? We currently have the "optimizeForLatency" flag. I'd like to see a "forceImmediateOutput" flag or the like, which guarantees that every frame that is sent for decoding WILL be passed to the VideoFrameOutputCallback without the need to flush or force it through the pipeline with extra input. Failing that, an alternate method of flushing that doesn't invalidate the decode pipeline would work. Without either of these solutions though, it seems to me that WebCodecs as it stands is unsuitable for use with variable rate streams, as you have no guarantees about the depth of the internal pending output queue, and no way to flush it without breaking the stream.
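For illustration, this is roughly the configuration shape I have in mind. To be clear, `forceImmediateOutput` is the proposed flag and does NOT exist in the current spec; only `optimizeForLatency` is real today, and the codec string is just an example for an H.265 stream:

```javascript
// Hypothetical VideoDecoder configuration, if such a flag were adopted.
const config = {
  codec: "hvc1.1.6.L93.B0",  // example H.265 codec string; adjust for your stream
  optimizeForLatency: true,   // real: hints the decoder to minimize queued outputs
  forceImmediateOutput: true, // PROPOSED, not in the spec: emit every decoded
                              // frame to the output callback without a flush
};
```

With such a flag set, every call to `decode()` would be guaranteed to eventually produce a corresponding call to the `VideoFrameOutputCallback`, with no flush and no sacrificial extra input required.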
