How does the h.264 encoder determine where to put i-frames?

Question

I'm reading a bit about I-frames and their counterparts, P-frames and B-frames. I understand the usage: I-frames are the key to the compression, and the P- and B-frames that follow can use the data in I-frames to save on bits.

However, what I would find interesting is knowing how the encoder determines which frames should be I-frames and which should be P's and B's that pull their data from them. How does the encoder decide which frames will be I-frames?

Side thoughts (you don't have to answer, but these are interesting and useful to me in the future): Are I-frames easier, as in faster, to find and extract than, say, extracting every 30th frame? If a video is a slide show with audio and no slide animations, are I-frames likely to coincide with slide changes? Would compression be better on a video made from slide images rather than the same images taken in via a stream in real time?

Should I have used the keyframes tag? Should there be an i-frames tag or should it be a synonym? — user3643, Jan 23, 2017 at 10:20
In addition to @Mulvya's fine answer, there is the case of older codecs and encoders that place I-frames at fixed intervals regardless of need. This was the era of the 'fixed-length GOP'. — Jim Mack, Jan 23, 2017 at 15:43
'keyframe' is used in After Effects and other animation apps to describe a point on a timeline where a value is set. 'I-frame' is more specific. — stib, Jan 23, 2017 at 22:26
The same term is used within the ffmpeg and x264 code and, I suspect, in most or all other codecs. — Gyan, Jan 24, 2017 at 3:56
@user3643 An I-frame is just a frame that does not reference any other frames (all blocks are intra-coded). It is not necessarily a keyframe, which often refers explicitly to an IDR-frame (an I-frame at a boundary where no subsequent frames reference frames before it). For example, ffprobe will show I-frames that it does not consider keyframes if it is looking at an open GOP. — forest

Gyan · Accepted Answer · 2017-01-23 11:24:34Z

This is a complex topic, with the exact algorithm unique to each encoder.

Below is a pseudocode explanation from an x264 developer. B-frames aren't accounted for, but the basic logic should be similar.

    encode current frame as (a really fast approximation of) a P-frame and an I-frame.

    if ((distance from previous keyframe) > keyint) then
        set IDR-frame
    else if (1 - (bit size of P-frame) / (bit size of I-frame) < (scenecut / 100) * (distance from previous keyframe) / keyint) then
        if ((distance from previous keyframe) >= minkeyint) then
            set IDR-frame
        else
            set I-frame
    else
        set P-frame

    encode frame for real.

scenecut is the scene change threshold (x264's default is 40). Because it scales the right-hand side of the test above, a higher value makes the encoder more willing to insert an I-frame.

keyint is the maximum permitted distance between two keyframes; minkeyint is the minimum.
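The pseudocode translates fairly directly into Python. This is only a sketch: the fast trial encode that produces the P-frame and I-frame size estimates is assumed to happen elsewhere, so the function just takes those two numbers (p_bits, i_bits) as inputs. The names FrameType and decide_frame_type are hypothetical and not part of x264.

```python
from enum import Enum


class FrameType(Enum):
    IDR = "IDR"
    I = "I"
    P = "P"


def decide_frame_type(p_bits: float, i_bits: float, distance: int,
                      keyint: int, minkeyint: int, scenecut: int) -> FrameType:
    """Choose a frame type following the x264 developer's pseudocode.

    p_bits, i_bits: estimated sizes of the current frame coded as a P-frame
    and as an I-frame (assumed to come from a fast trial encode).
    distance: number of frames since the previous keyframe.
    """
    if distance > keyint:
        # Maximum keyframe interval exceeded: force an IDR-frame.
        return FrameType.IDR
    # Fraction of bits a P-frame saves over an I-frame; near 0 at a scene change.
    savings = 1 - p_bits / i_bits
    # The threshold grows with distance, so cuts get easier as keyint nears.
    threshold = (scenecut / 100) * distance / keyint
    if savings < threshold:
        # Scene change: IDR if far enough from the last keyframe, else plain I.
        return FrameType.IDR if distance >= minkeyint else FrameType.I
    return FrameType.P
```

For example, with keyint=250, minkeyint=25 and scenecut=40, a frame whose P-frame estimate is 90% of its I-frame estimate becomes an IDR-frame 100 frames after the last keyframe (savings 0.10 < threshold 0.16), but stays a P-frame only 10 frames after it (0.10 > 0.016). The growing threshold is what lets the encoder place an I-frame a little early when one is due soon anyway.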

IDR (instantaneous decoder refresh) frames are keyframes such that no later frame needs to refer to a frame earlier than the IDR-frame in order to be decoded. This is not necessarily true for plain I-frames.
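To make the IDR/I-frame distinction concrete, here is a small check under a simplified, hypothetical model in which each frame is described only by the list of frames it references, in decoding order. A real H.264 decoder does not work this way (it learns that a frame is an IDR-frame from the bitstream, where IDR slices have their own NAL unit type); the sketch only tests the property described above, and is_idr_like is an illustrative name.

```python
from typing import Sequence


def is_idr_like(refs: Sequence[Sequence[int]], k: int) -> bool:
    """Return True if frame k (decoding order) has the IDR property.

    refs[n] lists the decoding-order indices of the frames frame n references.
    Frame k qualifies when it references nothing itself (it is intra-coded)
    and no frame decoded after it references any frame decoded before it.
    """
    if refs[k]:
        return False  # references other frames, so not even an I-frame
    return all(r >= k for n in range(k + 1, len(refs)) for r in refs[n])


# Open GOP: frame 3 still references frame 1, which precedes the I-frame.
open_gop = [[], [0], [], [1, 2], [2]]
# Closed GOP: nothing decoded after frame 2 looks back past it.
closed_gop = [[], [0], [], [2], [2, 3]]
print(is_idr_like(open_gop, 2))    # False
print(is_idr_like(closed_gop, 2))  # True
```

In the open-GOP case, frame 2 is an I-frame but cannot serve as a clean entry point, which is the situation in which ffprobe reports I-frames that it does not mark as keyframes.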

As a side note, nearly every I-frame in an encode produced by x264 will be an IDR-frame, because x264 uses a closed GOP by default (in ffmpeg this can be changed with -x264-params open-gop=1). — forest

forest · Accepted Answer · 2025-11-22 19:04:16Z

Side thoughts (you don't have to answer, but these are interesting and useful to me in the future): Are I-frames easier, as in faster, to find and extract than say extracting every 30th frame?

I-frames are easier and faster to find and extract because an I-frame can be decoded on its own, without reference to any other frame. If you are seeking to a P-frame or B-frame, the decoder has to go back to the previous I-frame and then decode every frame up to your target. This is why encodes with a large GOP (group of pictures: an I-frame together with the non-I-frames that follow it up to the next I-frame) seek more slowly, especially if you seek to a point near the end of a large GOP.
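A rough way to see the seeking cost is to count how many frames must be decoded to display a given frame. The sketch below assumes a stream with only I- and P-frames (no B-frames), so display order equals decoding order and each P-frame depends on the one before it; frames_to_decode is a hypothetical name.

```python
from typing import Sequence


def frames_to_decode(frame_types: Sequence[str], target: int) -> int:
    """Count the frames that must be decoded to display frame `target`.

    frame_types holds "I" or "P" per frame (no B-frames assumed).
    """
    # Walk back to the nearest I-frame at or before the target.
    start = target
    while frame_types[start] != "I":
        start -= 1
        if start < 0:
            raise ValueError("no I-frame precedes the target frame")
    # Decode that I-frame plus every P-frame up to and including the target.
    return target - start + 1


short_gop = "IPPPP" * 2    # an I-frame every 5 frames
long_gop = "I" + "P" * 9   # one 10-frame GOP
print(frames_to_decode(short_gop, 9))  # 5
print(frames_to_decode(long_gop, 9))   # 10
```

Seeking to the last frame costs twice as much decoding with the longer GOP, while seeking directly to an I-frame always costs a single frame.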

The x264 encoder has a parameter called scenecut to tweak how sensitive it is to scene changes.

If a video is a slide show with audio and no slide animations, are I-frames likely to coincide with slide changes?

In general, yes, depending on the minimum keyframe interval. Because I-frames are intended to be high-quality reference frames, most encoders try to place them on scene change boundaries. If scene changes happen too frequently, however, the encoder will not necessarily fill the encode with I-frames. But if an I-frame is due to be inserted soon and the encoder runs into a scene change, it will prefer to insert the I-frame a bit early so that it becomes the first frame of the new scene.
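This behaviour can be simulated by running a slide show through the x264 developer's decision rule quoted in the other answer. The sketch is hypothetical in two ways: instead of real trial encodes it assumes a fixed P-to-I size ratio per frame (0.01 on an unchanged slide, 0.98 on a slide change), and it assumes that any I-frame, not only an IDR-frame, restarts the keyframe distance count. place_keyframes is an illustrative name.

```python
def place_keyframes(p_over_i: list[float], keyint: int, minkeyint: int,
                    scenecut: int) -> list[str]:
    """Label each frame "IDR", "I" or "P" using the quoted decision rule.

    p_over_i[n] is the assumed ratio (P-frame size) / (I-frame size) for
    frame n; a value near 1 means a P-frame saves almost nothing there.
    """
    if not p_over_i:
        return []
    types = ["IDR"]  # the first frame has nothing to reference
    distance = 0     # frames since the previous keyframe
    for ratio in p_over_i[1:]:
        distance += 1
        if distance > keyint:
            frame = "IDR"
        elif 1 - ratio < (scenecut / 100) * distance / keyint:
            frame = "IDR" if distance >= minkeyint else "I"
        else:
            frame = "P"
        if frame != "P":
            distance = 0  # assumption: any I-frame restarts the count
        types.append(frame)
    return types


# Hypothetical slide show: 300 frames, slide changes at frames 100 and 200.
ratios = [0.01] * 300
for change in (100, 200):
    ratios[change] = 0.98
types = place_keyframes(ratios, keyint=250, minkeyint=25, scenecut=40)
print([n for n, t in enumerate(types) if t != "P"])  # [0, 100, 200]
```

Under these assumptions the keyframes land exactly on the slide changes, because a P-frame barely saves any bits there. On a static stretch longer than keyint, the keyint limit takes over instead: 600 unchanged frames get IDR-frames at frames 0, 251 and 502.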
