I'm reading a bit on I-frames and their counterparts, P-frames and B-frames. I understand the usage: I-frames are the key to the compression, and the P- and B-frames that follow can reuse data from the I-frames to save on bits.

However, what I would find interesting is knowing how the encoder determines which frames should be I-frames and which should be P- and B-frames that pull their data from them. How does the encoder decide which frames will be I-frames?


Side thoughts (you don't have to answer, but these are interesting and useful to me in the future): Are I-frames easier, as in faster, to find and extract than, say, extracting every 30th frame? If a video is a slide show with audio and no slide animations, are I-frames likely to coincide with slide changes? Would compression be better on a video made from slide images rather than the same images captured from a stream in real time?

  • Should I have used the keyframes tag? Should there be an i-frames tag or should it be a synonym? Commented Jan 23, 2017 at 10:20
  • In addition to @Mulvya's fine answer, there is the case of older codecs and encoders that place I-frames at fixed intervals regardless of need. This was the era of the 'fixed-length GOP'. Commented Jan 23, 2017 at 15:43
  • 'keyframe' is used in After Effects and other animation apps to describe a point on a timeline where a value is set. 'I-Frame' is more specific Commented Jan 23, 2017 at 22:26
  • The same term is used within ffmpeg and x264 code, and I suspect, most/all other codecs. Commented Jan 24, 2017 at 3:56
  • @user3643 An I-frame is just a frame that does not reference any other frames (all blocks are intra-coded). It is not necessarily a keyframe, which often refers explicitly to an IDR-frame (an I-frame at a boundary where no subsequent frames reference frames before it). For example, ffprobe will show I-frames that it does not consider keyframes if it is looking at an open GOP. Commented yesterday

2 Answers


This is a complex topic, with the exact algorithm unique to each encoder.

Below is a pseudocode explanation from an x264 developer. B-frames aren't accounted for, but the basic logic should be similar.

    encode current frame as (a really fast approximation of) a P-frame and an I-frame.
    if ((distance from previous keyframe) > keyint) then
        set IDR-frame
    else if (1 - (bit size of P-frame) / (bit size of I-frame) < (scenecut / 100) * (distance from previous keyframe) / keyint) then
        if ((distance from previous keyframe) >= minkeyint) then
            set IDR-frame
        else
            set I-frame
    else
        set P-frame
    encode frame for real.

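scenecut is the scene-change threshold, on a scale from 0 to 100. In the comparison above, a larger value makes the test easier to pass, so more frames are treated as scene changes. With a value of 0 the test passes only when the P-frame estimate is larger than the I-frame estimate.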

keyint is the maximum permitted distance between two keyframes; minkeyint is the minimum.

IDR (instantaneous decoder refresh) frames are keyframes such that no later frame needs to refer to a frame earlier than the IDR-frame for decoding. This is not necessarily true of plain I-frames.
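
For readers who prefer code, here is a minimal Python sketch of the pseudocode above. It is not x264's actual implementation: the function name choose_frame_type, its parameters, and the default values are assumptions made for illustration, and the two size estimates are assumed to be supplied by the caller.

```python
def choose_frame_type(p_bits, i_bits, dist, keyint=250, min_keyint=25, scenecut=40):
    """Return "IDR", "I", or "P" for the current frame.

    p_bits, i_bits: estimated size of the frame coded as a P-frame and as
                    an I-frame (hypothetical inputs; i_bits must be > 0).
    dist:           distance in frames from the previous keyframe.
    The defaults are illustrative values, not taken from the pseudocode.
    """
    # Maximum keyframe distance exceeded: force a keyframe.
    if dist > keyint:
        return "IDR"

    # Fraction of bits saved by coding the frame as P instead of I.
    saving = 1 - p_bits / i_bits
    # The bar rises with distance, so a cut is flagged more readily
    # the longer it has been since the last keyframe.
    threshold = (scenecut / 100) * (dist / keyint)

    if saving < threshold:
        # Treated as a scene change. Too close to the last keyframe
        # means a plain I-frame rather than a new IDR-frame.
        return "IDR" if dist >= min_keyint else "I"
    return "P"
```

A frame that is cheap to predict (large saving) stays a P-frame, while a frame whose P-frame estimate is almost as large as its I-frame estimate is promoted to an I- or IDR-frame.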

  • As a side-note, every I-frame will be an IDR-frame in an encode produced by x264 in nearly all cases because x264 uses a closed GOP by default (this can be changed with -x264-params open-gop=1). Commented yesterday

Side thoughts (you don't have to answer, but these are interesting and useful to me in the future): Are I-frames easier, as in faster, to find and extract than say extracting every 30th frame?

I-frames are easier and faster to find and extract because an I-frame can be decoded into a picture on its own. To seek to a P-frame or B-frame, the decoder has to go back to the previous I-frame and then decode every frame from there until it reaches the target. This is why encodes with a large GOP (group of pictures: an I-frame followed by a run of consecutive non-I-frames) seek more slowly, especially when the seek target is near the end of a large GOP.
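
To put rough numbers on the seek cost, here is a small Python sketch. It is a simplification: it assumes closed GOPs, no B-frame reordering, and that every frame between the keyframe and the target must be decoded. The function name frames_to_decode and the keyframe list are made up for the example.

```python
from bisect import bisect_right


def frames_to_decode(target, keyframes):
    """Count the frames a decoder must process to display frame `target`.

    keyframes: sorted list of frame indices that are keyframes
               (hypothetical data, e.g. one keyframe every 250 frames).
    """
    # Find the last keyframe at or before the target.
    idx = bisect_right(keyframes, target) - 1
    if idx < 0:
        raise ValueError("no keyframe at or before the target frame")
    # Decode from that keyframe up to and including the target.
    return target - keyframes[idx] + 1


keyframes = [0, 250, 500]
print(frames_to_decode(250, keyframes))  # target is a keyframe: 1
print(frames_to_decode(499, keyframes))  # last frame of a 250-frame GOP: 250
```

Landing exactly on a keyframe costs one decode, while landing on the last frame of a 250-frame GOP costs 250, which is why extracting only keyframes is much cheaper than extracting arbitrary frames.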

The x264 encoder has a parameter called scenecut to tweak how sensitive it is to scene changes.

If a video is a slide show with audio and no slide animations, are I-frames likely to coincide with slide changes?

In general, yes, depending on the minimum keyframe interval. Because I-frames are intended to be high-quality reference frames, most encoders will try to place them on scene-change boundaries. If scene changes happen too frequently, however, the encoder will not necessarily spam the encode with I-frames. But if an I-frame is due to be inserted soon and the encoder runs into a scene change, it will prefer to insert the I-frame a bit early so that it can be the first frame of the new scene.
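
As a rough illustration of the slide-show case, the toy Python model below places a keyframe at every scene change that is at least min_keyint frames after the previous keyframe, and forces one once keyint frames have passed. This is a deliberately simplified sketch, not how any real encoder is implemented: the function name, parameters, and default values are assumptions, and scene changes are given as a known list rather than detected from the picture.

```python
def place_keyframes(n_frames, scene_changes, keyint=250, min_keyint=25):
    """Return the sorted frame indices chosen as keyframes (toy model).

    scene_changes: frame indices where the picture changes completely,
                   e.g. slide changes (hypothetical input).
    """
    cuts = set(scene_changes)
    keyframes = [0]  # the first frame is always a keyframe
    for frame in range(1, n_frames):
        dist = frame - keyframes[-1]
        forced = dist >= keyint                         # maximum interval reached
        at_cut = frame in cuts and dist >= min_keyint   # cut far enough from last keyframe
        if forced or at_cut:
            keyframes.append(frame)
    return keyframes


# Slides changing every 300 frames: each change gets a keyframe.
print(place_keyframes(900, [300, 600]))          # [0, 250, 300, 550, 600, 850]
# Rapid changes every 10 frames: most are skipped.
print(place_keyframes(100, [10, 20, 30, 40]))    # [0, 30]
```

In the first run every slide change lands on a keyframe, with extra keyframes forced by the maximum interval. In the second run the minimum interval suppresses most of the rapid cuts. The model does not capture the preference for inserting a due keyframe slightly early.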
