Parsing a complete digital film is an immensely complex task. Since your question is mostly about WebM – a container format – I’ll concentrate on that.
You always start with individual streams containing the payload data: video (e.g. H.264, VP9), audio (e.g. AAC, Opus) and subtitles (e.g. SubRip, Blu-ray PGS). Tied to those streams is metadata needed for correct playback – for example, timestamps that keep the streams synchronized.
As a simple example, imagine a WebM file containing a VP9 video stream and an Opus audio stream.
The WebM container acts as a wrapper around the VP9 and Opus streams: it makes it possible to put them into a single file and still access each one conveniently. It also carries additional data, such as the types of streams it holds and checksums for error detection.
Naively, you could store the streams one after the other, each in a single chunk. That’s a horrible solution for streaming: if the video stream is stored first, the player has to download practically the whole file before it can play synchronized audio. That’s one reason why streams are interleaved – the file stores a small chunk of video followed by a small chunk of audio (maybe half a second each) and repeats that pattern throughout.
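To make that layout concrete, here’s a toy sketch – the chunk sizes are invented numbers, not anything a real file would contain – that prints where each chunk would sit in an interleaved file:

```c
#include <stdio.h>

/* Toy illustration only: the chunk sizes are invented, and a real
 * container frames each chunk with headers and timestamps. Prints
 * where each chunk would sit in an interleaved layout. */
int main(void) {
    const long video_chunk = 200000; /* assumed bytes per 0.5 s of video */
    const long audio_chunk = 6000;   /* assumed bytes per 0.5 s of audio */
    long offset = 0;

    for (int i = 0; i < 4; i++) {
        printf("%8ld: video chunk %d (%.1fs-%.1fs)\n",
               offset, i, i * 0.5, (i + 1) * 0.5);
        offset += video_chunk;
        printf("%8ld: audio chunk %d (%.1fs-%.1fs)\n",
               offset, i, i * 0.5, (i + 1) * 0.5);
        offset += audio_chunk;
    }
    return 0;
}
```

With these toy numbers, a player that has downloaded the first ~206 kB already holds half a second of both video and audio and can start playback.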
What do you need to parse such a file?
- A WebM parser to process the container and extract the payload streams.
- A VP9 parser (probably as a part of a full VP9 decoder) to process the video stream.
- An Opus parser (again probably as a part of a full Opus decoder) to process the audio stream.
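To show how those three pieces fit together, here’s a minimal sketch of the playback loop. All names (`webm_read_packet`, `vp9_decode`, `opus_decode`) are hypothetical placeholders, not a real API; the demuxer is simulated with a canned packet sequence:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Everything here is a hypothetical placeholder, not a real API; a
 * real player would sit on top of e.g. libvpx and libopus. */
typedef enum { TRACK_VIDEO, TRACK_AUDIO } TrackType;

typedef struct {
    TrackType track;     /* which stream this packet belongs to */
    double    timestamp; /* presentation time in seconds        */
} Packet;

/* Simulated demuxer: hands out packets in the interleaved order in
 * which they are laid out in the file. */
static bool webm_read_packet(Packet *pkt) {
    static const Packet sequence[] = {
        { TRACK_VIDEO, 0.0 }, { TRACK_AUDIO, 0.0 },
        { TRACK_VIDEO, 0.5 }, { TRACK_AUDIO, 0.5 },
    };
    static size_t next = 0;
    if (next >= sizeof sequence / sizeof sequence[0]) return false;
    *pkt = sequence[next++];
    return true;
}

/* Stub decoders standing in for VP9 and Opus. */
static void vp9_decode(const Packet *p)  { printf("video frame @ %.1fs\n", p->timestamp); }
static void opus_decode(const Packet *p) { printf("audio frame @ %.1fs\n", p->timestamp); }

int main(void) {
    Packet pkt;
    /* The container parser yields packets; each one is dispatched to
     * the decoder responsible for its stream. */
    while (webm_read_packet(&pkt)) {
        if (pkt.track == TRACK_VIDEO) vp9_decode(&pkt);
        else                          opus_decode(&pkt);
    }
    return 0;
}
```

In a real player the decoded frames and samples go into separate queues, and a clock uses the timestamps to decide when to present each of them – that’s how the synchronization metadata mentioned above is actually used.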
WebM is a subset of Matroska; you can find the full specification of the format on the Matroska website. The parser you link to seems extremely simplistic at first glance, but it might be a good enough starting point. For a complete implementation you should take a closer look at the reference implementation, libmatroska. It’s used, for example, in the de-facto standard Matroska muxing application mkvmerge.
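To give you a feel for what such a parser deals with: Matroska (and therefore WebM) is built on EBML, where every element consists of an ID, a size and a payload, and the size is stored as a variable-length integer. Here’s a minimal sketch of decoding such a size field; a real parser additionally has to decode element IDs (stored in a similar way) and handle the special “unknown size” encoding, among other things:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Decode one EBML variable-length integer, as used for element
 * sizes. The position of the first set bit in the leading byte
 * gives the total length (1xxxxxxx = 1 byte, 01xxxxxx = 2 bytes,
 * ... up to 8 bytes); the marker bit is stripped from the value.
 * Returns the number of bytes consumed, or 0 on error. */
static size_t read_ebml_size(const uint8_t *buf, size_t n, uint64_t *out) {
    if (n == 0 || buf[0] == 0)
        return 0;                      /* 0x00 is not a valid vint */

    size_t len = 1;
    uint8_t mask = 0x80;
    while (!(buf[0] & mask)) { len++; mask >>= 1; }
    if (len > n)
        return 0;                      /* truncated input */

    uint64_t value = buf[0] & (uint8_t)(mask - 1); /* strip marker bit */
    for (size_t i = 1; i < len; i++)
        value = (value << 8) | buf[i];

    *out = value;
    return len;
}

int main(void) {
    /* 0x41 0x23: leading byte 01000001 -> 2-byte vint, value 0x123. */
    const uint8_t example[] = { 0x41, 0x23 };
    uint64_t size;
    size_t used = read_ebml_size(example, sizeof example, &size);
    printf("consumed %zu bytes, size = %llu\n",
           used, (unsigned long long)size);
    return 0;
}
```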
Btw: “Muxing” is short for “multiplexing”. The long form is rarely used, though.