
I am using Python to do some basic image processing, and want to extend it to process a video frame by frame.

I get the video as a blob from a server - .webm encoded - and have it in python as a byte string (b'\x1aE\xdf\xa3\xa3B\x86\x81\x01B\xf7\x81\x01B\xf2\x81\x04B\xf3\x81\x08B\x82\x88matroskaB\x87\x81\x04B\x85\x81\x02\x18S\x80g\x01\xff\xff\xff\xff\xff\xff\xff\x15I\xa9f\x99*\xd7\xb1\x83\x0fB@M\x80\x86ChromeWA\x86Chrome\x16T\xaek\xad\xae\xab\xd7\x81\x01s\xc5\x87\x04\xe8\xfc\x16\t^\x8c\x83\x81\x01\x86\x8fV_MPEG4/ISO/AVC\xe0\x88\xb0\x82\x02\x80\xba\x82\x01\xe0\x1fC\xb6u\x01\xff\xff\xff\xff\xff\xff ...).

I know that there is cv.VideoCapture, which can do almost what I need. The problem is that I would have to first write the file to disk, and then load it again. It seems much cleaner to wrap the string, e.g., into an IOStream, and feed it to some function that does the decoding.

Is there a clean way to do this in Python, or is writing to disk and loading it again the way to go?
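For reference, the write-to-disk approach mentioned above can be sketched with the standard library alone; `video_bytes` is a stand-in name for the blob received from the server, and the OpenCV part is shown only in comments since it needs a display-capable environment:

```python
import os
import tempfile

def bytes_to_temp_file(video_bytes: bytes, suffix: str = ".webm") -> str:
    """Write the in-memory blob to a temporary file and return its path.

    delete=False lets the file be reopened by name (needed on Windows);
    the caller is responsible for removing it afterwards.
    """
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as f:
        f.write(video_bytes)
        return f.name

# Hypothetical usage with OpenCV:
#   import cv2
#   path = bytes_to_temp_file(video_bytes)
#   cap = cv2.VideoCapture(path)
#   ...process frames...
#   cap.release()
#   os.remove(path)  # clean up the temporary file
```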

3 Answers


According to this post, you can't use cv.VideoCapture to decode an in-memory stream.
You may decode the stream by "piping" it to FFmpeg.

The solution is a bit complicated; writing to disk is much simpler, and probably a cleaner solution.

I am posting a solution using FFmpeg (and FFprobe).
There are Python bindings for FFmpeg, but this solution executes FFmpeg as an external application using the subprocess module.
(The Python bindings work well with FFmpeg, but piping to FFprobe does not).
I am using Windows 10, and I put ffmpeg.exe and ffprobe.exe in the execution folder (you may set the execution path as well).
For Windows, download the latest (statically linked) stable version.

I created a standalone example that performs the following:

  • Generate synthetic video, and save it to WebM file (used as input for testing).
  • Read file into memory as binary data (replace it with your blob from the server).
  • Pipe the binary stream to FFprobe, for finding the video resolution.
    In case the resolution is known in advance, you may skip this part.
    Piping to FFprobe makes the solution more complicated than it needs to be.
  • Pipe the binary stream to FFmpeg stdin for decoding, and read decoded raw frames from stdout pipe.
    Writing to stdin is done in chunks using a Python thread.
    (The reason for using stdin and stdout instead of named pipes is for Windows compatibility).
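The writer-thread pattern from the steps above can be sketched in isolation. This is a minimal sketch, not the answer's full solution: `cat` stands in for the ffmpeg process (so it only runs on Linux/macOS, and the "decoded" output is simply the input bytes echoed back), which lets the threading and piping logic be seen without any codec involved:

```python
import io
import subprocess as sp
import threading
from functools import partial

def pipe_through(data: bytes, cmd=("cat",), chunk_size=1024) -> bytes:
    """Feed data to a subprocess' stdin from a thread and read its stdout.

    Writing happens in a background thread while the main thread reads
    stdout; doing both in one thread could deadlock on a full pipe buffer.
    """
    process = sp.Popen(cmd, stdin=sp.PIPE, stdout=sp.PIPE)
    stream = io.BytesIO(data)

    def writer():
        # Write stdin in fixed-size chunks until the stream is exhausted.
        for chunk in iter(partial(stream.read, chunk_size), b''):
            process.stdin.write(chunk)
        process.stdin.close()  # EOF tells the child no more input is coming.

    thread = threading.Thread(target=writer)
    thread.start()
    out = process.stdout.read()  # Read concurrently with the writer thread.
    thread.join()
    process.wait()
    return out
```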

Piping architecture:

 --------------------   Encoded    ---------    Decoded      ------------
| Input WebM encoded |   data     | ffmpeg  |  raw frames   | reshape to |
| stream (VP9 codec) | ---------> | process | ------------> | NumPy array|
 --------------------  stdin PIPE  ---------   stdout PIPE   ------------

Here is the code:

import numpy as np
import cv2
import io
import subprocess as sp
import threading
import json
from functools import partial
import shlex

# Build synthetic video and read binary data into memory (for testing):
#########################################################################
width, height = 640, 480

sp.run(shlex.split('ffmpeg -y -f lavfi -i testsrc=size={}x{}:rate=1 -vcodec vp9 -crf 23 -t 50 test.webm'.format(width, height)))

with open('test.webm', 'rb') as binary_file:
    in_bytes = binary_file.read()
#########################################################################


# https://stackoverflow.com/questions/5911362/pipe-large-amount-of-data-to-stdin-while-using-subprocess-popen/14026178
# https://stackoverflow.com/questions/15599639/what-is-the-perfect-counterpart-in-python-for-while-not-eof
# Write to stdin in chunks of 1024 bytes.
def writer():
    for chunk in iter(partial(stream.read, 1024), b''):
        process.stdin.write(chunk)
    try:
        process.stdin.close()
    except BrokenPipeError:
        pass  # For unknown reason there is a Broken Pipe Error when executing FFprobe.


# Get resolution of video frames using FFprobe
# (in case resolution is known, skip this part):
################################################################################
# Open In-memory binary streams
stream = io.BytesIO(in_bytes)

process = sp.Popen(shlex.split('ffprobe -v error -i pipe: -select_streams v -print_format json -show_streams'), stdin=sp.PIPE, stdout=sp.PIPE, bufsize=10**8)

pthread = threading.Thread(target=writer)
pthread.start()
pthread.join()

in_bytes = process.stdout.read()
process.wait()

p = json.loads(in_bytes)

width = (p['streams'][0])['width']
height = (p['streams'][0])['height']
################################################################################


# Decoding the video using FFmpeg:
################################################################################
stream.seek(0)

# FFmpeg input PIPE: WebM encoded data as stream of bytes.
# FFmpeg output PIPE: decoded video frames in BGR format.
process = sp.Popen(shlex.split('ffmpeg -i pipe: -f rawvideo -pix_fmt bgr24 -an -sn pipe:'), stdin=sp.PIPE, stdout=sp.PIPE, bufsize=10**8)

thread = threading.Thread(target=writer)
thread.start()


# Read decoded video (frame by frame), and display each frame (using cv2.imshow)
while True:
    # Read raw video frame from stdout as bytes array.
    in_bytes = process.stdout.read(width * height * 3)

    if not in_bytes:
        break  # Break loop if no more bytes.

    # Transform the bytes read into a NumPy array
    in_frame = (np.frombuffer(in_bytes, np.uint8).reshape([height, width, 3]))

    # Display the frame (for testing)
    cv2.imshow('in_frame', in_frame)

    if cv2.waitKey(100) & 0xFF == ord('q'):
        break

if not in_bytes:
    # Wait for thread to end only if not exit loop by pressing 'q'
    thread.join()

try:
    process.wait(1)
except sp.TimeoutExpired:
    process.kill()  # In case 'q' is pressed.
################################################################################

cv2.destroyAllWindows()

Remark:

  • In case you are getting an error like "file not found: ffmpeg...", try using full path.
    For example (in Linux): '/usr/bin/ffmpeg -i pipe: -f rawvideo -pix_fmt bgr24 -an -sn pipe:'
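One way to make the command robust across machines is to resolve the executable's full path at runtime with the standard library. A small sketch (it falls back to the bare name if ffmpeg is not on PATH):

```python
import shutil

def resolve_executable(name: str) -> str:
    """Return the full path of `name` if it is found on PATH, else the bare name."""
    return shutil.which(name) or name

# e.g. build the FFmpeg command with the resolved path:
# cmd = '{} -i pipe: -f rawvideo -pix_fmt bgr24 -an -sn pipe:'.format(resolve_executable('ffmpeg'))
```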

3 Comments

I have a related question about reading binary data from livestreams using streamlink. I attempted to follow your line of thought and use the provided code to read such a stream, but I was unable to do it.
I don't have any experience with streamlink. A major issue I can think of is knowing the frame width and height when reading binary data from livestreams.
It seems that I went a little overboard: instead of trying to directly pass the link to some API, I attempted to first read its binary data, which was unnecessary. streamlink just provides HLS broadcast information that can be read into a binary string - but it seems it can also be passed as a source file URL.

Two years after Rotem wrote his answer there is now a cleaner / easier way to do this using ImageIO.

Note: Assuming ffmpeg is in your path, you can generate a test video to try this example using: ffmpeg -f lavfi -i testsrc=duration=10:size=1280x720:rate=30 testsrc.webm

import imageio.v3 as iio
from pathlib import Path

webm_bytes = Path("testsrc.webm").read_bytes()

# read all frames from the bytes string
frames = iio.imread(webm_bytes, index=None, format_hint=".webm")
frames.shape
# Output:
# (300, 720, 1280, 3)

for frame in iio.imiter(webm_bytes, format_hint=".webm"):
    print(frame.shape)
# Output:
# (720, 1280, 3)
# (720, 1280, 3)
# (720, 1280, 3)
# ...

To use this you'll need the ffmpeg backend (which implements a solution similar to what Rotem proposed): pip install imageio[ffmpeg]


In response to Rotem's comment a bit of explanation:

The above snippet uses imageio==2.16.0. The v3 API is an upcoming user-facing API that streamlines reading and writing. The API is available since imageio==2.10.0, however, you will have to use import imageio as iio and use iio.v3.imiter and iio.v3.imread on versions older than 2.16.0.

The ability to read video bytes has existed forever (>5 years and counting) but has (as I am just now realizing) never been documented directly ... so I will add a PR for that soon™ :)

On older versions (tested on v2.9.0) of ImageIO (v2 API) you can still read video byte strings; however, this is slightly more verbose:

import imageio as iio
import numpy as np
from pathlib import Path

webm_bytes = Path("testsrc.webm").read_bytes()

# read all frames from the bytes string
frames = np.stack(iio.mimread(webm_bytes, format="FFMPEG", memtest=False))

# iterate over frames one by one
reader = iio.get_reader(webm_bytes, format="FFMPEG")
for frame in reader:
    print(frame.shape)
reader.close()

5 Comments

Nice! The solution is very elegant. Can you please add a few words about the imageio.v3 interface?
@Rotem Sure, what information are you thinking about?
Just add short description to your answer. Is it a new feature? What version of imageio do we need? Is it related to a future version (version 3)?
@Rotem I added some more info. Does this look good now?
Yes, it looks excellent.

There is a pythonic way to do this by using decord package.

import io
from decord import VideoReader

# This is the bytes object of your video.
video_str

# Load the video
file_obj = io.BytesIO(video_str)
container = VideoReader(file_obj)

# Get the total number of video frames
len(container)

# Access the NDArray of the (i+1)-th frame
container[i]

You can learn more about decord in decord github repo.

You can learn more about video IO in mmaction repo. See DecordInit for using decord IO.

