I assume you have a list of the filenames called movie_filenames:
    # Could get filenames with:
    # import os; movie_filenames = os.listdir("./folder/with/images/")
    movie_filenames = ["11_lion_king.jpg", "22_avengers.jpg"]
First create a mapping between the ID values and the filenames:
    # Split each filename on "_" and take the first item, the ID
    mapping = {f.split("_")[0]: f for f in movie_filenames}  # <-- a dictionary comprehension
Now add a column of some empty values (whatever you like) that will hold the movie_image values:
    # An explicit dtype avoids the warning pandas raises for an empty Series without one
    data_movie["movie_image"] = pd.Series(dtype="object")  # filled with NaN values until populated
Now iterate over this mapping, inserting the movie filenames for the correct movie IDs:
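    # The mapping keys are strings, so compare against the IDs as strings
    # (this also works if movie_id is stored as integers)
    for movie_id, movie_image_filename in mapping.items():
        data_movie.loc[data_movie.movie_id.astype(str) == movie_id, "movie_image"] = movie_image_filename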
This should produce the output dataframe you described.
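To check the steps end to end, here is a minimal sketch on a made-up dataframe. The contents of data_movie (the title column and the ID 33 with no image) are my assumptions for illustration, not your real data. It also shows Series.map as a loop-free alternative, which I would expect to give the same column.

```python
import pandas as pd

# Hypothetical stand-in for the real data_movie dataframe
data_movie = pd.DataFrame(
    {"movie_id": [11, 22, 33], "title": ["Lion King", "Avengers", "No image"]}
)
movie_filenames = ["11_lion_king.jpg", "22_avengers.jpg"]

# ID (as a string) -> filename
mapping = {f.split("_")[0]: f for f in movie_filenames}

# Loop version, as described above
data_movie["movie_image"] = pd.Series(dtype="object")
for movie_id, movie_image_filename in mapping.items():
    mask = data_movie["movie_id"].astype(str) == movie_id
    data_movie.loc[mask, "movie_image"] = movie_image_filename

# Loop-free alternative: look up each ID in the mapping;
# IDs without a matching file end up as NaN
via_map = data_movie["movie_id"].astype(str).map(mapping)

print(data_movie)
print(via_map)
```

Movies without an image file keep a missing value in movie_image, which you can find later with data_movie["movie_image"].isna().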
As a side note (in case you are ever tempted): never load the actual images into a pandas dataframe. It is best to load them as NumPy arrays or something similar. Pandas DataFrames are in essence just annotated NumPy arrays anyway.
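One way to follow that advice, as a rough sketch: keep only the filenames in the dataframe and hold the pixel data in a separate dictionary keyed by filename. The zero-filled arrays below are placeholders I made up to stand in for decoded images; in practice they would come from whatever image library you use.

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe that already has the movie_image column filled in
data_movie = pd.DataFrame(
    {"movie_id": [11, 22], "movie_image": ["11_lion_king.jpg", "22_avengers.jpg"]}
)

# Placeholder arrays (height x width x RGB) standing in for real decoded images
images = {
    name: np.zeros((4, 6, 3), dtype=np.uint8) for name in data_movie["movie_image"]
}

# Look up the filename in the dataframe, then fetch the pixels from the dict
filename = data_movie.loc[data_movie["movie_id"] == 22, "movie_image"].iloc[0]
pixels = images[filename]
print(filename, pixels.shape)
```

The dataframe stays small and tabular, and the arrays can be loaded lazily or in batches when you actually need them.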