$\begingroup$

I have JPG images stored in a folder, for example 11_lion_king.jpg, 22_avengers.jpg, etc.

I have a data frame as below:

    data_movie.head()

    movie_id    genre
    11          ['action', 'comedy']
    22          ['animation', 'comedy']
    ..........

I want to add a new column, movie_image, to the data_movie data frame, with each JPG filename matched to the corresponding movie_id, as shown below:

    movie_id    genre                      movie_image
    11          ['action', 'comedy']       11_lion_king.jpg
    22          ['animation', 'comedy']    22_avengers.jpg
    .........

Help will be appreciated.

$\endgroup$

2 Answers

$\begingroup$

I assume you have a list of the filenames called movie_filenames:

    # Could get the filenames with:
    # import os; movie_filenames = os.listdir("./folder/with/images/")
    movie_filenames = ["11_lion_king.jpg", "22_avengers.jpg"]
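
If the folder may also hold other files, the listing can be filtered down to the JPGs. The following is a minimal sketch, not part of the original answer: the helper name `list_jpg_filenames` is hypothetical, and a temporary folder with empty placeholder files stands in for the real image folder so that the example runs anywhere.

```python
import tempfile
from pathlib import Path


def list_jpg_filenames(folder):
    """Return the sorted names of the .jpg files directly inside `folder`."""
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".jpg"
    )


# Build a throwaway folder with placeholder files (assumption: stands in
# for the real image folder, which is not known here).
with tempfile.TemporaryDirectory() as tmp:
    for name in ["11_lion_king.jpg", "22_avengers.jpg", "notes.txt"]:
        (Path(tmp) / name).touch()
    movie_filenames = list_jpg_filenames(tmp)

print(movie_filenames)  # the .txt file is skipped
```

Sorting is optional here; it only makes the order reproducible, since directory listings are not guaranteed to come back in any particular order.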

First create a mapping between the ID values and the filenames:

    # Use the "_" to split the filename and take the first item, the ID
    mapping = {f.split("_")[0]: f for f in movie_filenames}  # <-- a dictionary comprehension

Now add a column of some empty values (whatever you like) that will hold the movie_image values:

    import pandas as pd
    data_movie["movie_image"] = pd.Series(dtype="object")  # will be filled with NaN values until populated

Now iterate over this mapping, inserting the movie filenames for the correct movie IDs:

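    # Note: the mapping keys are strings, so this matches only if movie_id holds strings too
    for movie_id, movie_image_filename in mapping.items():
        data_movie.loc[data_movie.movie_id == movie_id, "movie_image"] = movie_image_filename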

This should produce the output dataframe you described.
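
Put together, the steps above might look like the following sketch. It makes two assumptions that are not in the question: the sample data frame is rebuilt by hand with an extra, hypothetical row (ID 33) that has no image, and `movie_id` is taken to be an integer column, which is why it is converted to text before being compared with the string keys.

```python
import pandas as pd

# Hand-built stand-in for the question's data frame (row 33 is hypothetical).
data_movie = pd.DataFrame({
    "movie_id": [11, 22, 33],
    "genre": [["action", "comedy"], ["animation", "comedy"], ["drama"]],
})
movie_filenames = ["11_lion_king.jpg", "22_avengers.jpg"]

# ID (as a string) -> filename
mapping = {f.split("_")[0]: f for f in movie_filenames}

# Start with an empty column, then fill in the rows that have a match.
data_movie["movie_image"] = None
ids_as_text = data_movie["movie_id"].astype(str)
for movie_id, movie_image_filename in mapping.items():
    data_movie.loc[ids_as_text == movie_id, "movie_image"] = movie_image_filename

print(data_movie)
```

Rows without a matching file, such as the hypothetical ID 33, simply keep the empty placeholder value.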

As a side note (in case you are ever tempted): never load the actual images into a pandas dataframe. It is best to load them as NumPy arrays or something similar. Pandas DataFrames are in essence just annotated NumPy arrays anyway.

$\endgroup$
$\begingroup$

Slight addendum to the above solution:

    # First create a mapping between the ID values and the filenames.
    # Use the "_" to split the filename and take the first item, the ID
    mapping = {f.split("_")[0]: f for f in movie_filenames}  # <-- a dictionary comprehension

    # Now iterate over this mapping, inserting the movie filenames for the correct movie IDs.
    # movie_id is cast to str so that it compares equal to the string keys of the mapping.
    for movie_id, movie_image_filename in mapping.items():
        data_movie.loc[data_movie.movie_id.astype(str) == movie_id, "movie_image"] = movie_image_filename

Alternatively, using the map function:

    mapping = {f.split("_")[0]: f for f in movie_filenames}
    data_movie["movie_image"] = data_movie['movie_id'].astype(str).map(mapping)
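
A variation on the map approach is to convert the keys rather than the column, and then check which IDs ended up without an image. This is only a sketch under assumptions that the question does not confirm: every filename starts with an integer ID followed by "_", `movie_id` is an integer column, and the row with ID 33 is a hypothetical addition used to show an unmatched movie.

```python
import pandas as pd

# Hand-built stand-in for the question's data frame (row 33 is hypothetical).
data_movie = pd.DataFrame({
    "movie_id": [11, 22, 33],
    "genre": [["action", "comedy"], ["animation", "comedy"], ["drama"]],
})
movie_filenames = ["11_lion_king.jpg", "22_avengers.jpg"]

# Split only at the first "_" and convert the ID to int so that the keys
# have the same type as the movie_id column.
mapping = {int(f.split("_", 1)[0]): f for f in movie_filenames}

# map() looks up each movie_id in the dictionary; IDs without a key become NaN.
data_movie["movie_image"] = data_movie["movie_id"].map(mapping)

# List the IDs for which no image file was found.
missing_ids = data_movie.loc[data_movie["movie_image"].isna(), "movie_id"].tolist()
print(missing_ids)
```

Checking for missing values after the mapping is a cheap way to notice type mismatches: if every row comes back empty, the keys and the column most likely have different types.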
$\endgroup$
