how do i split multiple rows in different column in pandas [duplicate]

Question

In below example i have data of movies :

I want to split the title column values into 2 new column i.e one of the new column ( " movie title ") will take = toy story as row value and another new column ("year) will take =1995 as row value.

how to perform this operation on whole dataframe?

 title \ 0 Toy Story (1995) 1 Jumanji (1995) 2 Grumpier Old Men (1995) 3 Waiting to Exhale (1995) 4 Father of the Bride Part II (1995)

Scott Boston · Accepted Answer · 2018-08-15 20:55:21Z

Try to use str accessor with split using a regex that looks for a space and open paranthesis or close paranthesis:

df.title.str.split('\s\(|\)',expand=True) .drop(2, axis=1) .set_axis(['movies','year'], inplace=False, axis=1)

Output:

 movies year 0 Toy Story 1995 1 Jumanji 1995 2 Grumpier Old Men 1995 3 Waiting to Exhale 1995 4 Father of the Bride Part II 1995

rafaelc · Accepted Answer · 2018-08-15 21:00:12Z

If you don't want a regex solution,

df.title.str.rstrip(')').str.split('(').apply(pd.Series) 0 1 0 Toy Story 1995 1 Jumanji 1995 2 Grumpier Old Men 1995 3 Waiting to Exhale 1995 4 Father of the Bride Part II 1995

ALollz · Accepted Answer · 2018-08-16 14:04:07Z

Using regex:

df[['movie_title', 'year']] = df.title.str.extract('(.*)\s\((\d+)', expand=True)

Sample Data:

df = pd.DataFrame({'title': ['Toy Story (1995)', 'Jumanji (1995)', 'Grumpier Old Men (1995)', 'Waiting to Exhale (1995)', 'Father of the Bride Part II (1995)', 'Hello (Goodbye) (1995)' ]}) df[['movie_title', 'year']] = df.title.str.extract('(.*)\s\((\d+)', expand=True)

Output:

 title movie_title year 0 Toy Story (1995) Toy Story 1995 1 Jumanji (1995) Jumanji 1995 2 Grumpier Old Men (1995) Grumpier Old Men 1995 3 Waiting to Exhale (1995) Waiting to Exhale 1995 4 Father of the Bride Part II (1995) Father of the Bride Part II 1995 5 Hello (Goodbye) (1995) Hello (Goodbye) 1995

The regular expression we use is: '(.*)\s\((\d+)'.

The first part of the expression is: (.*). The parenthesis indicate it's a capturing group, and what's inside the parenthesis indicates what we want to capture. In this case .* indicates that we greedily want to capture everything. After the capturing group, we have \s\( which is literally interpreted as a space folllwed by an open parenthesis, so ' ('. Because the capturing group before this is greedy, we will capture everything up until the final ' (' if there are multiple such matches. (for instance see how it captures Hello (Goodbye) (1995) properly.

Finally, we add a second capturing group with \d+ as the thing we want to capture, which captures all of the numbers after the final ' (' in your title, which for us will be the 4 digit year.

awesome thanks for the solution , if you don't mind could please explain me regex as i am new to python and also where i will get best practical documentation for regex.. thanks in advance
@VishalSuryavanshi see the edit for some explanation. Otherwise, Regex101 is a great place to learn and test regex.

rahlf23 · Accepted Answer · 2018-08-15 20:51:55Z

You can simply create a new dataframe (df_new) that splits the string in the title column of the old dataframe (df) and pass that as a list to the new dataframe constructor:

df_new = pd.DataFrame(df['title'].str.split(r' (?=\()').tolist(), columns=['movie title','year'])

Yields:

 movie title year 0 Toy Story (1995) 1 Jumanji (1995) 2 Grumpier Old Men (1995) 3 Waiting to Exhale (1995) 4 Father of the Bride Part II (1995)

Rushabh Mehta · Accepted Answer · 2018-08-15 21:00:21Z

df.apply(lambda x: x.str.split('('),axis=1) df2 = pd.DataFrame({'movie title':df['title'].apply(lambda x: x[0][:-1]), 'year':df['title'].apply(lambda x: x[1][:-1])}) print(df2)

Result

 movie title year 0 Toy Story 1995 1 Jumanji 1995 2 Grumpier Old Men 1995 3 Waiting to Exhale 1995 4 Father of the Bride Part II 1995

Collectives™ on Stack Overflow

how do i split multiple rows in different column in pandas [duplicate]

5 Answers 5

Comments

Comments

Sample Data:

3 Comments

Comments

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

5 Answers 5

Comments

Comments

Sample Data:

3 Comments

Comments

Comments

Linked

Related