205

It's easy to turn a list of lists into a pandas dataframe:

import pandas as pd
df = pd.DataFrame([[1,2,3],[3,4,5]])

But how do I turn df back into a list of lists?

lol = df.what_to_do_now?
print lol # [[1,2,3],[3,4,5]]
    pd.DataFrame.what_to_do_now = lambda self: self.values.tolist(); lol = df.what_to_do_now(); print(lol) # [[1,2,3],[3,4,5]] it works if you can believe it. Commented Dec 27, 2020 at 2:17

14 Answers 14

298

You could access the underlying array and call its tolist method:

>>> df = pd.DataFrame([[1,2,3],[3,4,5]])
>>> lol = df.values.tolist()
>>> lol
[[1L, 2L, 3L], [3L, 4L, 5L]]
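On Python 3 the L suffixes no longer appear. A minimal sketch of the same idea, assuming a pandas version that has DataFrame.to_numpy() (the accessor the pandas documentation recommends over .values):

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [3, 4, 5]])

# to_numpy() returns the underlying data as a NumPy array;
# tolist() then converts it to nested plain Python lists.
lol = df.to_numpy().tolist()
print(lol)  # [[1, 2, 3], [3, 4, 5]]
```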

8 Comments

L means long, as opposed to int.
NOTE: this does not preserve the column ordering, so watch out for that.
There is no reason why it would not preserve the column ordering.
@RussellLego That seems a bit odd, do you happen to know of an example which could demonstrate that?
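For what it's worth, a quick check along these lines suggests the column order is kept. This is only a sketch with made-up column names, not a proof for every dtype combination:

```python
import pandas as pd

# Hypothetical frame whose columns are deliberately not in alphabetical order.
df = pd.DataFrame({"b": [1, 2], "a": [3, 4], "c": [5, 6]})

column_order = list(df.columns)
rows = df.values.tolist()

# Each inner list follows the order of df.columns.
print(column_order)  # ['b', 'a', 'c']
print(rows)          # [[1, 3, 5], [2, 4, 6]]
```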
29

If the data has column and index labels that you want to preserve, there are a few options.

Example data:

>>> df = pd.DataFrame([[1,2,3],[3,4,5]],
...                   columns=('first', 'second', 'third'),
...                   index=('alpha', 'beta'))
>>> df
       first  second  third
alpha      1       2      3
beta       3       4      5

The tolist() method described in other answers is useful but yields only the core data - which may not be enough, depending on your needs.

>>> df.values.tolist()
[[1, 2, 3], [3, 4, 5]]

One approach is to convert the DataFrame to json using df.to_json() and then parse it again. This is cumbersome but does have some advantages, because the to_json() method has some useful options.

>>> df.to_json()
{
 "first":{"alpha":1,"beta":3},
 "second":{"alpha":2,"beta":4},"third":{"alpha":3,"beta":5}
}
>>> df.to_json(orient='split')
{
 "columns":["first","second","third"],
 "index":["alpha","beta"],
 "data":[[1,2,3],[3,4,5]]
}

Cumbersome but may be useful.
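The "convert to JSON and parse it again" route could look roughly like this. Using the standard-library json module for the parsing step is an assumption here; the answer does not say which parser to use:

```python
import json

import pandas as pd

df = pd.DataFrame([[1, 2, 3], [3, 4, 5]],
                  columns=['first', 'second', 'third'],
                  index=['alpha', 'beta'])

# to_json() returns a string; json.loads() turns it back into Python objects.
parsed = json.loads(df.to_json(orient='split'))

columns = parsed['columns']  # ['first', 'second', 'third']
index = parsed['index']      # ['alpha', 'beta']
data = parsed['data']        # [[1, 2, 3], [3, 4, 5]]
```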

The good news is that it's pretty straightforward to build lists for the columns and rows:

>>> columns = [df.index.name] + [i for i in df.columns]
>>> rows = [[i for i in row] for row in df.itertuples()]

This yields:

>>> print(f"columns: {columns}\nrows: {rows}")
columns: [None, 'first', 'second', 'third']
rows: [['alpha', 1, 2, 3], ['beta', 3, 4, 5]]

If the None as the name of the index is bothersome, rename it:

df = df.rename_axis('stage') 

Then:

>>> columns = [df.index.name] + [i for i in df.columns]
>>> print(f"columns: {columns}\nrows: {rows}")
columns: ['stage', 'first', 'second', 'third']
rows: [['alpha', 1, 2, 3], ['beta', 3, 4, 5]]

4 Comments

If you have a multilevel index, the index tuple will be the first element of the generated rows. You'll need a further step to split it.
Wouldn't it be simpler to use DataFrame.itertuples() or DataFrame.to_records() for all this?
@AMC Perhaps, I don't know, maybe? Rather than pontificate, why not add a proper treatment of that thought in your own answer?
@AndrewE Eh, it's still worth discussing and improving upon existing answers.
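Regarding the multilevel index mentioned in the comments, one possible way to do the "further step" is to unpack the index tuple into the row. A sketch, with hypothetical index level names and data:

```python
import pandas as pd

# Hypothetical two-level index; the names and values are illustrative only.
idx = pd.MultiIndex.from_tuples([('a', 1), ('b', 2)],
                                names=['letter', 'number'])
df = pd.DataFrame([[10, 20], [30, 40]], index=idx, columns=['x', 'y'])

# With a MultiIndex, row[0] from itertuples() is the index tuple itself,
# so it is flattened into the front of each row here.
rows = [list(row[0]) + list(row[1:]) for row in df.itertuples()]
columns = list(df.index.names) + list(df.columns)

print(columns)  # ['letter', 'number', 'x', 'y']
print(rows)     # [['a', 1, 10, 20], ['b', 2, 30, 40]]
```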
10

I wanted to preserve the index, so I adapted the original answer to this solution:

list_df = df.reset_index().values.tolist() 

Now you can paste it somewhere else (e.g. into a Stack Overflow question) and later recreate it:

df = pd.DataFrame(list_df, columns=['name1', ...])
df.set_index(['name1'], inplace=True)
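A fuller round-trip sketch of this idea, assuming the example frame from the earlier answer and a hypothetical index name 'stage':

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [3, 4, 5]],
                  columns=['first', 'second', 'third'],
                  index=['alpha', 'beta']).rename_axis('stage')

# reset_index() moves the index into an ordinary column, so it ends up
# as the first element of every row.
list_df = df.reset_index().values.tolist()

# Rebuild the frame from the plain lists and restore the index.
restored = pd.DataFrame(list_df,
                        columns=['stage', 'first', 'second', 'third'])
restored = restored.set_index('stage')
```

Note that the column names are not part of list_df, so they have to be supplied again when rebuilding.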


7

I don't know if it will fit your needs, but you can also do:

>>> lol = df.values
>>> lol
array([[1, 2, 3],
       [3, 4, 5]])

This is just a NumPy array (a numpy.ndarray), which lets you do all the usual NumPy array things.
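As a small illustration of "the usual NumPy array things", here is a sketch of a few operations that work directly on the array without going through lists (the particular operations are just examples):

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [3, 4, 5]])
arr = df.values

column_sums = arr.sum(axis=0)   # sum down each column
transposed = arr.T              # swap rows and columns
first_row = arr[0]              # index rows like a nested list

print(column_sums.tolist())  # [4, 6, 8]
print(transposed.tolist())   # [[1, 3], [2, 4], [3, 5]]
print(first_row.tolist())    # [1, 2, 3]
```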

1 Comment

Plus 1. In practice, there's often no need to convert the NumPy array into a list of lists.
6

I had this problem: how do I get the headers of the df into row 0 of the list, so that they are written to row 1 of the Excel sheet (using xlsxwriter)? None of the proposed solutions worked, but they pointed me in the right direction. I just needed one more line of code:

# get csv data
df = pd.read_csv(filename)

# combine column headers and list of lists of values
lol = [df.columns.tolist()] + df.values.tolist()


2

Maybe something changed but this gave back a list of ndarrays which did what I needed.

list(df.values) 
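To make the difference concrete: list(df.values) gives a list whose elements are still NumPy arrays, not Python lists. A minimal sketch, including one way to convert the rows afterwards if plain lists are needed:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [3, 4, 5]])

# A plain list on the outside, one ndarray per row on the inside.
arrays = list(df.values)
inner_is_ndarray = isinstance(arrays[0], np.ndarray)  # True

# Converting each row gives an ordinary list of lists.
lists = [row.tolist() for row in arrays]
print(lists)  # [[1, 2, 3], [3, 4, 5]]
```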


2
  1. The solutions presented so far suffer from a "reinventing the wheel" approach. Quoting @AMC:

If you're new to the library, consider double-checking whether the functionality you need is already offered by those Pandas objects.

  2. If you convert a dataframe to a list of lists you will lose information - namely the index and column names.

My solution: use to_dict()

dict_of_lists = df.to_dict(orient='split') 

This will give you a dictionary with three lists: index, columns, data. If you decide you really don't need the columns and index names, you get the data with

dict_of_lists['data'] 
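A short sketch of what this looks like on the example frame used in the other answers, plus rebuilding the frame from the three lists:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [3, 4, 5]],
                  columns=['first', 'second', 'third'],
                  index=['alpha', 'beta'])

dict_of_lists = df.to_dict(orient='split')
print(dict_of_lists['index'])    # ['alpha', 'beta']
print(dict_of_lists['columns'])  # ['first', 'second', 'third']
print(dict_of_lists['data'])     # [[1, 2, 3], [3, 4, 5]]

# The three lists are enough to rebuild the labels and values.
rebuilt = pd.DataFrame(dict_of_lists['data'],
                       index=dict_of_lists['index'],
                       columns=dict_of_lists['columns'])
```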

1 Comment

The solution presented above is still "lossy". You will lose the name of the index and columns (df.index.name and df.columns.name)
2

Not quite related to the question, but another flavor with the same expectation:

converting a DataFrame column (a Series) into a list of lists to plot a chart using create_distplot in Plotly

hist_data = []
hist_data.append(map_data['Population'].to_numpy().tolist())
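A self-contained sketch of the same conversion, leaving the Plotly call out. The contents of map_data are made up for illustration; only the 'Population' column name comes from the snippet above:

```python
import pandas as pd

# Hypothetical data standing in for map_data.
map_data = pd.DataFrame({'Population': [100, 250, 75]})

# Series.tolist() gives the same plain list as .to_numpy().tolist().
hist_data = [map_data['Population'].tolist()]
print(hist_data)  # [[100, 250, 75]]
```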


1

df.values returns a NumPy array, which does not preserve the data types: when the columns have mixed types, an integer might be converted to a float.

df.iterrows() yields each row as a Series, which also does not guarantee to preserve the data types. See: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iterrows.html

The code below converts to a list of lists and preserves the data types (note that itertuples() puts the index first in each row; pass index=False to leave it out):

rows = [list(row) for row in df.itertuples()] 
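A small sketch of the difference, using a hypothetical frame with one integer column and one float column:

```python
import pandas as pd

# Hypothetical mixed-dtype frame: 'n' is integer, 'x' is float.
df = pd.DataFrame({'n': [1, 2], 'x': [0.5, 1.5]})

# .values needs one common dtype, so the integers become floats.
via_values = df.values.tolist()

# itertuples() reads each column separately, so 'n' stays an integer.
# index=False drops the index from each row.
via_tuples = [list(row) for row in df.itertuples(index=False)]

print(via_values)  # [[1.0, 0.5], [2.0, 1.5]]
print(via_tuples)  # [[1, 0.5], [2, 1.5]]
```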


1

If you wish to convert a Pandas DataFrame to a table (list of lists) and include the header column this should work:

import pandas as pd

def dfToTable(df:pd.DataFrame) -> list:
    return [list(df.columns)] + df.values.tolist()

Usage (in REPL):

>>> df = pd.DataFrame(
...     [["r1c1","r1c2","r1c3"],["r2c1","r2c2","r3c3"]],
...     columns=["c1", "c2", "c3"])
>>> df
     c1    c2    c3
0  r1c1  r1c2  r1c3
1  r2c1  r2c2  r3c3
>>> dfToTable(df)
[['c1', 'c2', 'c3'], ['r1c1', 'r1c2', 'r1c3'], ['r2c1', 'r2c2', 'r3c3']]


0

This is very simple:

import numpy as np
list_of_lists = np.array(df)

1 Comment

How is this different from using DataFrame.values or DataFrame.to_numpy() ? Never mind the fact that it creates a NumPy array, not a plain Python list.
0

A function I wrote that allows including the index column or the header row:

def df_to_list_of_lists(df, index=False, header=False):
    rows = []
    if header:
        rows.append(([df.index.name] if index else []) + [e for e in df.columns])
    for row in df.itertuples():
        rows.append([e for e in row] if index else [e for e in row][1:])
    return rows
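For comparison, a possible variant that lets itertuples() handle the index itself through its index and name parameters. This is a sketch, not the function above, and df_to_rows is a hypothetical name:

```python
import pandas as pd

def df_to_rows(df, index=False, header=False):
    """Hypothetical variant: return the frame as a list of lists."""
    rows = []
    if header:
        # Optional header row, with the index name first if requested.
        rows.append(([df.index.name] if index else []) + list(df.columns))
    # name=None yields plain tuples; index=... includes or drops the index.
    rows.extend(list(row) for row in df.itertuples(index=index, name=None))
    return rows

df = pd.DataFrame([[1, 2, 3], [3, 4, 5]],
                  columns=['first', 'second', 'third'],
                  index=['alpha', 'beta'])

plain = df_to_rows(df)
full = df_to_rows(df, index=True, header=True)
print(plain)  # [[1, 2, 3], [3, 4, 5]]
print(full)   # [[None, 'first', 'second', 'third'], ['alpha', 1, 2, 3], ['beta', 3, 4, 5]]
```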


-1

We can use the DataFrame.iterrows() function to iterate over each row of the given DataFrame and construct a list from the data of each row:

# Empty list
row_list = []

# Iterate over each row
for index, rows in df.iterrows():
    # Create list for the current row
    my_list = [rows.Date, rows.Event, rows.Cost]

    # append the list to the final list
    row_list.append(my_list)

# Print
print(row_list)

We can successfully extract each row of the given data frame into a list

1 Comment

This is not a good idea; try to avoid using df.iterrows because it's an anti-pattern and slow once the df gets large: stackoverflow.com/questions/16476924/…
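One way to get the same rows without iterrows is to select the columns and convert them in one go. A sketch, assuming a frame with the Date, Event and Cost columns used in the answer; the sample values are made up:

```python
import pandas as pd

# Hypothetical data; only the column names come from the answer above.
df = pd.DataFrame({
    'Date': ['2020-01-01', '2020-01-02'],
    'Event': ['Music', 'Poetry'],
    'Cost': [10000, 5000],
})

# Select the wanted columns in the wanted order, then convert once.
row_list = df[['Date', 'Event', 'Cost']].values.tolist()
print(row_list)  # [['2020-01-01', 'Music', 10000], ['2020-01-02', 'Poetry', 5000]]
```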
-1

Note: I have seen many cases on Stack Overflow where converting a Pandas Series or DataFrame to a NumPy array or plain Python lists is entirely unnecessary. If you're new to the library, consider double-checking whether the functionality you need is already offered by those Pandas objects.

To quote a comment by @jpp:

In practice, there's often no need to convert the NumPy array into a list of lists.


If a Pandas DataFrame/Series won't work, you can use the built-in DataFrame.to_numpy and Series.to_numpy methods.

3 Comments

This answer represents little more than your own beliefs. And quite frankly, it's a little embarrassing. There are perfectly valid reasons to convert a dataframe to a list/array, an advanced user would certainly know.
@NicolasGervais It might be a bit too much, yes, I'll edit it to generalize less. "There are perfectly valid reasons to convert a dataframe to a list/array": Of course, my answer doesn't really say anything to the contrary. "an advanced user would certainly know": I don't see the point of that jab. I wrote this answer after noticing that many people were converting series to ndarrays or lists, and ndarrays to lists, simply because they were unaware of what operations those objects support.
I'm referring to very blatant cases, like doing for elem in some_series.values.tolist(): because they don't know that you can iterate over the elements of a series. I'm not sure what's so awful about this answer.
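To illustrate the kind of case described in that last comment, a Series can be iterated directly, so the detour through .values.tolist() is not needed. A sketch with a made-up Series:

```python
import pandas as pd

some_series = pd.Series([1, 2, 3])

# Iterating the Series directly yields its elements.
direct = [elem for elem in some_series]

# The roundabout version gives the same elements.
roundabout = [elem for elem in some_series.values.tolist()]

print(direct)      # [1, 2, 3]
print(roundabout)  # [1, 2, 3]
```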
