convert list of lists to pandas data frame

Question

I have a list of lists, a sample of which is pasted below. I would like to convert this to a pandas data frame but the list contains many duplicates. How would I remove duplicates from a list of lists like this and convert to a data frame with two columns: timestamp and price?

[[{'timestamp': 1648558320942, 'price': 47876.0}, {'timestamp': 1648558320942, 'price': 47876.0}], [{'timestamp': 1648558321945, 'price': 47881.0}, {'timestamp': 1648558321945, 'price': 47881.0}, {'timestamp': 1648558321945, 'price': 47881.0}], [{'timestamp': 1648558326768, 'price': 47876.0}]]

Pantelis · Accepted Answer · 2022-03-29 13:18:59Z

You can flatten the list and drop duplicates from your dataframe.

# import toolboxes
import pandas as pd
from itertools import chain

# get data
data = [[{'timestamp': 1648558320942, 'price': 47876.0},
         {'timestamp': 1648558320942, 'price': 47876.0}],
        [{'timestamp': 1648558321945, 'price': 47881.0},
         {'timestamp': 1648558321945, 'price': 47881.0},
         {'timestamp': 1648558321945, 'price': 47881.0}],
        [{'timestamp': 1648558326768, 'price': 47876.0}]]

# flatten, create df and drop duplicates
a = list(chain.from_iterable(data))
df = pd.DataFrame(a)
df = df.drop_duplicates()

Output:

print(df)
       timestamp    price
0  1648558320942  47876.0
2  1648558321945  47881.0
5  1648558326768  47876.0

Thank you @Pantelis, that was spot on. If you don't mind, could you explain how chain.from_iterable works? I am new to Python and would like to understand how this solution works.
No problem, happy to help. As the name suggests, the chain.from_iterable function chains multiple iterables together. In this case the multiple iterables are the nested lists in data. Using the chain.from_iterable function we essentially unpack or flatten the nested lists into one list a. This guide might also be useful.
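To make the flattening step concrete, here is a minimal sketch of what chain.from_iterable does on a small nested list. The names nested, flat and flat_lc and the integer values are made up for illustration and are not part of the original answer.

```python
from itertools import chain

# A small nested list standing in for `data` (illustrative values only).
nested = [[1, 2], [3], [4, 5, 6]]

# chain.from_iterable returns a lazy iterator that walks each inner
# iterable in turn; list() materializes it into one flat list.
flat = list(chain.from_iterable(nested))
print(flat)  # [1, 2, 3, 4, 5, 6]

# The nested list comprehension used in the other answers gives the same result.
flat_lc = [item for sublist in nested for item in sublist]
print(flat == flat_lc)  # True
```

Note that only one level of nesting is removed: the inner items (here integers, in the question dicts) are passed through unchanged.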

chitown88 · Accepted Answer · 2022-03-29 13:21:07Z

Just need to flatten out that list of lists:

import pandas as pd

data = [[{'timestamp': 1648558320942, 'price': 47876.0},
         {'timestamp': 1648558320942, 'price': 47876.0}],
        [{'timestamp': 1648558321945, 'price': 47881.0},
         {'timestamp': 1648558321945, 'price': 47881.0},
         {'timestamp': 1648558321945, 'price': 47881.0}],
        [{'timestamp': 1648558326768, 'price': 47876.0}]]

newData = []
for each in data:
    newData += each

# or list comprehension
# newData = [each for v in data for each in v]

df = pd.DataFrame(newData).drop_duplicates()

And as a one-liner:

df = pd.DataFrame([each for v in data for each in v]).drop_duplicates()

Output:

print(df)
       timestamp    price
0  1648558320942  47876.0
2  1648558321945  47881.0
5  1648558326768  47876.0

user16785237 · Accepted Answer · 2022-03-29 13:24:09Z

Quick answer:

pd.DataFrame([item for sublist in my_list for item in sublist]).drop_duplicates()

Explanation:

Flatten list of lists
Create pandas DataFrame
Remove duplicates
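The three steps can be sketched one at a time as follows. This is only an illustration: my_list here is a shortened copy of the question's sample, and the final reset_index(drop=True) is an optional extra that is not in the original answer; it renumbers the rows so the index does not keep the gaps left by the dropped duplicates.

```python
import pandas as pd

# Shortened version of the sample data from the question.
my_list = [
    [{'timestamp': 1648558320942, 'price': 47876.0},
     {'timestamp': 1648558320942, 'price': 47876.0}],
    [{'timestamp': 1648558326768, 'price': 47876.0}],
]

# 1. Flatten the list of lists into a single list of dicts.
flat = [item for sublist in my_list for item in sublist]

# 2. Create the DataFrame; the dict keys become the column names.
df = pd.DataFrame(flat)

# 3. Remove rows that are identical in every column.
#    reset_index(drop=True) is optional and only renumbers the rows.
df = df.drop_duplicates().reset_index(drop=True)

print(df)
```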

Manikandan Raju · Accepted Answer · 2022-03-29 13:25:37Z

import pandas as pd

list_of_dicts = [[{'timestamp': 1648558320942, 'price': 47876.0},
                  {'timestamp': 1648558320942, 'price': 47876.0}],
                 [{'timestamp': 1648558321945, 'price': 47881.0},
                  {'timestamp': 1648558321945, 'price': 47881.0},
                  {'timestamp': 1648558321945, 'price': 47881.0}],
                 [{'timestamp': 1648558326768, 'price': 47876.0}]]

# Keep only the first dict of each sublist. This assumes every sublist
# holds copies of a single record, as in the sample data.
df = pd.DataFrame([i[0] for i in list_of_dicts])
print(df)
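Taking the first element of each sublist matches the sample in the question, but it depends on each sublist containing nothing but repeats of one record. The sketch below uses hypothetical data (mixed, with made-up timestamps and prices) to show how the result differs from flattening and dropping duplicates when a sublist holds distinct records.

```python
import pandas as pd

# Hypothetical data: the first sublist holds two *different* records.
mixed = [
    [{'timestamp': 1, 'price': 10.0}, {'timestamp': 2, 'price': 11.0}],
    [{'timestamp': 3, 'price': 12.0}],
]

# First-element approach: the second record of the first sublist is lost.
first_only = pd.DataFrame([sub[0] for sub in mixed])

# Flatten-then-deduplicate approach: every distinct record is kept.
flattened = pd.DataFrame(
    [item for sub in mixed for item in sub]
).drop_duplicates()

print(len(first_only))  # 2
print(len(flattened))   # 3
```

The first-element approach also raises an IndexError if any sublist is empty, whereas flattening simply skips empty sublists.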
