1

I have a list of lists, a sample of which is pasted below. I would like to convert this to a pandas data frame but the list contains many duplicates. How would I remove duplicates from a list of lists like this and convert to a data frame with two columns: timestamp and price?

```python
[[{'timestamp': 1648558320942, 'price': 47876.0},
  {'timestamp': 1648558320942, 'price': 47876.0}],
 [{'timestamp': 1648558321945, 'price': 47881.0},
  {'timestamp': 1648558321945, 'price': 47881.0},
  {'timestamp': 1648558321945, 'price': 47881.0}],
 [{'timestamp': 1648558326768, 'price': 47876.0}]]
```

4 Answers

4

You can flatten the list and drop duplicates from your dataframe.

```python
# import toolboxes
import pandas as pd
from itertools import chain

# get data
data = [[{'timestamp': 1648558320942, 'price': 47876.0},
         {'timestamp': 1648558320942, 'price': 47876.0}],
        [{'timestamp': 1648558321945, 'price': 47881.0},
         {'timestamp': 1648558321945, 'price': 47881.0},
         {'timestamp': 1648558321945, 'price': 47881.0}],
        [{'timestamp': 1648558326768, 'price': 47876.0}]]

# flatten, create df and drop duplicates
a = list(chain.from_iterable(data))
df = pd.DataFrame(a)
df = df.drop_duplicates()
```

Output:

```
print(df)
       timestamp    price
0  1648558320942  47876.0
2  1648558321945  47881.0
5  1648558326768  47876.0
```

2 Comments

Thank you @Pantelis, that was spot on. If you don't mind, could you explain how chain.from_iterable works? I am new to Python and would like to understand how this solution works.
No problem, happy to help. As the name suggests, the chain.from_iterable function chains multiple iterables together. In this case the multiple iterables are the nested lists in data. chain.from_iterable walks through the nested lists one after another, and list() collects the items into one flat list, a. This guide might also be useful.
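
To make the explanation in the comment above concrete, here is a minimal sketch comparing chain.from_iterable with an explicit nested loop. The list `nested` and the variable names are illustrative only and do not come from the question.

```python
from itertools import chain

# Illustrative data, not taken from the question.
nested = [[1, 2], [3], [4, 5]]

# chain.from_iterable returns a lazy iterator over the items of each
# sublist in turn; list() materializes it into a flat list.
flat = list(chain.from_iterable(nested))

# The same flattening written as an explicit nested loop.
flat_loop = []
for sublist in nested:
    for item in sublist:
        flat_loop.append(item)

print(flat)       # [1, 2, 3, 4, 5]
print(flat_loop)  # [1, 2, 3, 4, 5]
```

Only one level of nesting is removed, which is exactly what the list of lists in the question needs.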
1

Just need to flatten out that list of lists:

```python
import pandas as pd

data = [[{'timestamp': 1648558320942, 'price': 47876.0},
         {'timestamp': 1648558320942, 'price': 47876.0}],
        [{'timestamp': 1648558321945, 'price': 47881.0},
         {'timestamp': 1648558321945, 'price': 47881.0},
         {'timestamp': 1648558321945, 'price': 47881.0}],
        [{'timestamp': 1648558326768, 'price': 47876.0}]]

newData = []
for each in data:
    newData += each

# or list comprehension
# newData = [each for v in data for each in v]

df = pd.DataFrame(newData).drop_duplicates()
```

And as a one-liner:

```python
df = pd.DataFrame([each for v in data for each in v]).drop_duplicates()
```

Output:

```
print(df)
       timestamp    price
0  1648558320942  47876.0
2  1648558321945  47881.0
5  1648558326768  47876.0
```
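
As a possible variation (not part of the answer above), the duplicates can also be removed in plain Python before the DataFrame is built. This sketch assumes every dict has both a 'timestamp' and a 'price' key; the names `records` and `unique_rows` are made up for illustration.

```python
import pandas as pd

data = [[{'timestamp': 1648558320942, 'price': 47876.0},
         {'timestamp': 1648558320942, 'price': 47876.0}],
        [{'timestamp': 1648558321945, 'price': 47881.0},
         {'timestamp': 1648558321945, 'price': 47881.0},
         {'timestamp': 1648558321945, 'price': 47881.0}],
        [{'timestamp': 1648558326768, 'price': 47876.0}]]

# Flatten the list of lists into one list of dicts.
records = [each for v in data for each in v]

# dict.fromkeys keeps the first occurrence of each key and preserves
# insertion order, so the (timestamp, price) tuples come out de-duplicated.
unique_rows = list(dict.fromkeys((r['timestamp'], r['price']) for r in records))

df = pd.DataFrame(unique_rows, columns=['timestamp', 'price'])
print(df)
```

Because the rows are de-duplicated before the DataFrame exists, the index here runs 0, 1, 2 rather than 0, 2, 5.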


0

Quick answer:

```python
pd.DataFrame([item for sublist in my_list for item in sublist]).drop_duplicates()
```

Explanation:

  1. Flatten list of lists
  2. Create pandas DataFrame
  3. Remove duplicates
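
The three steps above can be sketched one at a time as follows. `my_list` is assumed to hold the sample data from the question, and the final reset_index call is an optional addition that is not part of the one-liner.

```python
import pandas as pd

# Assumed to be the sample data from the question.
my_list = [[{'timestamp': 1648558320942, 'price': 47876.0},
            {'timestamp': 1648558320942, 'price': 47876.0}],
           [{'timestamp': 1648558321945, 'price': 47881.0},
            {'timestamp': 1648558321945, 'price': 47881.0},
            {'timestamp': 1648558321945, 'price': 47881.0}],
           [{'timestamp': 1648558326768, 'price': 47876.0}]]

# 1. Flatten the list of lists into one list of dicts.
flat = [item for sublist in my_list for item in sublist]

# 2. Create the DataFrame; the dict keys become the column names.
df = pd.DataFrame(flat)

# 3. Remove duplicate rows (the first occurrence of each row is kept).
df = df.drop_duplicates()

# Optional, not in the one-liner: renumber the rows 0..n-1.
df = df.reset_index(drop=True)

print(df)
```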


0

```python
import pandas as pd

list_of_dicts = [[{'timestamp': 1648558320942, 'price': 47876.0},
                  {'timestamp': 1648558320942, 'price': 47876.0}],
                 [{'timestamp': 1648558321945, 'price': 47881.0},
                  {'timestamp': 1648558321945, 'price': 47881.0},
                  {'timestamp': 1648558321945, 'price': 47881.0}],
                 [{'timestamp': 1648558326768, 'price': 47876.0}]]

# Keep only the first dict of each inner list. This assumes every inner
# list is non-empty and contains copies of a single record, as in the sample.
df = pd.DataFrame([i[0] for i in list_of_dicts])
print(df)
```
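
Taking the first element of each inner list only matches the flatten-and-drop_duplicates approach when every inner list repeats a single record. The sketch below uses made-up data (not from the question) where that assumption does not hold, to show how the two results can differ.

```python
import pandas as pd

# Hypothetical data: the first inner list holds two different records,
# and the second inner list repeats a record from the first.
data = [
    [{'timestamp': 1, 'price': 10.0}, {'timestamp': 2, 'price': 11.0}],
    [{'timestamp': 1, 'price': 10.0}],
]

# First element of each inner list: the record with timestamp 2 is lost
# and the record with timestamp 1 appears twice.
first_only = pd.DataFrame([sub[0] for sub in data])

# Flatten, then drop duplicates: each distinct record appears once.
flattened = pd.DataFrame([item for sub in data for item in sub]).drop_duplicates()

print(first_only)
print(flattened)
```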
