1

I'm trying to import data to a pandas DataFrame with columns being date string, label, value. My data looks like the following (just with 4 dates and 5 labels)

from numpy import random import numpy as np import pandas as pd # Creating the data dates = ("2015-01-01", "2015-01-02", "2015-01-03", "2015-01-04") values = [random.rand(5) for _ in range(4)] data = dict(zip(dates,values)) 

So, the data is a dictionary where the keys are dates, the keys a list of values where the index is the label.

Loading this data structure into a DataFrame

df1 = pd.DataFrame(data) 

gives me the dates as columns, the label as index, and the value as the value.

An alternative loading would be

df2 = pd.DataFrame() df2.from_dict(data, orient='index') 

where the dates are index, and columns are labels.

In either of both cases do I manage to do pivoting or stacking to my preferred view.

How should I approach the pivoting/stacking to get the view I want? Or should I change my data structure before loading it into a DataFrame? In particular I'd like to avoid of having to create all the rows of the table beforehand by using a bunch of calls to zip.

2
  • Does pd.DataFrame(data).transpose() meet your needs? Commented Aug 1, 2017 at 14:08
  • That gives me the same DataFrame as if I would have used from_dict with orient='index' option. Commented Aug 1, 2017 at 14:11

1 Answer 1

2

IIUC:

Option 1
pd.DataFrame.stack

pd.DataFrame(data).stack() \ .rename('value').rename_axis(['label', 'date']).reset_index() label date value 0 0 2015-01-01 0.345109 1 0 2015-01-02 0.815948 2 0 2015-01-03 0.758709 3 0 2015-01-04 0.461838 4 1 2015-01-01 0.584527 5 1 2015-01-02 0.823529 6 1 2015-01-03 0.714700 7 1 2015-01-04 0.160735 8 2 2015-01-01 0.779006 9 2 2015-01-02 0.721576 10 2 2015-01-03 0.246975 11 2 2015-01-04 0.270491 12 3 2015-01-01 0.465495 13 3 2015-01-02 0.622024 14 3 2015-01-03 0.227865 15 3 2015-01-04 0.638772 16 4 2015-01-01 0.266322 17 4 2015-01-02 0.575298 18 4 2015-01-03 0.335095 19 4 2015-01-04 0.761181 

Option 2
comprehension

pd.DataFrame( [[i, d, v] for d, l in data.items() for i, v in enumerate(l)], columns=['label', 'date', 'value'] ) label date value 0 0 2015-01-01 0.345109 1 1 2015-01-01 0.584527 2 2 2015-01-01 0.779006 3 3 2015-01-01 0.465495 4 4 2015-01-01 0.266322 5 0 2015-01-02 0.815948 6 1 2015-01-02 0.823529 7 2 2015-01-02 0.721576 8 3 2015-01-02 0.622024 9 4 2015-01-02 0.575298 10 0 2015-01-03 0.758709 11 1 2015-01-03 0.714700 12 2 2015-01-03 0.246975 13 3 2015-01-03 0.227865 14 4 2015-01-03 0.335095 15 0 2015-01-04 0.461838 16 1 2015-01-04 0.160735 17 2 2015-01-04 0.270491 18 3 2015-01-04 0.638772 19 4 2015-01-04 0.761181 
Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.