
I am struggling with the following problem. I have multiple individual dataframes (50 of them), each containing one specific characteristic for a number of stocks (say price, standard deviation, etc.), so something like this:

import pandas as pd
import numpy as np

dates = pd.date_range('20130101', periods=6)
df1 = pd.DataFrame(np.random.randn(6, 4), index=dates,
                   columns=('AAPL', 'MSFT', 'TSLA', 'GE'))
df2 = pd.DataFrame(np.random.randn(6, 4), index=dates,
                   columns=('AAPL', 'MSFT', 'TSLA', 'GE'))
df3 = pd.DataFrame(np.random.randn(6, 4), index=dates,
                   columns=('AAPL', 'MSFT', 'TSLA', 'GE'))
df4 = pd.DataFrame(np.random.randn(6, 4), index=dates,
                   columns=('AAPL', 'MSFT', 'TSLA', 'GE'))

Now I would like to merge those in such a way that I obtain one dataframe for each stock, containing all of the characteristics for that particular stock, so something like this:

aapl = pd.DataFrame(np.random.randn(6, 4), index=dates,
                    columns=('AAPL1', 'AAPL2', 'AAPL3', 'AAPL4'))
msft = pd.DataFrame(np.random.randn(6, 4), index=dates,
                    columns=('MSFT1', 'MSFT2', 'MSFT3', 'MSFT4'))
tsla = pd.DataFrame(np.random.randn(6, 4), index=dates,
                    columns=('TSLA1', 'TSLA2', 'TSLA3', 'TSLA4'))
ge = pd.DataFrame(np.random.randn(6, 4), index=dates,
                  columns=('GE1', 'GE2', 'GE3', 'GE4'))

1 Answer


I would use concat:

In [11]: res = pd.concat([df1, df2, df3, df4], keys=[1, 2, 3, 4], axis=1)

In [12]: res
Out[12]:
                   1                                       2                                       3                                       4
                AAPL      MSFT      TSLA        GE      AAPL      MSFT      TSLA        GE      AAPL      MSFT      TSLA        GE      AAPL      MSFT      TSLA        GE
2013-01-01  0.144764  1.292692 -1.303908 -0.843892 -1.104683 -1.178507  0.898648 -0.626209  0.492292  0.147169  1.814729  0.562406 -0.121656  0.865116  0.430813 -0.326225
2013-01-02 -0.163063  0.019601 -2.565271  0.708233  0.317464 -2.574969 -0.080129 -1.176806  0.045253  0.684745 -1.062797 -0.483389 -0.579194  0.401920 -0.393240  0.113734
2013-01-03  0.213592 -0.732072 -0.942323  0.191418 -0.962551 -0.027296  0.665155  2.775983 -0.627107 -0.015927  0.939107  0.239057  0.548166 -1.753082 -0.007525  1.771812
2013-01-04  1.067464 -0.331888  0.638843 -1.197937  0.925848  2.273798  0.646925 -2.910974  0.531653 -0.748255  0.262995  0.077923 -0.867982  1.174089  0.183573  0.263749
2013-01-05  0.873720 -0.816305  0.270330 -1.543169  0.116701 -1.392711  1.519368 -0.601046 -0.154348 -0.345653 -0.785385 -0.095604  1.351421  0.192520  0.802445  2.107376
2013-01-06 -0.781975  1.007111 -2.555165 -1.866207  1.480997  0.212057  1.053570 -0.798790 -0.785660 -0.853178 -2.274432  0.481971 -1.555876 -0.928069 -0.408319  0.270534

Then you can pull out AAPL using xs:

In [13]: res.xs("AAPL", level=1, axis=1)
Out[13]:
                   1         2         3         4
2013-01-01  0.144764 -1.104683  0.492292 -0.121656
2013-01-02 -0.163063  0.317464  0.045253 -0.579194
2013-01-03  0.213592 -0.962551 -0.627107  0.548166
2013-01-04  1.067464  0.925848  0.531653 -0.867982
2013-01-05  0.873720  0.116701 -0.154348  1.351421
2013-01-06 -0.781975  1.480997 -0.785660 -1.555876
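The whole flow — building the characteristic frames, concatenating them with keys, and cross-sectioning with xs — can be condensed into a self-contained sketch (random data, so the numbers will differ from the session above):

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
tickers = ['AAPL', 'MSFT', 'TSLA', 'GE']

# Four "characteristic" frames, one column per stock
frames = [pd.DataFrame(np.random.randn(6, 4), index=dates, columns=tickers)
          for _ in range(4)]

# Concatenate side by side; keys become the outer column level
res = pd.concat(frames, keys=[1, 2, 3, 4], axis=1)

# Cross-section on the inner (ticker) level pulls out one stock:
# one row per date, one column per characteristic
aapl = res.xs('AAPL', level=1, axis=1)
print(aapl.shape)  # (6, 4)
```

With 50 real frames you would replace `frames` with your list of dataframes and give `keys` the characteristic names instead of 1–4, so the resulting columns are labelled meaningfully.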

Perhaps a nicer thing is to get a dict of the groups:

In [21]: d = dict(iter(res.groupby(level=1, axis=1)))

In [22]: d["AAPL"]
Out[22]:
                   1         2         3         4
                AAPL      AAPL      AAPL      AAPL
2013-01-01  0.144764 -1.104683  0.492292 -0.121656
2013-01-02 -0.163063  0.317464  0.045253 -0.579194
2013-01-03  0.213592 -0.962551 -0.627107  0.548166
2013-01-04  1.067464  0.925848  0.531653 -0.867982
2013-01-05  0.873720  0.116701 -0.154348  1.351421
2013-01-06 -0.781975  1.480997 -0.785660 -1.555876
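Note that `groupby(..., axis=1)` is deprecated in recent pandas releases. An equivalent dict can be built with a comprehension over the inner column level — a sketch using xs, which (unlike the groupby version) drops the ticker level from the columns:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
tickers = ['AAPL', 'MSFT', 'TSLA', 'GE']
frames = [pd.DataFrame(np.random.randn(6, 4), index=dates, columns=tickers)
          for _ in range(4)]
res = pd.concat(frames, keys=[1, 2, 3, 4], axis=1)

# One DataFrame per ticker, keyed by ticker name;
# columns are just the outer keys 1..4, no second header row
d = {t: res.xs(t, level=1, axis=1)
     for t in res.columns.get_level_values(1).unique()}
```

This stays within the stable API and gives each per-stock frame a flat column index, which is usually what you want for further analysis.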

1 Comment

I didn't realize that when grouping by an index (or index level) it's not dropped in the groupby. Very interesting.
