Conversation

@jreback
Contributor

@jreback jreback commented Nov 13, 2015

closes #11595

jreback added the Output-Formatting (__repr__ of pandas objects, to_string) and API Design labels Nov 13, 2015
jreback added this to the 0.17.1 milestone Nov 13, 2015
@jreback
Contributor Author

jreback commented Nov 13, 2015

@mrocklin
Contributor

Does this descend into categories and the index?

@jreback
Contributor Author

jreback commented Nov 13, 2015

Wondering if you were going to ask that... it DOES do the index, but not the categories. I can fix this...

@jreback
Contributor Author

jreback commented Nov 13, 2015

Now includes embedded usage for Index & Categorical

In [5]: df = DataFrame({'A' : ['foo']*1000})

In [6]: df['B'] = df['A'].astype('category')

In [8]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000 entries, 0 to 999
Data columns (total 2 columns):
A    1000 non-null object
B    1000 non-null category
dtypes: category(1), object(1)
memory usage: 16.6+ KB

In [9]: df.info(deep=True)
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000 entries, 0 to 999
Data columns (total 2 columns):
A    1000 non-null object
B    1000 non-null category
dtypes: category(1), object(1)
memory usage: 55.7 KB

In [11]: df.memory_usage()
Out[11]:
A    8000
B    1008
dtype: int64

In [12]: df.memory_usage(deep=True)
Out[12]:
A    48000
B     1048
dtype: int64

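For anyone wanting to reproduce the comparison outside an IPython session, here is a minimal sketch. It assumes a pandas version in which `DataFrame.memory_usage` accepts both `index` and `deep`. The helper name `shallow_and_deep` is hypothetical. Exact byte counts depend on platform, Python version and pandas version, so none are shown.

```python
import pandas as pd


def shallow_and_deep(n=1000):
    """Return (shallow, deep) per-column memory usage for a small frame.

    Hypothetical helper for illustration only; not part of pandas.
    """
    # dtype=object is pinned explicitly so the column holds Python strings
    # regardless of any newer default string dtype.
    df = pd.DataFrame({"A": pd.Series(["foo"] * n, dtype=object)})
    df["B"] = df["A"].astype("category")

    # Shallow: counts only the array buffers (pointers for object columns).
    shallow = df.memory_usage(index=False)
    # Deep: also interrogates the Python objects the pointers refer to.
    deep = df.memory_usage(index=False, deep=True)
    return shallow, deep


shallow, deep = shallow_and_deep()
print(shallow)
print(deep)
```

The object column should show the large jump, since the shallow figure counts only pointers. The categorical column changes little because only its handful of categories are Python objects.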
@jreback
Contributor Author

jreback commented Nov 13, 2015

And providing on Series as well

In [6]: df['A'].memory_usage()
Out[6]: 8000

In [7]: df['A'].memory_usage(index=True)
Out[7]: 16000

In [8]: df['A'].memory_usage(index=True,deep=True)
Out[8]: 56000

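The gap between the first two numbers above is the index's own footprint. A small sketch that isolates it, assuming `Series.memory_usage` accepts `index` and `deep` as in this PR; `index_overhead` is a hypothetical helper name. Passing `index` explicitly keeps the result independent of whatever the default is in a given pandas version.

```python
import pandas as pd


def index_overhead(series, deep=False):
    """Bytes attributed to the index in Series.memory_usage.

    Hypothetical helper for illustration only; not part of pandas.
    """
    with_index = series.memory_usage(index=True, deep=deep)
    without_index = series.memory_usage(index=False, deep=deep)
    return with_index - without_index


s = pd.Series(["foo"] * 1000, dtype=object)
print(index_overhead(s))             # index buffer only
print(index_overhead(s, deep=True))  # index, with deep introspection
```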
@mrocklin
Contributor

BTW, I'm glad that memory_usage_of_objects is usable on numpy arrays as well. I may end up using that outside of pandas.

@jreback
Contributor Author

jreback commented Nov 13, 2015

Right, dask.array could certainly introspect here as well.
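As a rough pure-Python analogue of the idea, and not the pandas helper itself, the sketch below sums `sys.getsizeof` over the elements of any iterable. That it resembles what `memory_usage_of_objects` does internally is an assumption; `sum_object_sizes` is a hypothetical name. It would also accept an object-dtype NumPy array, since it only iterates.

```python
import sys


def sum_object_sizes(values):
    """Sum the reported size of each element in an iterable.

    Hypothetical stand-in for illustration; repeated references to the
    same object are counted once per reference, not once per object.
    """
    return sum(sys.getsizeof(v) for v in values)


strings = ["foo"] * 1000
# Assumption: 8-byte pointers, as on a typical 64-bit build.
pointer_bytes = 8 * len(strings)
print(pointer_bytes, sum_object_sizes(strings))
```

The numbers earlier in this thread fit that picture: 48000 - 8000 = 40000 bytes of extra deep usage for column A, or 40 bytes for each of the 1000 entries on top of the 8-byte pointers.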

@max-sixty
Contributor

I'll wait until this is merged before adding __getsize__.
Is there a reason index is False by default? I'd have thought that would be 'part of the package'.

@jreback
Contributor Author

jreback commented Nov 13, 2015

@MaximilianR I don't recall the discussion, but I think we should change the default. Note that this is just for a direct call to memory_usage and not for .info where it is included.

Why don't you post an issue and we'll change it in 0.18 (as it's a small API change).

jreback added a commit that referenced this pull request Nov 13, 2015
PERF/DOC: Option to .info() and .memory_usage() to provide for deep introspection of memory consumption #11595
jreback merged commit ddd0372 into pandas-dev:master Nov 13, 2015