@derDieDasJojo

This is useful because you can't get the same result by just using
dropna().to_dict(). Also see
http://stackoverflow.com/questions/26033301/make-pandas-dataframe-to-a-dict-and-dropna/26033302#26033302

unknown added 2 commits September 25, 2014 10:40
@derDieDasJojo derDieDasJojo changed the title add the parameter dropna to the function to_dict. ENH:add the parameter dropna to the function to_dict. Sep 25, 2014
@derDieDasJojo derDieDasJojo changed the title ENH:add the parameter dropna to the function to_dict. ENH: add the parameter dropna to the function to_dict. Sep 25, 2014
@cpcloud
Member

cpcloud commented Sep 25, 2014

this needs a test

@jreback
Contributor

jreback commented Sep 25, 2014

Actually, I don't think this should be included. You can put the example in the cookbook if you would like.

@derDieDasJojo
Author

@cpcloud: all right, I can write some tests for it.
@jreback: do you have any arguments for your position? Why do you think it should not be included?
I don't know any other elegant way to do this, so I consider it a useful potential future part of pandas.

@jreback
Contributor

jreback commented Sep 25, 2014

  • this is easily done as the example from SO
  • not sure why this would actually be useful
  • in no other places is this done so makes the API inconsistent
@jreback
Contributor

jreback commented Sep 25, 2014

Maybe if you explain what you are doing with it.
Why are you using a dict at all?

@derDieDasJojo
Author

Why this could be useful:
I have a dataframe:

>>> data
   A    B
1  2  NaN
2  3   44
3  4  NaN

and need a dict of that data. Because a dict associates each id with a value (id: value), there is no need to list all the empty values as id: NaN. Instead, typically only the entries that have a value are listed.
So if you want a dict out of a DataFrame, it is quite possible that you want something like this:

{'A': {1: 2, 2: 3, 3: 4}, 'B': {2: 44.0}} 

But there is no single pandas method that returns something like this. So in this case you are on your own and have to write a dict comprehension that looks somewhat like this:

dict((k, v.dropna().to_dict()) for k, v in compat.iteritems(data)) 
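For readers on a current pandas version, the difference between the approaches discussed here can be sketched as follows. This is only an illustration under assumptions: it uses `DataFrame.items()` in place of `compat.iteritems`, and the variable names `with_nan`, `row_dropped` and `per_column` are made up for the example.

```python
import numpy as np
import pandas as pd

# The example frame from this comment: column B has missing values.
data = pd.DataFrame(
    {"A": [2, 3, 4], "B": [np.nan, 44, np.nan]},
    index=[1, 2, 3],
)

# Plain to_dict() keeps the missing values as NaN entries.
with_nan = data.to_dict()

# DataFrame.dropna() drops every row that contains a NaN,
# so column A loses rows 1 and 3 as well.
row_dropped = data.dropna().to_dict()

# Dropping NaN column by column keeps all of A and only
# the non-missing part of B.
per_column = {col: values.dropna().to_dict() for col, values in data.items()}

print(row_dropped)
print(per_column)
```

The row-wise `dropna()` removes data from column A that is not missing at all, which is why the per-column comprehension is needed to get the nested dict shown above.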

Of course it is possible to write your own dict comprehension every time you need it. But it is easier and more robust if the library already does this for you.
With this enhancement you could take advantage of the existing to_dict() function for DataFrames and simply write

data.to_dict(dropna=True) 

API consistency

If you look at DataFrame.to_excel or DataFrame.to_html, each of them has some parameters that are specific to its output type and make sense for that one but not for the others.
In the same way, this could be an optional parameter that makes sense for transforming data from a DataFrame to a dict.

Why use a dict
Actually, this is not a good argument against enhancing DataFrame.to_dict().
If pandas offers functionality to reshape data from a DataFrame to a dict, why not use a dict?
I am using a dict because another Python class has an interface that expects a dict.

Tests
I can write tests for it, that's no problem.
I will do this if the enhancement is considered useful and has a chance of being accepted; otherwise they would be useless anyway.

@jorisvandenbossche
Member

Another possibility would be to use the na_rep keyword for this (a keyword used by many of the other to_... functions). The default value would just be the Python nan, but you could also provide a string such as 'NaN', or provide False to not include missing values at all (the dropna from this PR).

I think, if we want to include the functionality proposed in this PR, dropna is clearer for this specific case, but something like na_rep could be of more general use. Of course, allowing False for na_rep here is something that is not available in the other writers, which also makes it a bit inconsistent.
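To make the na_rep idea concrete, here is a rough sketch of the semantics described above, written as a standalone function. The name `to_dict_with_na_rep` and its signature are hypothetical; nothing like it exists in pandas, and `DataFrame.to_dict` does not accept an `na_rep` argument.

```python
import numpy as np
import pandas as pd


def to_dict_with_na_rep(df, na_rep=np.nan):
    """Hypothetical helper (not part of pandas) illustrating na_rep semantics.

    na_rep=False drops missing values; any other value replaces them.
    """
    result = {}
    for col, values in df.items():
        if na_rep is False:
            # Behave like the dropna=True proposal: skip missing entries.
            result[col] = {k: v for k, v in values.items() if not pd.isna(v)}
        else:
            # Keep every key, substituting na_rep for missing entries.
            result[col] = {
                k: (na_rep if pd.isna(v) else v) for k, v in values.items()
            }
    return result


data = pd.DataFrame(
    {"A": [2, 3, 4], "B": [np.nan, 44, np.nan]},
    index=[1, 2, 3],
)

dropped = to_dict_with_na_rep(data, na_rep=False)
as_string = to_dict_with_na_rep(data, na_rep="NaN")
```

The check uses `na_rep is False` rather than a truthiness test so that a replacement value such as `0` or an empty string is still treated as a replacement and not as a request to drop entries.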

@jreback
Contributor

jreback commented Sep 30, 2014

ok will mark as an enhancement

needs tests

@jreback jreback added Enhancement Output-Formatting __repr__ of pandas objects, to_string labels Sep 30, 2014
@jreback jreback modified the milestones: 0.16, 0.15.1 Sep 30, 2014
@jreback jreback modified the milestones: Next Major Release, 0.16.0 Mar 5, 2015
@jreback
Contributor

jreback commented May 9, 2015

closing pls reopen if/when updated

@jreback jreback closed this May 9, 2015
@jorisvandenbossche jorisvandenbossche modified the milestones: No action, Next Major Release May 14, 2015
@jorisvandenbossche
Member

@derDieDasJojo if you update this, I think using na_rep would make it a bit more generally useful.
