Conversation

@matttan90
Contributor

@matttan90 matttan90 commented Nov 2, 2019


What

Whilst the original issue is from the factory method DataFrame.from_dict(d, orient='index'), the main issue is the order of elements within the list in the main constructor:

For example:

pd.DataFrame([[1, 2, 3], 4])  # doesn't work
# TypeError: object of type 'int' has no len()

In [2]: pd.DataFrame([4, [1, 2, 3]])  # works, creates a 1D DataFrame
Out[2]:
           0
0          4
1  [1, 2, 3]

Current Constructor Logic on List Argument

  • The current logic looks only at the first element and, if it is an iterable, assumes that all other elements are iterables as well.
    • If all elements in the list are iterables, it generates a 2D DataFrame.
    • If any element from index 1 onwards is a non-iterable, it has no len() method and construction fails with a TypeError.
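
To make the failure mode concrete, here is a toy model of the logic described above. It is a sketch only: `to_rows_current` is a hypothetical name, it works on plain lists, and it is not the pandas implementation.

```python
def to_rows_current(data):
    """Toy model of the described behaviour; not pandas code."""
    if len(data) > 0 and isinstance(data[0], (list, tuple)):
        # The first element looks like a row, so every element is
        # assumed to be a row. len() raises TypeError on a scalar.
        width = max(len(row) for row in data)
        # Pad short rows with None so that all rows have equal width.
        return [list(row) + [None] * (width - len(row)) for row in data]
    # The first element is a scalar: one column, elements stored as-is.
    return [[item] for item in data]
```

With this model, `to_rows_current([4, [1, 2, 3]])` yields a single column, while `to_rows_current([[1, 2, 3], 4])` raises a `TypeError` from `len(4)`, so the outcome depends only on element order.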

Proposed Solution

  • Based on the first element, try to infer that all elements are iterables as well.
    • If not all subsequent elements are iterables, return a 1D DataFrame. This is the same behaviour as when the first element in the list is a non-iterable.
    • This should not degrade performance, as there is no need to check up front whether all elements in the list are iterables.
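
The proposed fallback could be sketched roughly as follows. This is a toy model on plain lists, with the hypothetical name `to_rows_proposed`; it only illustrates the try-2D-then-fall-back-to-1D idea and is not the code in this PR.

```python
def to_rows_proposed(data):
    """Toy model of the proposed behaviour; not pandas code."""
    if len(data) > 0 and isinstance(data[0], (list, tuple)):
        try:
            # Optimistically treat every element as a row.
            width = max(len(row) for row in data)
            return [list(row) + [None] * (width - len(row)) for row in data]
        except TypeError:
            # A later element has no len(): fall through to the 1D case.
            pass
    # 1D case: one column, each element stored as-is.
    return [[item] for item in data]
```

Under this sketch, `to_rows_proposed([[1, 2, 3], 4])` returns one column holding `[1, 2, 3]` and `4`, which matches what `[4, [1, 2, 3]]` already produces apart from row order. No up-front scan of the elements is needed, at the cost of a broad `except TypeError`.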

Note

  • This does not solve the issue whereby we have iterables of different types (such as lists and strings...)
In [2]: pd.DataFrame([[1, 2, 3], 'foobar'])
Out[2]:
   0  1  2     3     4     5
0  1  2  3  None  None  None
1  f  o  o     b     a     r
@matttan90 matttan90 changed the title 29213: Dataframe Constructor from list of list and non-iterables 29213: Dataframe Constructor from List of List and non-iterables Nov 2, 2019
@simonjayhawkins simonjayhawkins added Constructors Series/DataFrame/Index/pd.array Constructors DataFrame DataFrame data structure labels Nov 2, 2019
@alimcmaster1 alimcmaster1 added DataFrame DataFrame data structure and removed DataFrame DataFrame data structure labels Nov 2, 2019
arrays, columns, index, columns, dtype=dtype
)

except TypeError:
Contributor

Where is the TypeError actually being raised? We don't want to have very wide try/excepts; they tend to hide errors.

Contributor Author

It's actually raised here. At that point, it has already been inferred that the data is a list of lists (which is not always true).

Debugging trace:

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../core/frame.py:465: in __init__
    arrays, columns = to_arrays(data, columns, dtype=dtype)
../../core/internals/construction.py:452: in to_arrays
    return _list_to_arrays(data, columns, coerce_float=coerce_float, dtype=dtype)
../../core/internals/construction.py:484: in _list_to_arrays
    content = list(lib.to_object_array(data).T)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>   ???
E   TypeError: object of type 'int' has no len()
Contributor Author

We can also try to catch that specific exception with something like this?

except TypeError as e:
    if not str(e).endswith("has no len()"):
        raise
    else:
        ....
Contributor

I don't want a try/except here at all; this just hides errors. You can do something more narrow in _list_to_arrays.
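
As an illustration of what a narrower, explicit check might look like (a sketch only: `is_row_like` and `all_rows_like` are hypothetical helpers, not pandas internals, and nothing here is taken from `_list_to_arrays`):

```python
from collections.abc import Iterable


def is_row_like(obj):
    """True for iterables that could be a row; strings and bytes are excluded."""
    return isinstance(obj, Iterable) and not isinstance(obj, (str, bytes))


def all_rows_like(data):
    """True only if every element of data could serve as a row."""
    return all(is_row_like(item) for item in data)
```

A check like this makes the 1D-versus-2D decision explicit instead of relying on an exception, but it scans every element, which is the cost the PR description wanted to avoid. Excluding strings is an assumption of this sketch and would also change the `'foobar'` case mentioned in the note.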

Contributor Author

Okay thanks for the feedback! I'll have a look

@jreback
Contributor

jreback commented Dec 1, 2019

can you merge master and update

@WillAyd
Member

WillAyd commented Dec 17, 2019

@matttan90 can you address comments?

@jreback
Contributor

jreback commented Dec 27, 2019

can you merge master and will look again

@matttan90
Contributor Author

Sorry. My bad for disappearing. Will think again on how I can implement a fix on this..

@WillAyd
Member

WillAyd commented Feb 2, 2020

Closing as I think stale, but @matttan90 ping if you want to pick back up and can merge master / address comments

@WillAyd WillAyd closed this Feb 2, 2020
@matttan90 matttan90 deleted the dataframe_from_dict_mixed_iterables branch February 3, 2020 10:12

Labels

Constructors Series/DataFrame/Index/pd.array Constructors DataFrame DataFrame data structure

5 participants