BUG: One column 2d arrays not coerced to 1d with ArrayManager #44788

@ivirshup

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

```python
import os
os.environ["PANDAS_DATA_MANAGER"] = "array"

import pandas as pd, numpy as np

df = pd.DataFrame(index=np.arange(10))
df["foo"] = np.ones((10, 1))
# ValueError: Expected a 1D array, got an array with shape (10, 1)
```

<details>
<summary> Full traceback </summary>

```pytb
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3360             try:
-> 3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:

/usr/local/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

/usr/local/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'foo'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.9/site-packages/pandas/core/frame.py in _set_item_mgr(self, key, value)
   3750         try:
-> 3751             loc = self._info_axis.get_loc(key)
   3752         except KeyError:

/usr/local/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3362             except KeyError as err:
-> 3363                 raise KeyError(key) from err
   3364

KeyError: 'foo'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/var/folders/bd/43q20k0n6z15tdfzxvd22r7c0000gn/T/ipykernel_14506/151211532.py in <module>
      5
      6 df = pd.DataFrame(index=np.arange(10))
----> 7 df["foo"] = np.ones((10, 1))

/usr/local/lib/python3.9/site-packages/pandas/core/frame.py in __setitem__(self, key, value)
   3610         else:
   3611             # set column
-> 3612             self._set_item(key, value)
   3613
   3614     def _setitem_slice(self, key: slice, value):

/usr/local/lib/python3.9/site-packages/pandas/core/frame.py in _set_item(self, key, value)
   3795                     value = np.tile(value, (len(existing_piece.columns), 1)).T
   3796
-> 3797         self._set_item_mgr(key, value)
   3798
   3799     def _set_value(

/usr/local/lib/python3.9/site-packages/pandas/core/frame.py in _set_item_mgr(self, key, value)
   3752         except KeyError:
   3753             # This item wasn't present, just insert at end
-> 3754             self._mgr.insert(len(self._info_axis), key, value)
   3755         else:
   3756             self._iset_item_mgr(loc, value)

/usr/local/lib/python3.9/site-packages/pandas/core/internals/array_manager.py in insert(self, loc, item, value)
    872                 value = value[0, :]  # type: ignore[index]
    873             else:
--> 874                 raise ValueError(
    875                     f"Expected a 1D array, got an array with shape {value.shape}"
    876                 )

ValueError: Expected a 1D array, got an array with shape (10, 1)
```

</details>

Issue Description

Using the ArrayManager, assigning a 2d array with only one non-singleton dimension, such as shape (10, 1), to a DataFrame column raises a ValueError.

This does not happen with the BlockManager. I would expect the same assignment to work with both, since the two managers should be pretty much equivalent.
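Until the two managers agree, a possible workaround is to flatten the array before assigning it. This is only a sketch of a user-side workaround, not a fix for the underlying bug; the column names `foo` and `bar` are illustrative. Since the value passed in is already 1d, it should not depend on which data manager is active.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(index=np.arange(10))
column = np.ones((10, 1))  # 2d array with a single column

# Flatten the (10, 1) array to shape (10,) before assigning it.
df["foo"] = column.ravel()

# Alternative: select the only column explicitly.
df["bar"] = column[:, 0]
```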

Expected Behavior

The behavior with the block manager is expected. This works fine:

```python
import os
os.environ["PANDAS_DATA_MANAGER"] = "block"

import pandas as pd, numpy as np

df = pd.DataFrame(index=np.arange(10))
df["foo"] = np.ones((10, 1))
```
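To make the expected coercion concrete, here is a minimal sketch of the kind of check that would accept a one-column 2d array. `coerce_single_column` is a hypothetical helper written for illustration; it is not pandas internals and not necessarily how either manager implements this. The error message mirrors the one in the traceback above.

```python
import numpy as np


def coerce_single_column(values):
    """Hypothetical helper: return a 1d array for 1d or (n, 1) input."""
    arr = np.asarray(values)
    if arr.ndim == 1:
        # Already 1d, nothing to do.
        return arr
    if arr.ndim == 2 and arr.shape[1] == 1:
        # One-column 2d array: drop the singleton column axis.
        return arr[:, 0]
    raise ValueError(
        f"Expected a 1D array, got an array with shape {arr.shape}"
    )


flat = coerce_single_column(np.ones((10, 1)))
print(flat.shape)
```

Arrays with more than one column still raise, so the sketch only relaxes the check for the single-column case described in this issue.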

Installed Versions

INSTALLED VERSIONS
------------------
commit           : 945c9ed766a61c7d2c0a7cbb251b6edebf9cb7d5
python           : 3.9.9.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 20.6.0
Version          : Darwin Kernel Version 20.6.0: Tue Oct 12 18:33:42 PDT 2021; root:xnu-7195.141.8~1/RELEASE_X86_64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : None.UTF-8

pandas           : 1.3.4
numpy            : 1.20.3
pytz             : 2021.3
dateutil         : 2.8.2
pip              : 21.3.1
setuptools       : 59.0.1
Cython           : None
pytest           : 6.2.5
hypothesis       : None
sphinx           : 4.1.2
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 3.0.3
IPython          : 7.29.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : 2021.11.1
fastparquet      : None
gcsfs            : None
matplotlib       : 3.5.0
numexpr          : 2.7.3
odfpy            : None
openpyxl         : 3.0.9
pandas_gbq       : None
pyarrow          : None
pyxlsb           : None
s3fs             : None
scipy            : 1.7.3
sqlalchemy       : None
tables           : 3.6.1
tabulate         : None
xarray           : 0.20.1
xlrd             : 1.2.0
xlwt             : None
numba            : 0.54.1

Labels: Bug, Indexing (related to indexing on series/frames, not to indexes themselves)
