Conversation

@mroeschke
Member

Additionally

  • Show the dtypes in the whatsnew for clarity
  • Note in the docs that read_csv also supports the global io.nullable_backend option
@mroeschke mroeschke added the Enhancement, IO Data (IO issues that don't fit into a more specific label), and Arrow (pyarrow functionality) labels Nov 22, 2022
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- A new global configuration, ``io.nullable_backend`` can now be used in conjunction with the parameter ``use_nullable_dtypes=True`` in :func:`read_parquet` and :func:`read_csv` (with ``engine="pyarrow"``)
+ A new global configuration, ``io.nullable_backend`` can now be used in conjunction with the parameter ``use_nullable_dtypes=True`` in :func:`read_parquet`, :func:`read_orc` and :func:`read_csv` (with ``engine="pyarrow"``)
Contributor

Off-topic, but it seems read_excel supports use_nullable_dtypes but not io.nullable_backend. We should fix this.

Member Author

Good point. I'll add this in a follow up PR.

.. note::

    Currently only ``io.nullable_backend`` set to ``"pyarrow"`` is supported.
Member

Do you intend to implement the flag for pandas as well?

Member Author

I would want to do this in a follow up PR (unless you're interested :) )

Member

No, that's fine. I just wanted to understand whether this is intended at all.

I want to tackle JSON and SQL next.

"float": np.arange(4.0, 7.0, dtype="float64"),
"float_with_nan": [2.0, np.nan, 3.0],
"bool": [True, False, None],
"datetime": pd.date_range("20130101", periods=3),
Member

Can you add bool without na?

Member Author

Sure, added.

],
}
)
bytes_data = df.to_orc()
Member

Just to avoid something subtle: can you do df.copy().to… since you are using df below?

Member Author

Good idea. Added the copy.

@mroeschke mroeschke added this to the 2.0 milestone Nov 23, 2022
@mroeschke mroeschke merged commit d8cfbd2 into pandas-dev:main Nov 23, 2022
@mroeschke mroeschke deleted the enh/pyarrow_types/orc branch November 23, 2022 00:00
mliu08 pushed a commit to mliu08/pandas that referenced this pull request Nov 27, 2022
…ad_orc (pandas-dev#49827)
* ENH: Add use_nullable_dtypes and nullable_backend to read_orc
* Skip if not required pa version
* Address review
3 participants