Commit 20b1c9c

Merge branch 'main' of https://github.com/pandas-dev/pandas into object_reduction_axis_1_attempt_2
2 parents 993c4bb + 474b3db commit 20b1c9c

56 files changed: 748 additions and 321 deletions

.github/workflows/package-checks.yml

Lines changed: 2 additions & 7 deletions
@@ -9,12 +9,14 @@ on:
     branches:
       - main
       - 1.5.x
+    types: [ labeled, opened, synchronize, reopened ]
 
 permissions:
   contents: read
 
 jobs:
   pip:
+    if: ${{ github.event.label.name == 'Build' || contains(github.event.pull_request.labels.*.name, 'Build') || github.event_name == 'push'}}
     runs-on: ubuntu-latest
     strategy:
       matrix:
@@ -38,13 +40,6 @@ jobs:
         with:
           python-version: '3.8'
 
-      # Hacky patch to disable building cython extensions.
-      # This job should only check that the extras successfully install.
-      - name: Disable building ext_modules
-        run: |
-          sed -i '/ext_modules=/d' setup.py
-        shell: bash -el {0}
-
       - name: Install required dependencies
         run: |
           python -m pip install --upgrade pip setuptools wheel python-dateutil pytz numpy cython

asv_bench/benchmarks/groupby.py

Lines changed: 0 additions & 4 deletions
@@ -671,12 +671,8 @@ class String:
         ["str", "string[python]"],
         [
             "sum",
-            "prod",
             "min",
             "max",
-            "mean",
-            "median",
-            "var",
             "first",
             "last",
             "any",

asv_bench/benchmarks/series_methods.py

Lines changed: 42 additions & 0 deletions
@@ -79,6 +79,48 @@ def time_dropna(self, dtype):
         self.s.dropna()
 
 
+class Fillna:
+
+    params = [
+        [
+            "datetime64[ns]",
+            "float64",
+            "Int64",
+            "int64[pyarrow]",
+            "string",
+            "string[pyarrow]",
+        ],
+        [None, "pad", "backfill"],
+    ]
+    param_names = ["dtype", "method"]
+
+    def setup(self, dtype, method):
+        N = 10**6
+        if dtype == "datetime64[ns]":
+            data = date_range("2000-01-01", freq="S", periods=N)
+            na_value = NaT
+        elif dtype == "float64":
+            data = np.random.randn(N)
+            na_value = np.nan
+        elif dtype in ("Int64", "int64[pyarrow]"):
+            data = np.arange(N)
+            na_value = NA
+        elif dtype in ("string", "string[pyarrow]"):
+            data = tm.rands_array(5, N)
+            na_value = NA
+        else:
+            raise NotImplementedError
+        fill_value = data[0]
+        ser = Series(data, dtype=dtype)
+        ser[::2] = na_value
+        self.ser = ser
+        self.fill_value = fill_value
+
+    def time_fillna(self, dtype, method):
+        value = self.fill_value if method is None else None
+        self.ser.fillna(value=value, method=method)
+
+
 class SearchSorted:
 
     goal_time = 0.2
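
The `Fillna` benchmark added above times three cases: a scalar fill (`method=None`), a forward fill (`"pad"`) and a backward fill (`"backfill"`). A minimal sketch of those three operations on a tiny float64 Series follows. The sample values are illustrative and not from the commit, and the sketch assumes `Series.ffill()` and `Series.bfill()` as stand-ins for `method="pad"` and `method="backfill"`, since the `method` keyword of `fillna` is not available in every pandas release.

```python
import numpy as np
import pandas as pd

# Small illustrative Series; the benchmark itself uses N = 10**6 elements.
ser = pd.Series([1.0, 2.0, 3.0, 4.0])
ser[::2] = np.nan  # blank out every other element, as setup() does

filled_value = ser.fillna(1.0)  # scalar fill (the method=None case)
filled_pad = ser.ffill()        # forward fill, stands in for method="pad"
filled_backfill = ser.bfill()   # backward fill, stands in for method="backfill"

print(filled_value.tolist())
print(filled_pad.tolist())
print(filled_backfill.tolist())
```

Note that the forward fill leaves the first element missing because nothing precedes it.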

doc/source/_static/css/pandas.css

Lines changed: 4 additions & 0 deletions
@@ -5,6 +5,10 @@
   --pst-color-info: 23, 162, 184;
 }
 
+table {
+  width: auto; /* Override fit-content which breaks Styler user guide ipynb */
+}
+
 /* Main index page overview cards */
 
 .intro-card {

doc/source/development/contributing.rst

Lines changed: 1 addition & 0 deletions
@@ -114,6 +114,7 @@ version control to allow many people to work together on the project.
 
 Some great resources for learning Git:
 
+* the `Git documentation <https://git-scm.com/doc>`_.
 * the `GitHub help pages <https://help.github.com/>`_.
 * the `NumPy documentation <https://numpy.org/doc/stable/dev/index.html>`_.
 * Matthew Brett's `Pydagogue <https://matthew-brett.github.io/pydagogue/>`_.

doc/source/user_guide/io.rst

Lines changed: 13 additions & 7 deletions
@@ -3827,22 +3827,28 @@ format of an Excel worksheet created with the ``to_excel`` method. Excellent ex
 OpenDocument Spreadsheets
 -------------------------
 
-.. versionadded:: 0.25
-
-The :func:`~pandas.read_excel` method can also read OpenDocument spreadsheets
-using the ``odfpy`` module. The semantics and features for reading
+The io methods for `Excel files`_ also support reading and writing OpenDocument spreadsheets
+using the `odfpy <https://pypi.org/project/odfpy/>`__ module. The semantics and features for reading and writing
 OpenDocument spreadsheets match what can be done for `Excel files`_ using
 ``engine='odf'``.
 
+.. versionadded:: 0.25
+
+The :func:`~pandas.read_excel` method can read OpenDocument spreadsheets
+
 .. code-block:: python
 
    # Returns a DataFrame
    pd.read_excel("path_to_file.ods", engine="odf")
 
-.. note::
+.. versionadded:: 1.1.0
 
-   Currently pandas only supports *reading* OpenDocument spreadsheets. Writing
-   is not implemented.
+Similarly, the :func:`~pandas.to_excel` method can write OpenDocument spreadsheets
+
+.. code-block:: python
+
+   # Writes DataFrame to a .ods file
+   df.to_excel("path_to_file.ods", engine="odf")
 
 .. _io.xlsb:
 
doc/source/whatsnew/index.rst

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@ Version 1.5
 .. toctree::
    :maxdepth: 2
 
+   v1.5.3
    v1.5.2
    v1.5.1
    v1.5.0

doc/source/whatsnew/v1.5.2.rst

Lines changed: 1 addition & 2 deletions
@@ -1,6 +1,6 @@
 .. _whatsnew_152:
 
-What's new in 1.5.2 (November ??, 2022)
+What's new in 1.5.2 (November 21, 2022)
 ---------------------------------------
 
 These are the changes in pandas 1.5.2. See :ref:`release` for a full changelog
@@ -36,7 +36,6 @@ Bug fixes
 Other
 ~~~~~
 - Reverted ``color`` as an alias for ``c`` and ``size`` as an alias for ``s`` in function :meth:`DataFrame.plot.scatter` (:issue:`49732`)
--
 
 .. ---------------------------------------------------------------------------
 .. _whatsnew_152.contributors:

doc/source/whatsnew/v1.5.3.rst

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+.. _whatsnew_153:
+
+What's new in 1.5.3 (December ??, 2022)
+---------------------------------------
+
+These are the changes in pandas 1.5.3. See :ref:`release` for a full changelog
+including other versions of pandas.
+
+{{ header }}
+
+.. ---------------------------------------------------------------------------
+.. _whatsnew_153.regressions:
+
+Fixed regressions
+~~~~~~~~~~~~~~~~~
+- Fixed performance regression in :meth:`Series.isin` when ``values`` is empty (:issue:`49839`)
+-
+
+.. ---------------------------------------------------------------------------
+.. _whatsnew_153.bug_fixes:
+
+Bug fixes
+~~~~~~~~~
+-
+-
+
+.. ---------------------------------------------------------------------------
+.. _whatsnew_153.other:
+
+Other
+~~~~~
+-
+-
+
+.. ---------------------------------------------------------------------------
+.. _whatsnew_153.contributors:
+
+Contributors
+~~~~~~~~~~~~
+
+.. contributors:: v1.5.2..v1.5.3|HEAD
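
The regression entry in this new file concerns calling `Series.isin` with an empty `values` collection. As a minimal sketch of that call (the sample data is illustrative and not taken from the commit), an empty `values` cannot match anything, so the resulting mask is all False:

```python
import pandas as pd

# Illustrative data; not taken from the commit.
ser = pd.Series([1, 2, 3])

# With an empty ``values`` collection nothing can match,
# so every element of the boolean mask is False.
mask = ser.isin([])
print(mask.tolist())
```

The sketch only shows the behaviour of the call; it does not measure the performance that the regression fix addresses.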

doc/source/whatsnew/v2.0.0.rst

Lines changed: 11 additions & 2 deletions
@@ -33,7 +33,7 @@ sql-other, html, xml, plot, output_formatting, clipboard, compression, test]`` (
 Configuration option, ``io.nullable_backend``, to return pyarrow-backed dtypes from IO functions
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-A new global configuration, ``io.nullable_backend`` can now be used in conjunction with the parameter ``use_nullable_dtypes=True`` in :func:`read_parquet` and :func:`read_csv` (with ``engine="pyarrow"``)
+A new global configuration, ``io.nullable_backend`` can now be used in conjunction with the parameter ``use_nullable_dtypes=True`` in :func:`read_parquet`, :func:`read_orc` and :func:`read_csv` (with ``engine="pyarrow"``)
 to return pyarrow-backed dtypes when set to ``"pyarrow"`` (:issue:`48957`).
 
 .. ipython:: python
@@ -45,7 +45,7 @@ to return pyarrow-backed dtypes when set to ``"pyarrow"`` (:issue:`48957`).
     """)
     with pd.option_context("io.nullable_backend", "pyarrow"):
         df = pd.read_csv(data, use_nullable_dtypes=True, engine="pyarrow")
-    df
+    df.dtypes
 
 .. _whatsnew_200.enhancements.other:
 
@@ -62,6 +62,7 @@ Other enhancements
 - Fix ``test`` optional_extra by adding missing test package ``pytest-asyncio`` (:issue:`48361`)
 - :func:`DataFrame.astype` exception message thrown improved to include column name when type conversion is not possible. (:issue:`47571`)
 - :func:`date_range` now supports a ``unit`` keyword ("s", "ms", "us", or "ns") to specify the desired resolution of the output index (:issue:`49106`)
+- :func:`timedelta_range` now supports a ``unit`` keyword ("s", "ms", "us", or "ns") to specify the desired resolution of the output index (:issue:`49824`)
 - :meth:`DataFrame.to_json` now supports a ``mode`` keyword with supported inputs 'w' and 'a'. Defaulting to 'w', 'a' can be used when lines=True and orient='records' to append record oriented json lines to an existing json file. (:issue:`35849`)
 - Added ``name`` parameter to :meth:`IntervalIndex.from_breaks`, :meth:`IntervalIndex.from_arrays` and :meth:`IntervalIndex.from_tuples` (:issue:`48911`)
 -
@@ -336,6 +337,7 @@ Other API changes
 - Passing ``dtype`` of "timedelta64[s]", "timedelta64[ms]", or "timedelta64[us]" to :class:`TimedeltaIndex`, :class:`Series`, or :class:`DataFrame` constructors will now retain that dtype instead of casting to "timedelta64[ns]"; passing a dtype with lower resolution for :class:`Series` or :class:`DataFrame` will be cast to the lowest supported resolution "timedelta64[s]" (:issue:`49014`)
 - Passing a ``np.datetime64`` object with non-nanosecond resolution to :class:`Timestamp` will retain the input resolution if it is "s", "ms", or "ns"; otherwise it will be cast to the closest supported resolution (:issue:`49008`)
 - The ``other`` argument in :meth:`DataFrame.mask` and :meth:`Series.mask` now defaults to ``no_default`` instead of ``np.nan`` consistent with :meth:`DataFrame.where` and :meth:`Series.where`. Entries will be filled with the corresponding NULL value (``np.nan`` for numpy dtypes, ``pd.NA`` for extension dtypes). (:issue:`49111`)
+- Changed behavior of :meth:`Series.quantile` and :meth:`DataFrame.quantile` with :class:`SparseDtype` to retain sparse dtype (:issue:`49583`)
 - When creating a :class:`Series` with a object-dtype :class:`Index` of datetime objects, pandas no longer silently converts the index to a :class:`DatetimeIndex` (:issue:`39307`, :issue:`23598`)
 - :meth:`Series.unique` with dtype "timedelta64[ns]" or "datetime64[ns]" now returns :class:`TimedeltaArray` or :class:`DatetimeArray` instead of ``numpy.ndarray`` (:issue:`49176`)
 - :func:`to_datetime` and :class:`DatetimeIndex` now allow sequences containing both ``datetime`` objects and numeric entries, matching :class:`Series` behavior (:issue:`49037`)
@@ -346,6 +348,8 @@ Other API changes
 - Changed behavior of :class:`Index` constructor with sequence containing at least one ``NaT`` and everything else either ``None`` or ``NaN`` to infer ``datetime64[ns]`` dtype instead of ``object``, matching :class:`Series` behavior (:issue:`49340`)
 - :func:`read_stata` with parameter ``index_col`` set to ``None`` (the default) will now set the index on the returned :class:`DataFrame` to a :class:`RangeIndex` instead of a :class:`Int64Index` (:issue:`49745`)
 - Changed behavior of :class:`Index` constructor with an object-dtype ``numpy.ndarray`` containing all-``bool`` values or all-complex values, this will now retain object dtype, consistent with the :class:`Series` behavior (:issue:`49594`)
+- Changed behavior of :meth:`DataFrame.shift` with ``axis=1``, an integer ``fill_value``, and homogeneous datetime-like dtype, this now fills new columns with integer dtypes instead of casting to datetimelike (:issue:`49842`)
+- :meth:`DataFrame.values`, :meth:`DataFrame.to_numpy`, :meth:`DataFrame.xs`, :meth:`DataFrame.reindex`, :meth:`DataFrame.fillna`, and :meth:`DataFrame.replace` no longer silently consolidate the underlying arrays; do ``df = df.copy()`` to ensure consolidation (:issue:`49356`)
 -
 
 .. ---------------------------------------------------------------------------
@@ -584,6 +588,7 @@ Performance improvements
 - Performance improvement in :meth:`.DataFrameGroupBy.mean`, :meth:`.SeriesGroupBy.mean`, :meth:`.DataFrameGroupBy.var`, and :meth:`.SeriesGroupBy.var` for extension array dtypes (:issue:`37493`)
 - Performance improvement in :meth:`MultiIndex.isin` when ``level=None`` (:issue:`48622`, :issue:`49577`)
 - Performance improvement in :meth:`Index.union` and :meth:`MultiIndex.union` when index contains duplicates (:issue:`48900`)
+- Performance improvement in :meth:`Series.fillna` for pyarrow-backed dtypes (:issue:`49722`)
 - Performance improvement for :meth:`Series.value_counts` with nullable dtype (:issue:`48338`)
 - Performance improvement for :class:`Series` constructor passing integer numpy array with nullable dtype (:issue:`48338`)
 - Performance improvement for :class:`DatetimeIndex` constructor passing a list (:issue:`48609`)
@@ -597,12 +602,14 @@ Performance improvements
 - Performance improvement in :meth:`DataFrame.join` when joining on a subset of a :class:`MultiIndex` (:issue:`48611`)
 - Performance improvement for :meth:`MultiIndex.intersection` (:issue:`48604`)
 - Performance improvement in ``var`` for nullable dtypes (:issue:`48379`).
+- Performance improvement when iterating over a :class:`~arrays.ArrowExtensionArray` (:issue:`49825`).
 - Performance improvements to :func:`read_sas` (:issue:`47403`, :issue:`47405`, :issue:`47656`, :issue:`48502`)
 - Memory improvement in :meth:`RangeIndex.sort_values` (:issue:`48801`)
 - Performance improvement in :class:`DataFrameGroupBy` and :class:`SeriesGroupBy` when ``by`` is a categorical type and ``sort=False`` (:issue:`48976`)
 - Performance improvement in :class:`DataFrameGroupBy` and :class:`SeriesGroupBy` when ``by`` is a categorical type and ``observed=False`` (:issue:`49596`)
 - Performance improvement in :func:`read_stata` with parameter ``index_col`` set to ``None`` (the default). Now the index will be a :class:`RangeIndex` instead of :class:`Int64Index` (:issue:`49745`)
 - Performance improvement in :func:`merge` when not merging on the index - the new index will now be :class:`RangeIndex` instead of :class:`Int64Index` (:issue:`49478`)
+- Performance improvement in :meth:`DataFrame.to_dict` and :meth:`Series.to_dict` when using any non-object dtypes (:issue:`46470`)
 
 .. ---------------------------------------------------------------------------
 .. _whatsnew_200.bug_fixes:
@@ -652,6 +659,8 @@ Conversion
 - Bug where any :class:`ExtensionDtype` subclass with ``kind="M"`` would be interpreted as a timezone type (:issue:`34986`)
 - Bug in :class:`.arrays.ArrowExtensionArray` that would raise ``NotImplementedError`` when passed a sequence of strings or binary (:issue:`49172`)
 - Bug in :func:`to_datetime` was not respecting ``exact`` argument when ``format`` was an ISO8601 format (:issue:`12649`)
+- Bug in :meth:`TimedeltaArray.astype` raising ``TypeError`` when converting to a pyarrow duration type (:issue:`49795`)
+-
 
 Strings
 ^^^^^^^
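
The v2.0.0 notes in this file state that `date_range` and `timedelta_range` accept a `unit` keyword that sets the resolution of the resulting index. A minimal sketch of both calls follows, assuming pandas 2.0 or later is installed. The start values, `periods` and `freq` are illustrative choices and not taken from the commit.

```python
import pandas as pd

# Request second resolution instead of the nanosecond resolution
# that these functions produced before the ``unit`` keyword existed.
tdi = pd.timedelta_range("1 day", periods=3, freq="D", unit="s")
dti = pd.date_range("2000-01-01", periods=3, freq="D", unit="s")

print(tdi.dtype)
print(dti.dtype)
```

Per the notes, the accepted values for `unit` are "s", "ms", "us" and "ns".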
