Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Fixed FutureWarning emitting logic, reverted openpyxl workaround
  • Loading branch information
rhshadrach committed Nov 25, 2020
commit 32333817a8c4ff0ac644bd6f65f3b6f69a46396d
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.2.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -484,7 +484,7 @@ Deprecations
- Deprecated :meth:`Index.asi8` for :class:`Index` subclasses other than :class:`DatetimeIndex`, :class:`TimedeltaIndex`, and :class:`PeriodIndex` (:issue:`37877`)
- The ``inplace`` parameter of :meth:`Categorical.remove_unused_categories` is deprecated and will be removed in a future version (:issue:`37643`)
- The ``null_counts`` parameter of :meth:`DataFrame.info` is deprecated and replaced by ``show_counts``. It will be removed in a future version (:issue:`37999`)
- :func:`read_excel` "xlrd" engine is deprecated. The recommended engine is "openpyxl" for "xlsx" and "xlsm" files, because "xlrd" is no longer maintained (:issue:`28547`).
- Deprecated the default argument ``engine=None`` of the function :func:`read_excel`, which uses the no longer maintained xlrd engine. Not specifying the engine will raise a ``FutureWarning``. This argument will default to ``"openpyxl"`` in a future version, which is now the recommended engine for xlsx and xlsm files (:issue:`28547`)

.. ---------------------------------------------------------------------------

Expand Down
35 changes: 20 additions & 15 deletions pandas/io/excel/_base.py
Original file line number Diff line number Diff line change
Expand Up @@ -102,10 +102,15 @@
If io is not a buffer or path, this must be set to identify io.
Supported engines: "xlrd", "openpyxl", "odf", "pyxlsb", default "xlrd".
Engine compatibility :
- "xlrd" supports most old/new Excel file formats.
- "xlrd" supports most old/new Excel file formats but is no longer maintained.
- "openpyxl" supports newer Excel file formats.
- "odf" supports OpenDocument file formats (.odf, .ods, .odt).
- "pyxlsb" supports Binary Excel files.

.. deprecated:: 1.2.0
The default value ``None`` is deprecated and will be changed to ``"openpyxl"``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default of None is only deprecated when you're reading an xls/xlsx file, not when reading any of the other formats like ods.
It might be worth explaining here (outside of the deprecated box) what the default engine is if not specified (eg also that openpyxl is automatically used if xlrd is not present for xlsx files)

in a future version. Not specifying an engine will raise a FutureWarning.

converters : dict, default None
Dict of functions for converting values in certain columns. Keys can
either be integers or column labels, values are functions that take one
Expand Down Expand Up @@ -881,10 +886,15 @@ class ExcelFile:
Supported engines: ``xlrd``, ``openpyxl``, ``odf``, ``pyxlsb``,
default ``xlrd`` for .xls* files, ``odf`` for .ods files.
Engine compatibility :
- ``xlrd`` supports most old/new Excel file formats.
- ``xlrd`` supports most old/new Excel file formats but is no longer maintained.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think can you add a note that xlrd not support for python >= 3.9 (everywhere you mention)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry - I don't follow how this is different from python < 3.9, what do you mean by support? I am able to install xlrd on a python 3.9 environment:

image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see the issues when you actually use xlrd and pandas on 3.9. this is the entire reason for the deprecation; it is completely broken on 3.9

Copy link
Member

@rhshadrach rhshadrach Nov 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not seeing that:

Python 3.9.0 (default, Nov 15 2020, 14:28:56) [GCC 7.3.0] :: Anaconda, Inc. on linux Type "help", "copyright", "credits" or "license" for more information. >>> import pandas as pd >>> pd.__version__ '1.2.0.dev0+1364.g65319af6e' >>> pd.DataFrame({'a': [1, 2, 3]}).to_excel('test.xlsx') >>> pd.read_excel('test.xlsx', engine='xlrd') Unnamed: 0 a 0 0 1 1 1 2 2 2 3 

From the issue, I see this in the OP

xlrd is unmaintained and the previous maintainer has asked us to move towards openpyxl. xlrd works now, but might have some issues when Python 3.9 or later gets released and changes some elements of the XML parser, as default usage right now throws a PendingDeprecationWarning

but no mention of xlrd being broken on Python 3.9.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- ``openpyxl`` supports newer Excel file formats.
- ``odf`` supports OpenDocument file formats (.odf, .ods, .odt).
- ``pyxlsb`` supports Binary Excel files.

.. deprecated:: 1.2.0
The default value ``None`` is deprecated and will be changed to
``"openpyxl"`` in a future version. Not specifying an engine will
raise a FutureWarning.
"""

from pandas.io.excel._odfreader import ODFReader
Expand All @@ -902,26 +912,21 @@ class ExcelFile:
def __init__(
self, path_or_buffer, engine=None, storage_options: StorageOptions = None
):
ext = None
if not isinstance(path_or_buffer, (BufferedIOBase, RawIOBase)):
ext = os.path.splitext(str(path_or_buffer))[-1][1:]

if engine is None:
warnings.warn(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we be more specific about when to warn?

  • If people only have openpyxl installed, shouldn't we default to use that instead of having the warning?
  • From earlier discussion, I understood that we are (for now) fine to keep xlrd support for old excel files for which it is the only. So for those cases (eg .xls), we shouldn't raise a warning?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and python < 3.9

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche I've implemented the xls check, however I don't understand your first bullet. Isn't having the deprecation warning so that we don't change behavior without notice? Why is it okay to change behavior if they have openpyxl installed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it okay to change behavior if they have openpyxl installed?

My suggestions was for when they have only openpyxl installed. It will currently not be a very common situation, since we don't try to use openpyxl if xlrd is not present, unless the user actually specifies engine="openpyxl". But, given the deprecation warnings, this could be another solution for users: ensure you update your environment to only have openpyxl and not xlrd, and then you don't need to update all occurrences of read_excel in your code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, thanks! I completely misread.

"The default argument engine=None is deprecated. "
"Specify the engine argument to suppress this warning.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to clarify this message, to explain the reason why it is deprecated, and what people can do: either specify "xlrd", or either install openpyxl and specify that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing to the following. Let me know of any thoughts.

DeprecationWarning: The default argument engine=None is deprecated. Using None defaults to the xlrd engine which is no longer maintained. The default value will be 'openpyxl' in a future version of pandas, although xlrd will continue to be allowed for the indefinite future. Either install openpyxl and specify it as the engine, or specify 'xlrd' to suppress this warning.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the current behaviour, that sounds good!

FutureWarning,
stacklevel=4,
)
engine = "xlrd"
if isinstance(path_or_buffer, (BufferedIOBase, RawIOBase)):
if _is_ods_stream(path_or_buffer):
engine = "odf"
else:
if ext == "ods":
ext = os.path.splitext(str(path_or_buffer))[-1]
if ext == ".ods":
engine = "odf"

elif engine == "xlrd" and ext in ("xlsx", "xlsm"):
warnings.warn(
'The Excel reader engine "xlrd" is deprecated, use "openpyxl" instead. '
'Specify engine="openpyxl" to suppress this warning.',
FutureWarning,
stacklevel=2,
)
if engine not in self._engines:
raise ValueError(f"Unknown engine: {engine}")

Expand Down
7 changes: 1 addition & 6 deletions pandas/io/excel/_openpyxl.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,3 @@
from datetime import datetime
from typing import TYPE_CHECKING, Dict, List, Optional

import numpy as np
Expand Down Expand Up @@ -503,11 +502,7 @@ def _convert_cell(self, cell, convert_float: bool) -> Scalar:
from openpyxl.cell.cell import TYPE_BOOL, TYPE_ERROR, TYPE_NUMERIC

if cell.is_date:
try:
# workaround for inaccurate timestamp notation in excel
return datetime.fromtimestamp(round(cell.value.timestamp()))
except (AttributeError, OSError):
return cell.value
return cell.value
elif cell.data_type == TYPE_ERROR:
return np.nan
elif cell.data_type == TYPE_BOOL:
Expand Down
9 changes: 1 addition & 8 deletions pandas/tests/io/excel/test_readers.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,10 +21,6 @@
"xlrd",
marks=[
td.skip_if_no("xlrd"),
pytest.mark.filterwarnings("ignore:.*(tree\\.iter|html argument)"),
pytest.mark.filterwarnings(
'ignore:The Excel reader engine "xlrd" is deprecated,'
),
],
),
pytest.param(
Expand Down Expand Up @@ -984,10 +980,7 @@ def test_read_excel_squeeze(self, read_ext):
expected = Series([1, 2, 3], name="a")
tm.assert_series_equal(actual, expected)

def test_deprecated_kwargs(self, engine, read_ext):
if engine == "xlrd":
pytest.skip("Use of xlrd engine produces a FutureWarning as well")

def test_deprecated_kwargs(self, read_ext):
with tm.assert_produces_warning(FutureWarning, raise_on_extra_warnings=False):
pd.read_excel("test1" + read_ext, "Sheet1", 0)

Expand Down
3 changes: 0 additions & 3 deletions pandas/tests/io/excel/test_writers.py
Original file line number Diff line number Diff line change
Expand Up @@ -1197,9 +1197,6 @@ def test_datetimes(self, path):

tm.assert_series_equal(write_frame["A"], read_frame["A"])

@pytest.mark.filterwarnings(
'ignore:The Excel reader engine "xlrd" is deprecated:FutureWarning'
)
def test_bytes_io(self, engine):
# see gh-7074
bio = BytesIO()
Expand Down
24 changes: 10 additions & 14 deletions pandas/tests/io/excel/test_xlrd.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,9 +17,6 @@ def skip_ods_and_xlsb_files(read_ext):
pytest.skip("Not valid for xlrd")


@pytest.mark.filterwarnings(
'ignore:The Excel reader engine "xlrd" is deprecated:FutureWarning'
)
def test_read_xlrd_book(read_ext, frame):
df = frame

Expand All @@ -39,9 +36,6 @@ def test_read_xlrd_book(read_ext, frame):


# TODO: test for openpyxl as well
@pytest.mark.filterwarnings(
'ignore:The Excel reader engine "xlrd" is deprecated:FutureWarning'
)
def test_excel_table_sheet_by_index(datapath, read_ext):
path = datapath("io", "data", "excel", f"test1{read_ext}")
with ExcelFile(path, engine="xlrd") as excel:
Expand All @@ -52,18 +46,20 @@ def test_excel_table_sheet_by_index(datapath, read_ext):
def test_excel_file_warning_with_xlsx_file(datapath):
# GH 29375
path = datapath("io", "data", "excel", "test1.xlsx")
# DeprecationWarning: "This method will be removed in future versions.
# Use 'tree.iter()' or 'list(tree.iter())' instead."
with tm.assert_produces_warning(
FutureWarning, check_stacklevel=True, raise_on_extra_warnings=False
) as w:
pd.ExcelFile(path, engine="xlrd")
assert '"xlrd" is deprecated, use "openpyxl" instead.' in str(w[0].message)
FutureWarning, check_stacklevel=False, raise_on_extra_warnings=False
):
ExcelFile(path, engine=None)


def test_read_excel_warning_with_xlsx_file(tmpdir, datapath):
# GH 29375
path = datapath("io", "data", "excel", "test1.xlsx")
# DeprecationWarning: "This method will be removed in future versions.
# Use 'tree.iter()' or 'list(tree.iter())' instead."
with tm.assert_produces_warning(
FutureWarning, check_stacklevel=False, raise_on_extra_warnings=False
) as w:
pd.read_excel(path, "Sheet1", engine="xlrd")
assert '"xlrd" is deprecated, use "openpyxl" instead.' in str(w[0].message)
FutureWarning, check_stacklevel=True, raise_on_extra_warnings=False
):
pd.read_excel(path, "Sheet1", engine=None)