0

Recently my code broke, like a week ago. Looks like something is going wrong with the openpyxl dependency. Hoping someone else has this issue and can tell me it's not just me being a bad programmer lol

Edit1:

The excel file I'm reading is generated as a .xlsx from Seeking Alpha's Portfolio Excel Export feature.

Edit2:

The Excel Export file now contains an added row with 2 empty conditionally formatted cells to a sheet that I don't even pass through to the sheet_name arg. The problem seems to be that openpyxl can't parse empty cells that have conditional formatting. How can I make read_excel only parse the sheets I listed? Or maybe drop the row that's causing problems?

After adding a "-" or removing conditional formatting, my script works. But I'd like to not have to do this every time I export the excel file. Also, the following warning appears now.

/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/bin/python3.9 /Users/marcoucolon/Documents/GitHub/opt-portfolio/analysis.py /Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/openpyxl/worksheet/_reader.py:308: UserWarning: Conditional Formatting extension is not supported and will be removed warn(msg) 

My line of code that causes error

dic = pd.read_excel(path, sheet_name=sheet_names) 

Error message

ValueError: Value must be one of {'equal', 'greaterThanOrEqual', 'containsText', 'beginsWith', 'notEqual', 'greaterThan', 'between', 'endsWith', 'notContains', 'lessThan', 'lessThanOrEqual', 'notBetween'} 

Complete Log

/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/bin/python3.9 /Users/marcoucolon/Documents/GitHub/opt-portfolio/analysis.py Traceback (most recent call last): File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/analysis.py", line 862, in <module> df = excel_data(path_excel, sheet_names) File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/analysis.py", line 129, in excel_data dic = pd.read_excel(path, sheet_name=sheet_names) File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/pandas/util/_decorators.py", line 299, in wrapper return func(*args, **kwargs) File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 344, in read_excel data = io.parse( File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 1170, in parse return self._reader.parse( File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 492, in parse data = self.get_sheet_data(sheet, convert_float) File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 548, in get_sheet_data for row_number, row in enumerate(sheet.rows): File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/openpyxl/worksheet/_read_only.py", line 79, in _cells_by_row for idx, row in parser.parse(): File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/openpyxl/worksheet/_reader.py", line 145, in parse dispatcher[tag_name](element) File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/openpyxl/worksheet/_reader.py", line 288, in parse_formatting cf = ConditionalFormatting.from_tree(element) File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree obj = desc.expected_type.from_tree(el) File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 103, in from_tree return cls(**attrib) File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/openpyxl/formatting/rule.py", line 201, in __init__ self.operator = operator File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/openpyxl/descriptors/base.py", line 143, in __set__ super(NoneSet, self).__set__(instance, value) File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/openpyxl/descriptors/base.py", line 128, in __set__ raise ValueError(self.__doc__) ValueError: Value must be one of {'equal', 'greaterThanOrEqual', 'containsText', 'beginsWith', 'notEqual', 'greaterThan', 'between', 'endsWith', 'notContains', 'lessThan', 'lessThanOrEqual', 'notBetween'} 
4
  • did you give a try to different engines? sometimes it's an easy way to put the right engine when reading excel. reference pandas.read_excel() (pandas.pydata.org/docs/reference/api/pandas.read_excel.html) Commented Apr 23, 2021 at 0:48
  • 1
    This issue could be related to the fact that some cells in your Excel file use conditional formatting: openpyxl.readthedocs.io/en/stable/api/… Commented Apr 23, 2021 at 16:41
  • @Laurent You're right, thanks for the comment. This was never an issue before, but the excel's creator added a row with 2 empty conditionally formatted cells to a sheet that I don't even pass through to the sheet_name arg. The problem is that openpyxl can't parse empty cells that have conditional formatting. How can I make read_excel only parse the sheets I listed? Or maybe drop the row that's causing problems? Commented Apr 24, 2021 at 8:30
  • Assuming that you are dealing with recent (.xlsx) Excel files, you could try and see what happens with an alternate engine to openpyxl, such as pypi.org/project/pylightxl and pypi.org/project/XlsxWriter Commented Apr 24, 2021 at 14:23

1 Answer 1

1

I have same problem with Seeking Alpha (SA) xlsx. pandas 1.1.5 does read ok but current pandas 1.3.2 has this problem. A possible hack workaround is to install older pandas in a different location and import that pandas version 1.1.5 as pd1 just to import the xlsx . So pd is pandas 1.3.2 and pd1 is pandas 1.1.5 The following has some hints Supporting multiple Python module versions (with the same version of Python)

And use the target option in pip import as explained in Supporting multiple Python module versions (with the same version of Python)

Edited to add sample code Pandas is a large subsystem and multiple versions are a bit demanding to install. A simpler solution was to create a different environment ( I used conda) and execute a command ( .bat in windows .sh in unix)

sample SA_oldpandas.bat

echo old pandas 1.1.5 conda activate env-oldpandas cd {code directory} echo ---GOING---- python SA_OldPandas.py echo ---FINISHED---- %date% %time% start excel SA_check.xlsx 

basically the old Pandas code writes the data to a excel and the new code uses this

os.getcwd() os.system('SA_oldpandas.bat') 
Sign up to request clarification or add additional context in comments.

2 Comments

While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes.
Thanks for the suggestion. It's great that you found this workaround.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.