Skip to content

fix: limit chardet < 7.0.0 and pandas <3.0.0#1205

Open
JGSweets wants to merge 6 commits intocapitalone:mainfrom
JGSweets:patch-1
Open

fix: limit chardet < 7.0.0 and pandas <3.0.0#1205
JGSweets wants to merge 6 commits intocapitalone:mainfrom
JGSweets:patch-1

Conversation

@JGSweets
Copy link
Contributor

@JGSweets JGSweets commented Mar 4, 2026

this pr:

  • fixes a breaking change in chardet
@JGSweets JGSweets requested a review from a team as a code owner March 4, 2026 23:25
@JGSweets JGSweets changed the title Update data_utils.py fix: breaking change in chardet Mar 4, 2026
@JGSweets
Copy link
Contributor Author

JGSweets commented Mar 4, 2026

Seems more needs to be evaluated in chardet

@JGSweets
Copy link
Contributor Author

JGSweets commented Mar 4, 2026

looks like they default to windows-1252 more than utf-8 which is why the new error / change is occurring

@shania-m
Copy link
Contributor

@JGSweets is it possible to resolve the failed checks?

@JGSweets
Copy link
Contributor Author

@JGSweets is it possible to resolve the failed checks?

I could alter the tests, but I don't feel like that is the ideal solution.

looking at their newest version, it seems they changed behavior that results in files we originally would label as utf-8 as windows-1252. Not ideal.

In addition, it did not seem like there was an easy way to chunk results into the detector and get a full result rather than a single candidate.

@JGSweets
Copy link
Contributor Author

JGSweets commented Mar 13, 2026

In addition, i think there's a larger issue now at play given 7.0.0
I believe it should be limited to < 7.0.0.

chardet/chardet#327

@JGSweets JGSweets changed the title fix: breaking change in chardet fix: limit chardet < 7.0.0 Mar 13, 2026
@JGSweets
Copy link
Contributor Author

fixed tests similar to the previous pr which have approximate issues

@JGSweets JGSweets changed the title fix: limit chardet < 7.0.0 fix: limit chardet < 7.0.0 and pandas <3.0.0 Mar 13, 2026
@JGSweets
Copy link
Contributor Author

@shania-m should be good to go now!

@JGSweets
Copy link
Contributor Author

@shania-m @ryanSoley any updates here?

self.assertAlmostEqual(
expected_diff["statistics"].get("t-test").get("welch").pop("p-value"),
profile_diff["statistics"].get("t-test").get("welch").pop("p-value"),
places=10,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QQ- I see some of these assertions are at 2 earlier in the file and others are at 10. Do you mind explaining the difference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! The places indicate the precision of the almost equal, in this case 10 decimal places. So the intention here to be as close as possible despite floating point precision issues (possibly due to python versions).

I cannot say why the previous ones were only set to to two places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shania-m if you have any other questions, lmk!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

2 participants