29

How can I assert that the following two DataFrames, df1 and df2, are equal?

import pandas as pd
df1 = pd.DataFrame([1, 2, 3])
df2 = pd.DataFrame([1.0, 2, 3])

The output of df1.equals(df2) is False. As of now, I know two ways:

print((df1 == df2).all()[0])

or

df1 = df1.astype(float)
print(df1.equals(df2))

It seems a little bit messy. Is there a better way to do this comparison?

10
  • 6
    NumPy for help : np.allclose(df1,df2)? Commented Jul 5, 2016 at 21:05
  • 3
    @Divakar np.allclose(df1, df2) works for this case. But what if you have some strings in your dataframes as well? Commented Jul 5, 2016 at 21:10
  • @Divakar, could you please add it as an answer - it could help others in future? Commented Jul 5, 2016 at 21:10
  • @MaxU Hmm I am not sure, was mostly a wild guess. Also, as OP pointed out for strings it might be producing unexpected output? Commented Jul 5, 2016 at 21:14
  • 1
    try this: np.allclose(df1.select_dtypes(exclude=[object]), df2.select_dtypes(exclude=[object])) & df1.select_dtypes(include=[object]).equals(df2.select_dtypes(include=[object])) - it's based on @Divakar's solution Commented Jul 5, 2016 at 21:14

3 Answers

42

You can use assert_frame_equal and not check the dtype of the columns.

# Pre v. 0.20.3
# from pandas.util.testing import assert_frame_equal
from pandas.testing import assert_frame_equal
assert_frame_equal(df1, df2, check_dtype=False)
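If a True/False result is more convenient than an assertion, the call above can be wrapped in a small helper. This is only a sketch: frames_equal_ignoring_dtype and df3 are hypothetical names introduced for illustration, not part of pandas or of the original question.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def frames_equal_ignoring_dtype(left, right):
    """Return True if both frames hold equal values, ignoring column dtypes.

    Hypothetical helper for illustration; not part of pandas.
    """
    try:
        # Returns None when the frames match, raises AssertionError otherwise.
        assert_frame_equal(left, right, check_dtype=False)
    except AssertionError:
        return False
    return True


df1 = pd.DataFrame([1, 2, 3])
df2 = pd.DataFrame([1.0, 2, 3])
df3 = pd.DataFrame([1.0, 2, 4])  # hypothetical frame with a differing value

print(frames_equal_ignoring_dtype(df1, df2))
print(frames_equal_ignoring_dtype(df1, df3))
```

Catching AssertionError discards the detailed diff message that assert_frame_equal produces, so inside a test suite the bare assertion is probably the better choice.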

2 Comments

With pandas 0.20.3 assert_frame_equal is in the pandas.testing package: pandas.pydata.org/pandas-docs/stable/generated/…
Also note that assert_frame_equal produces no output when the two DataFrames are equal; it raises an AssertionError otherwise.
7

Building on @Divakar's elegant idea, NumPy's allclose() does most of the work for the numeric columns:

In [128]: df1
Out[128]:
   0    s  n
0  1  aaa  1
1  2  aaa  2
2  3  aaa  3

In [129]: df2
Out[129]:
     0    s    n
0  1.0  aaa  1.0
1  2.0  aaa  2.0
2  3.0  aaa  3.0

In [130]: (np.allclose(df1.select_dtypes(exclude=[object]), df2.select_dtypes(exclude=[object]))
   .....: &
   .....: df1.select_dtypes(include=[object]).equals(df2.select_dtypes(include=[object]))
   .....: )
Out[130]: True

select_dtypes() lets you separate the string (object) columns from the numeric ones, so that each group can be compared with the appropriate method.
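The same idea can be packaged as a reusable function. The sketch below is one possible variant rather than the exact expression from this answer: frames_close is a hypothetical name, the label checks are an added assumption (np.allclose only sees raw values), and it selects on "number" instead of object so that it does not depend on strings being stored with the object dtype.

```python
import numpy as np
import pandas as pd


def frames_close(left, right):
    """Compare numeric columns approximately and all other columns exactly.

    Hypothetical helper for illustration; not part of pandas or NumPy.
    """
    # Row and column labels must match before the raw values are compared.
    if not (left.index.equals(right.index) and left.columns.equals(right.columns)):
        return False

    left_num = left.select_dtypes(include="number")
    right_num = right.select_dtypes(include="number")
    # The same columns must be numeric on both sides.
    if not left_num.columns.equals(right_num.columns):
        return False

    numbers_match = np.allclose(left_num, right_num)
    others_match = left.select_dtypes(exclude="number").equals(
        right.select_dtypes(exclude="number")
    )
    return bool(numbers_match and others_match)


df1 = pd.DataFrame({0: [1, 2, 3], "s": ["aaa"] * 3, "n": [1, 2, 3]})
df2 = pd.DataFrame({0: [1.0, 2.0, 3.0], "s": ["aaa"] * 3, "n": [1.0, 2.0, 3.0]})
df3 = pd.DataFrame({0: [1.0, 2.0, 3.0], "s": ["aaa", "aaa", "bbb"], "n": [1.0, 2.0, 3.0]})

print(frames_close(df1, df2))
print(frames_close(df1, df3))
```

Keep in mind that np.allclose treats NaN values as unequal unless equal_nan=True is passed, so frames containing missing numbers would need that extra argument.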


0

I am a bit late to the party, but with more modern versions of pandas you do not need to resort to NumPy's np.allclose() to check approximate numerical equality. In recent versions (2.2.x at the time of this writing), the assert_frame_equal function accepts a Boolean check_exact= option. If it is set to False, you can control the relative and absolute tolerance of floating-point comparisons with the optional rtol= and atol= parameters, respectively.
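A minimal sketch of that approach follows. The frames expected and measured, the tolerance values, and the flag within_tight_tolerance are made up for illustration.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical data: the last value differs by 0.0004.
expected = pd.DataFrame({"x": [1.0, 2.0, 3.0]})
measured = pd.DataFrame({"x": [1.0, 2.0, 3.0004]})

# Passes silently: the largest difference is within the absolute tolerance.
assert_frame_equal(expected, measured, check_exact=False, rtol=0.0, atol=1e-3)

# With a much tighter tolerance the same comparison raises AssertionError.
try:
    assert_frame_equal(expected, measured, check_exact=False, rtol=0.0, atol=1e-6)
    within_tight_tolerance = True
except AssertionError:
    within_tight_tolerance = False

print(within_tight_tolerance)
```

Setting rtol=0.0 here makes the comparison depend on atol alone, which keeps the example easy to reason about; in practice a combination of both tolerances is common.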
