I want to type-check Pandas DataFrames, i.e. specify which column labels a DataFrame must have and what dtype is stored in each of them. A crude implementation (inspired by this question) would work like this:
from collections import namedtuple

Col = namedtuple('Col', 'label, type')

def dataframe_check(*specification):
    def check_accepts(f):
        # There must be no more specs than the function has arguments.
        assert len(specification) <= f.__code__.co_argcount

        def new_f(*args, **kwds):
            for (df, specs) in zip(args, specification):
                # Check the column labels...
                spec_columns = [spec.label for spec in specs]
                assert (df.columns == spec_columns).all(), \
                    "Columns don't match specs {}".format(spec_columns)
                # ...and the dtypes stored under them.
                spec_dtypes = [spec.type for spec in specs]
                assert (df.dtypes == spec_dtypes).all(), \
                    "Dtypes don't match specs {}".format(spec_dtypes)
            return f(*args, **kwds)

        new_f.__name__ = f.__name__
        return new_f
    return check_accepts

I don't mind the complexity of the checking function, but it adds a lot of boilerplate code:
@dataframe_check([Col('a', int), Col('b', int)],    # df1
                 [Col('a', int), Col('b', float)])  # df2
def f(df1, df2):
    return df1 + df2

f(df, df)

Is there a more Pythonic way of type-checking DataFrames? Something that looks more like the new static type annotations in Python 3.6?
Is it possible to implement this kind of check in mypy?
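For concreteness, here is a rough sketch of the kind of annotation-driven API I have in mind. It is hypothetical, not working 3.6 code: it leans on typing.Annotated, which only arrived in Python 3.9 (earlier via typing_extensions), and the check_annotations decorator and the DF_* aliases are names I made up. It reuses the Col namedtuple from above:

import functools
import inspect
from typing import Annotated, get_args, get_origin, get_type_hints

import pandas as pd

# Hypothetical aliases: the Col specs ride along as Annotated metadata.
DF_ab_int = Annotated[pd.DataFrame, [Col('a', int), Col('b', int)]]
DF_ab_mixed = Annotated[pd.DataFrame, [Col('a', int), Col('b', float)]]

def check_annotations(f):
    hints = get_type_hints(f, include_extras=True)
    sig = inspect.signature(f)

    @functools.wraps(f)
    def new_f(*args, **kwds):
        bound = sig.bind(*args, **kwds)
        for name, value in bound.arguments.items():
            hint = hints.get(name)
            # Only check parameters annotated with Annotated metadata.
            if hint is None or get_origin(hint) is not Annotated:
                continue
            specs = get_args(hint)[1]  # the list of Col specs
            assert (value.columns == [s.label for s in specs]).all(), \
                "Columns don't match specs for {!r}".format(name)
            assert (value.dtypes == [s.type for s in specs]).all(), \
                "Dtypes don't match specs for {!r}".format(name)
        return f(*args, **kwds)
    return new_f

@check_annotations
def f(df1: DF_ab_int, df2: DF_ab_mixed) -> pd.DataFrame:
    return df1 + df2

Even then, as far as I can tell, mypy would only see the underlying pd.DataFrame type; the Col metadata would still have to be enforced at runtime, which is essentially my decorator above in nicer clothes.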