I have a CSV file larger than 100 GB with more than 100 columns of different data types. I need to know whether each column contains the expected data type.
How can I check, for each row of a chunk, whether every value has the expected data type, returning True if it does and False otherwise? (Since the file is large, I read it in chunks, with every column as str.)

    for df_chunk in pd.read_csv(path, chunksize=n, dtype=str):
        check(df_chunk)

For example, given this `df_chunk` and the function `check`:
| A | B |
|---|---|
| 'blue' | 'dog' |
| 'red' | 'cat' |
| 'black' | 1.0 |

the call `check(df_chunk, {'A': str, 'B': str})` should return:
| A | B |
|---|---|
| True | True |
| True | True |
| True | False |
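
To make the desired behaviour concrete, here is a minimal sketch of what such a `check` could look like. It rests on assumptions that the question does not state: because every value is read as str, "is a str" is taken to mean "does not parse as a number" (inferred from the `1.0` row in the example), and only `str`, `int` and `float` are handled. The helper names `_is_number`, `_is_int`, `_is_str` and `CHECKERS` are hypothetical.

```python
import pandas as pd

def _is_number(s: pd.Series) -> pd.Series:
    # True where the value parses as a number; unparseable values become NaN.
    return pd.to_numeric(s, errors="coerce").notna()

def _is_int(s: pd.Series) -> pd.Series:
    # True where the whole string is an optionally signed run of digits.
    return s.str.fullmatch(r"[+-]?\d+", na=False)

def _is_str(s: pd.Series) -> pd.Series:
    # Assumption: a "real" string is a non-missing value that is not numeric.
    return s.notna() & ~_is_number(s)

# Hypothetical mapping from expected Python type to a column-wise validator.
CHECKERS = {str: _is_str, int: _is_int, float: _is_number}

def check(df_chunk: pd.DataFrame, expected: dict) -> pd.DataFrame:
    # Build a boolean frame with one column per entry in `expected`.
    result = {col: CHECKERS[typ](df_chunk[col]) for col, typ in expected.items()}
    return pd.DataFrame(result, index=df_chunk.index)

if __name__ == "__main__":
    df_chunk = pd.DataFrame(
        {"A": ["blue", "red", "black"], "B": ["dog", "cat", "1.0"]}
    )
    print(check(df_chunk, {"A": str, "B": str}))
```

Each validator works on a whole column at once, so the per-chunk cost stays vectorised rather than looping over rows in Python.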