I have a CSV file of more than 100 GB with more than 100 columns (of different data types). I need to know whether each column contains the expected data type.

How can I check, for each row of a chunk, whether the data type is the expected one, returning True if it is and False otherwise? (Note that since the file is big, I read it in chunks, with everything as str):

for df_chunk in pd.read_csv(path, chunksize=n, dtype=str):
    check(df_chunk)

For example, given the df chunk and the function check:

A        B
'blue'   'dog'
'red'    'cat'
'black'  1.0

check(df_chunk, {'A': str, 'B': str})

Returns:

A     B
True  True
True  True
True  False

1 Answer

Try applying type to every cell with applymap, then comparing the resulting type names with the expected ones:

df.applymap(lambda x: type(x).__name__).eq({'A': 'str', 'B': 'str'})

       A      B
0   True   True
1   True   True
2   True  False
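
One caveat for the chunked case: with dtype=str, every non-missing cell is read as a str, so a check on type(x) comes out True everywhere regardless of what the text contains. Below is a minimal sketch of an alternative, assuming "expected type" means "the text can be parsed as that type". The helper parses_as, the two-argument check, and the sample CSV are hypothetical names and data made up for illustration. Series.map is used per column because recent pandas releases deprecate applymap in favour of DataFrame.map.

```python
import io

import pandas as pd


def parses_as(value, typ):
    """Return True if `value` is a `typ` or text that `typ` can parse (hypothetical helper)."""
    if typ is str:
        return isinstance(value, str)
    try:
        typ(value)  # e.g. float("2.5") succeeds, float("dog") raises ValueError
        return True
    except (TypeError, ValueError):
        return False


def check(df_chunk, expected):
    """Return a boolean DataFrame with one column per key of `expected`."""
    return pd.DataFrame(
        {
            # t=typ binds the current type to each lambda
            col: df_chunk[col].map(lambda v, t=typ: parses_as(v, t))
            for col, typ in expected.items()
        },
        index=df_chunk.index,
    )


# Tiny in-memory CSV standing in for the real file (made-up data).
csv_text = "A,B\nblue,1\nred,2.5\nblack,dog\n"

masks = [
    check(df_chunk, {"A": str, "B": float})
    for df_chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2, dtype=str)
]
result = pd.concat(masks)
print(result)
```

Missing cells typically arrive as NaN, which float accepts and int rejects, so handle them separately if that is not what you want. For a file of this size it is probably better to aggregate or write out each mask as it is produced than to collect them all in a list.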
1 Comment

TIL about applymap. You saved me from posting some shady implementation using apply and a for loop over the columns.