Since I could not find a one-stop answer to this problem, I am posting the solution I pieced together from several threads:
I am importing data using pandas as follows:
    import pandas as pd
    data = pd.read_csv(".../file.csv", encoding='utf8')

This resulted in the error:
    UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 352: invalid start byte

To counter this, I changed the encoding to Latin-1:
    data = pd.read_csv(".../file.csv", encoding='Latin-1')

This resulted in the following error when calling vectorizer.fit_transform():
    ValueError: np.nan is an invalid document, expected byte or unicode string
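
For anyone trying to reproduce the two errors, here is a minimal sketch of what is probably going on. It makes assumptions I cannot confirm for the file above: the in-memory bytes `raw` are a made-up stand-in for file.csv, the column name `text` is hypothetical, and the file is assumed to be Windows-1252 (cp1252) encoded, where byte 0x92 is a curly apostrophe. The second error usually means an empty cell was read as NaN, and replacing NaN with an empty string is one common workaround, not necessarily the only one.

```python
import io
import pandas as pd

# Hypothetical stand-in for file.csv: byte 0x92 is a curly apostrophe in
# Windows-1252, and the second row has an empty text field.
raw = b"id,text\n1,it\x92s fine\n2,\n"

# 0x92 is not a valid UTF-8 start byte, so strict UTF-8 decoding fails.
try:
    raw.decode("utf8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Latin-1 maps every byte to a code point, so it never fails, but 0x92
# becomes the invisible control character U+0092 rather than an apostrophe.
latin1_text = raw.decode("latin-1")

# Assumption: the data is really cp1252, which decodes 0x92 as U+2019.
data = pd.read_csv(io.BytesIO(raw), encoding="cp1252")

# The empty field is read as NaN, which is not a string. Counting the
# missing values shows how many rows a text vectorizer would reject.
missing = int(data["text"].isna().sum())

# One possible workaround: replace NaN with an empty string before
# passing the column to a vectorizer.
data["text"] = data["text"].fillna("")
```

After this, every entry in `data["text"]` is a string, so passing the column to `vectorizer.fit_transform()` should no longer hit the np.nan error. Dropping the affected rows with `dropna(subset=["text"])` is an alternative if empty documents are not wanted.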