I have a corpus of free-form text (emails) and am trying to extract the highest degree mentioned in each one (e.g. High School, Bachelor's, Master's, PhD). Is stemming the way to go, or lemmatization? Note that a degree may not be mentioned at all.
Some example emails that I'll need to go through:
Hi. My name is XXXXX. I have bachelor's in Math from UPenn and a Master's of Education from Stony Brook. Would love to work with XXXX!
Dear XXXX, my name is XXXX and am interested in your company. Please email me for CV.
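For comparison, here is a minimal keyword-matching baseline. It skips stemming and lemmatization entirely and just uses regular expressions. The degree labels, the regex patterns, and their ranking (High School < Bachelor < Master < PhD) are illustrative assumptions, not a tested solution. `highest_degree` is a hypothetical helper name.

```python
import re
from typing import Optional

# Hypothetical degree patterns, ordered from LOWEST to HIGHEST rank.
# These regexes are assumptions and will miss many real-world phrasings
# (e.g. abbreviations like "BS" or "MA" are deliberately left out because
# they collide with ordinary words when matched case-insensitively).
DEGREE_PATTERNS = [
    ("High School", re.compile(r"\bhigh[\s-]school\b", re.IGNORECASE)),
    ("Bachelor", re.compile(r"\bbachelor(?:['’]s|s)?\b", re.IGNORECASE)),
    ("Master", re.compile(r"\bmaster(?:['’]s|s)?\b", re.IGNORECASE)),
    ("PhD", re.compile(r"\bph\.?\s?d\b|\bdoctora(?:te|l)\b", re.IGNORECASE)),
]


def highest_degree(text: str) -> Optional[str]:
    """Return the highest-ranked degree mentioned in text, or None if none is found."""
    best = None
    # Patterns are ordered lowest -> highest, so the last match is the highest.
    for label, pattern in DEGREE_PATTERNS:
        if pattern.search(text):
            best = label
    return best


emails = [
    "Hi. My name is XXXXX. I have bachelor's in Math from UPenn and a "
    "Master's of Education from Stony Brook. Would love to work with XXXX!",
    "Dear XXXX, my name is XXXX and am interested in your company. "
    "Please email me for CV.",
]

for email in emails:
    print(highest_degree(email))
# -> Master
# -> None
```

Because the pattern list is ordered from lowest to highest, the last pattern that matches wins, and an email with no match returns `None`. A plain keyword match like this cannot tell "I have a master's" from "I'm applying for a master's" or from the verb "master", which is where a more context-aware NLP approach would be needed.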