Please check this answer, which describes a few approaches to the same problem. Bird song is a monophonic signal: it has only one fundamental frequency at any point in time, as opposed to a polyphonic signal. Since timbre is also irrelevant here, the most informative feature to extract for this classification task is the pitch contour, i.e. how the fundamental frequency evolves over time.
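For illustration only, here is a minimal sketch of extracting a pitch contour with a frame-wise autocorrelation estimator. It is a simple stand-in for dedicated pitch trackers such as YIN, and not necessarily one of the methods in the linked answer. The function names `frame_pitch` and `pitch_contour`, the frame and hop sizes, the 200 to 1000 Hz search range and the 0.3 voicing threshold are all hypothetical choices. Real recordings would need a sample rate and search range matched to the species being classified.

```python
import math


def frame_pitch(frame, sample_rate, fmin, fmax, threshold=0.3):
    """Estimate the fundamental frequency (Hz) of one frame by picking the
    autocorrelation peak in the lag range [sample_rate/fmax, sample_rate/fmin].
    Returns None for silent or weakly periodic (unvoiced) frames."""
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]        # remove DC offset
    energy = sum(s * s for s in x)       # autocorrelation at lag 0
    if energy == 0.0:
        return None
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(n - 1, int(sample_rate / fmin))
    if min_lag > max_lag:
        return None
    best_lag, best_r = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    # Hypothetical voicing rule: a weak peak relative to lag 0 means "no pitch".
    if best_r / energy < threshold:
        return None
    return sample_rate / best_lag


def pitch_contour(signal, sample_rate, frame_len=256, hop=128,
                  fmin=200.0, fmax=1000.0):
    """Return one pitch estimate (Hz, or None if unvoiced) per frame."""
    return [
        frame_pitch(signal[start:start + frame_len], sample_rate, fmin, fmax)
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def tone(freq, n_samples, sample_rate):
    """Synthetic pure tone, used only for the demo below."""
    return [math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n_samples)]


if __name__ == "__main__":
    sr = 8000
    # Synthetic "song": 0.1 s at 440 Hz followed by 0.1 s at 660 Hz.
    song = tone(440, 800, sr) + tone(660, 800, sr)
    print([None if p is None else round(p, 1) for p in pitch_contour(song, sr)])
```

The resulting sequence of per-frame frequencies, with `None` marking silence, is the kind of contour that could then be compared across recordings (for example after normalizing its length or transposing it); that downstream classification step is not shown. Picking the global autocorrelation maximum without interpolation limits the frequency resolution to whole-sample lags, which is one reason dedicated trackers refine the peak.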