You can't just throw away the last 2048 time-domain samples of the circular convolution, because they contain part of the convolution result (at least as many samples as the length of the filter's impulse response minus 1). For fast convolution filtering, you must save those samples for the subsequent overlap-add/save processing, or else the process will be lossy, perhaps severely so.
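As a rough illustration of why the tail matters, here is a minimal NumPy sketch of overlap-add fast convolution. It assumes 2048-sample input blocks zero-padded to a 4096-point FFT, which is a guess at the setup in question; the function name `overlap_add_filter`, the `keep_tail` switch, and the random test signal are made up for this example.

```python
import numpy as np

def overlap_add_filter(x, h, block_len=2048, keep_tail=True):
    """Filter x with FIR h using zero-padded FFT blocks (overlap-add).

    keep_tail=False zeroes the second half of every inverse FFT, which
    mimics throwing away the last block_len samples of each block.
    """
    fft_len = 2 * block_len
    # The linear convolution of one block with h must fit in fft_len samples.
    assert len(h) <= fft_len - block_len + 1
    H = np.fft.rfft(h, fft_len)
    y = np.zeros(len(x) + fft_len)
    for start in range(0, len(x), block_len):
        block = x[start:start + block_len]
        seg = np.fft.irfft(np.fft.rfft(block, fft_len) * H, fft_len)
        if not keep_tail:
            seg[block_len:] = 0.0  # discard the filter "ringing" of this block
        # The tail of this block lands on top of the head of the next one.
        y[start:start + fft_len] += seg
    return y[:len(x) + len(h) - 1]

# Hypothetical test data: white noise through a random 257-tap filter.
rng = np.random.default_rng(0)
x = rng.standard_normal(10000)
h = rng.standard_normal(257)
reference = np.convolve(x, h)

err_kept = np.max(np.abs(overlap_add_filter(x, h) - reference))
err_dropped = np.max(np.abs(overlap_add_filter(x, h, keep_tail=False) - reference))
print("max error, tail kept:   ", err_kept)
print("max error, tail dropped:", err_dropped)
```

With the tail kept, the result should match direct convolution to within rounding error. With the tail dropped, the first `len(h) - 1` output samples after every block boundary are missing the contribution from the previous block.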
If you want to use overlapping von Hann windows, successive windows have to be offset by an exact integer sub-multiple of the window width (1/2, 1/3, 1/4, ... of the width; an offset of 1/2 is the usual 50% overlap) in order for the amplitude envelope of the resulting window sum to be unmodulated (except at the very beginning and end of the overlap sequence). This works because the process ends up being a linear decomposition and re-composition with a linear operation in the middle (assuming correct overlap-add/save fast convolution, and, again, except at the very beginning and very end).
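The flat-sum property can be checked numerically. The sketch below assumes a periodic von Hann window (0.5 - 0.5 cos(2πn/N), which is not the same as the symmetric window returned by `np.hanning`) and a window width of 2048; the helper names `periodic_hann`, `window_sum`, and `steady_state_ripple` are made up for this example.

```python
import numpy as np

def periodic_hann(n):
    """Periodic von Hann window of length n: 0.5 - 0.5*cos(2*pi*k/n)."""
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)

def window_sum(win_len, hop, num_windows):
    """Sum of num_windows Hann windows, each shifted by hop samples."""
    w = periodic_hann(win_len)
    total = np.zeros(hop * (num_windows - 1) + win_len)
    for k in range(num_windows):
        total[k * hop:k * hop + win_len] += w
    return total

def steady_state_ripple(win_len, hop, num_windows=16):
    """Peak-to-peak variation of the window sum, ignoring both ends."""
    total = window_sum(win_len, hop, num_windows)
    middle = total[win_len:-win_len]  # skip the ramp-up and ramp-down
    return middle.max() - middle.min()

for hop in (1024, 512, 1536):
    print("hop", hop, "ripple", steady_state_ripple(2048, hop))
```

Offsets of 1024 and 512 (1/2 and 1/4 of the width) should give a ripple at the level of rounding error, while 1536 (3/4 of the width) leaves a strongly modulated envelope.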