You won't be able to cull backfaces anywhere near as quickly as the GPU, especially if you are modifying or recreating the vertex buffer or index buffer on a frame-by-frame basis.
I would either:
turn off backface culling (at least for the boxes) and live with the extra triangles being drawn (it'll still be much, much faster than your custom backface culling)
fix your winding order
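
If you go the winding-order route, something like the following could be run once when the mesh is built, rather than every frame. This is only a rough sketch: `flip_winding` and `faces_outward` are names I made up for illustration, it assumes a plain triangle-list index buffer and a convex shape such as a box, and whether the resulting order counts as front-facing depends on your API's cull mode and handedness.

```python
def flip_winding(indices):
    """Return a copy of a triangle-list index buffer with each triangle's winding reversed."""
    if len(indices) % 3 != 0:
        raise ValueError("triangle list length must be a multiple of 3")
    flipped = []
    for i in range(0, len(indices), 3):
        a, b, c = indices[i:i + 3]
        flipped.extend((a, c, b))  # swapping any two indices reverses the winding
    return flipped


def faces_outward(v0, v1, v2, center):
    """True if the normal (v1 - v0) x (v2 - v0) points away from center.

    Assumes a convex mesh, so 'away from center' means 'outward'.
    """
    e1 = [v1[k] - v0[k] for k in range(3)]
    e2 = [v2[k] - v0[k] for k in range(3)]
    normal = (
        e1[1] * e2[2] - e1[2] * e2[1],
        e1[2] * e2[0] - e1[0] * e2[2],
        e1[0] * e2[1] - e1[1] * e2[0],
    )
    to_face = [v0[k] - center[k] for k in range(3)]
    return sum(normal[k] * to_face[k] for k in range(3)) > 0
```

You could check each triangle of a box with `faces_outward` against the box center and swap two of its indices wherever the result disagrees with the rest. If every face is wrong, it may be simpler to call `flip_winding` on the whole buffer or just change the cull mode in the rasterizer state.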