Your performance hit comes from driver overhead (state changes). Having 50K VAOs is way too many.
If all your models are the same format (for example the format x,y,z,r,g,b,a) then you should use one VAO and one VBO.
You can use something like glMultiDrawArraysIndirect to render large numbers of objects: you fill in the draw commands yourself (which requires no OpenGL calls) and then dispatch them all in one go with a single OpenGL call to glMultiDraw*Indirect.
Using glMultiDraw*Indirect means you can store all your vertex data in one VBO too.
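As a rough sketch of the CPU side of that idea, the code below packs several meshes into one vertex array (the contents of the single VBO) and builds one draw command per mesh. The `DrawArraysIndirectCommand` field layout is the one glMultiDrawArraysIndirect reads; `Vertex`, `PackedScene` and `packMeshes` are hypothetical names made up for illustration, and using the draw index as `baseInstance` is just one possible way to look up per-object data. The OpenGL calls are only mentioned in comments, so this compiles without a GL context.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One interleaved vertex in the x,y,z,r,g,b,a format.
struct Vertex {
    float x, y, z;
    float r, g, b, a;
};

// Field layout that glMultiDrawArraysIndirect reads for each draw.
struct DrawArraysIndirectCommand {
    std::uint32_t count;          // number of vertices in this draw
    std::uint32_t instanceCount;  // number of instances to draw
    std::uint32_t first;          // index of the first vertex in the shared VBO
    std::uint32_t baseInstance;   // offset applied when fetching instanced attributes
};
static_assert(sizeof(DrawArraysIndirectCommand) == 16,
              "commands must be four tightly packed 32-bit values");

// Hypothetical container: data for the single VBO and for the indirect buffer.
struct PackedScene {
    std::vector<Vertex> vertices;
    std::vector<DrawArraysIndirectCommand> commands;
};

// Hypothetical helper: appends every mesh to one vertex array and records
// where each mesh starts. No OpenGL calls are made here.
PackedScene packMeshes(const std::vector<std::vector<Vertex>>& meshes) {
    PackedScene scene;
    for (const auto& mesh : meshes) {
        DrawArraysIndirectCommand cmd{};
        cmd.count = static_cast<std::uint32_t>(mesh.size());
        cmd.instanceCount = 1;
        cmd.first = static_cast<std::uint32_t>(scene.vertices.size());
        // Assumption: the draw index doubles as the per-object data index.
        cmd.baseInstance = static_cast<std::uint32_t>(scene.commands.size());
        scene.vertices.insert(scene.vertices.end(), mesh.begin(), mesh.end());
        scene.commands.push_back(cmd);
    }
    return scene;
}

// With a GL context, the upload and dispatch would look roughly like:
//   upload scene.vertices to the buffer bound to GL_ARRAY_BUFFER
//   upload scene.commands to the buffer bound to GL_DRAW_INDIRECT_BUFFER
//   glMultiDrawArraysIndirect(GL_TRIANGLES, nullptr,
//                             (GLsizei)scene.commands.size(), 0);
```

Calling `packMeshes` once at load time gives you one vertex array and one command array, so the per-frame cost is a single draw call instead of one VAO bind and one draw per object.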
You can also get a performance win by using glBufferStorage and persistently mapped buffers to send your matrix transforms and colours to the GPU.
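A minimal sketch of how the layout of such a buffer could be worked out, assuming you split it into three regions and write to a different region each frame so the CPU never overwrites data the GPU is still reading. The region count, `PerObjectData`, `RingLayout`, `makeRingLayout` and `regionOffset` are all hypothetical names and choices for illustration; the GL calls appear only in comments, and the sketch ignores any buffer offset alignment your binding point may require.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Per-object data streamed every frame: a 4x4 transform and an RGBA colour.
struct PerObjectData {
    float transform[16];
    float colour[4];
};

// Assumption: three regions, so the CPU can write one while the GPU reads others.
constexpr std::size_t kRegions = 3;

struct RingLayout {
    std::size_t regionBytes;  // bytes used by one frame's worth of objects
    std::size_t totalBytes;   // size of the whole persistently mapped buffer
};

// Hypothetical helper: sizes the buffer for maxObjects objects per frame.
RingLayout makeRingLayout(std::size_t maxObjects) {
    const std::size_t regionBytes = maxObjects * sizeof(PerObjectData);
    return {regionBytes, regionBytes * kRegions};
}

// Hypothetical helper: byte offset of the region to write during a given frame.
std::size_t regionOffset(const RingLayout& layout, std::uint64_t frameIndex) {
    return static_cast<std::size_t>(frameIndex % kRegions) * layout.regionBytes;
}

// With a GL 4.4 context (or ARB_buffer_storage), one-time setup would be roughly:
//   const GLbitfield flags =
//       GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;
//   glBufferStorage(GL_SHADER_STORAGE_BUFFER, layout.totalBytes, nullptr, flags);
//   void* base = glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0,
//                                 layout.totalBytes, flags);
// Each frame you would wait on that region's fence (glClientWaitSync), write
// PerObjectData at base + regionOffset(layout, frame), draw, then glFenceSync.
```

The buffer stays mapped for its whole lifetime, so updating 50K transforms becomes a plain memory write with no map/unmap or glBufferSubData call per frame.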
This presentation shows how you can avoid a lot of the driver overhead, and it comes with source code too!
Also this presentation might be more geared towards what you are trying to do.