Your performance hit comes from driver overhead (state changes). Having 50K VAOs is way too many. If all your models use the same vertex format (for example x,y,z,r,g,b,a), then you should use one VAO and one VBO.

You can use something like `glMultiDrawArraysIndirect` to render large numbers of objects: you build the draw commands yourself (which requires no OpenGL calls) and then dispatch them all in one go with a single OpenGL call to `glMultiDraw*Indirect`. Using `glMultiDraw*Indirect` also means you can store all your vertex data in one VBO.

You can also get a performance win by using `glBufferStorage` and [persistently mapped buffers][1] to send your matrix transforms to the GPU.

[This presentation][2] shows how you can avoid a lot of the driver overhead, along with [source code too!][3]

  [1]: https://ferransole.wordpress.com/2014/06/08/persistent-mapped-buffers/
  [2]: http://www.slideshare.net/CassEveritt/approaching-zero-driver-overhead
  [3]: https://github.com/nvMcJohn/apitest
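
To make the "create commands, then dispatch in one go" idea concrete, here is a minimal sketch of the CPU-side command building. It assumes non-indexed draws and that every model's vertices sit back to back in a single shared VBO. The struct layout is the one the OpenGL spec defines for `glMultiDrawArraysIndirect`; the helper name `build_draw_commands` and its parameters are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Command layout that glMultiDrawArraysIndirect reads (four tightly packed GLuints). */
typedef struct {
    uint32_t count;         /* number of vertices in this draw */
    uint32_t instanceCount; /* number of instances to draw */
    uint32_t first;         /* index of the first vertex in the shared VBO */
    uint32_t baseInstance;  /* base instance; used here as a per-model index */
} DrawArraysIndirectCommand;

static_assert(sizeof(DrawArraysIndirectCommand) == 16,
              "commands must be tightly packed for stride 0");

/* Hypothetical helper: fills one command per model, assuming the models'
 * vertices are stored consecutively in one VBO. Makes no OpenGL calls. */
size_t build_draw_commands(const uint32_t *vertex_counts, size_t model_count,
                           DrawArraysIndirectCommand *out)
{
    uint32_t first = 0; /* running vertex offset into the shared VBO */
    for (size_t i = 0; i < model_count; ++i) {
        out[i].count = vertex_counts[i];
        out[i].instanceCount = 1;
        out[i].first = first;
        out[i].baseInstance = (uint32_t)i;
        first += vertex_counts[i];
    }
    return model_count;
}

/* With a GL 4.3+ context, the commands would then be uploaded to a buffer
 * bound to GL_DRAW_INDIRECT_BUFFER and drawn with a single call, roughly:
 *
 *   glMultiDrawArraysIndirect(GL_TRIANGLES, (const void *)0,
 *                             (GLsizei)model_count, 0);
 */
```

Setting `baseInstance` to the model index is just one option; it gives each draw a distinct value you can use to pick that model's matrix, for example through an instanced vertex attribute.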