Thanks for your help guys! With these recommendations and a few tweaks throughout the code base I was able to cut back the memory usage quite a bit and boost performance.
This is a rewrite of a library I wrote a few years back. It takes advantage of Web Workers, OffscreenCanvas, and custom shaders. This is one of the larger samples I test with; it ends up at a little over 7 million thin instances of a cube.
As far as picking goes, I used the approach from this post (GPU picking demo) and have had great performance and success with it.
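For anyone wondering how GPU picking scales to this many instances, a common variant renders each instance in a unique flat color and reads back the pixel under the cursor. The sketch below is not the code from the linked post. `encodeId` and `decodeId` are hypothetical helpers, and it assumes an 8-bit RGB picking target with 0 reserved for "no hit". With 24 bits there are 16,777,215 non-zero IDs, which covers a little over 7 million instances with room to spare.

```typescript
// Hypothetical helpers for a color-ID picking pass.
// Assumes an 8-bit-per-channel RGB render target; ID 0 is reserved for background.

const MAX_ID = 0xffffff; // 24 bits -> 16,777,215 usable non-zero IDs

// Pack a thin-instance index into RGB bytes (0-255).
// Divide each channel by 255 if the shader expects a normalized color attribute.
function encodeId(instanceIndex: number): [number, number, number] {
  const id = instanceIndex + 1; // shift by one so 0 can mean "nothing picked"
  if (!Number.isInteger(instanceIndex) || instanceIndex < 0 || id > MAX_ID) {
    throw new RangeError(`instance index out of range: ${instanceIndex}`);
  }
  return [(id >> 16) & 0xff, (id >> 8) & 0xff, id & 0xff];
}

// Unpack the pixel bytes read back from the picking target.
// Returns -1 when the pixel is background (no instance under the cursor).
function decodeId(pixel: ArrayLike<number>): number {
  const id = (pixel[0] << 16) | (pixel[1] << 8) | pixel[2];
  return id - 1;
}
```

Reserving 0 for the background means a cleared picking target decodes to -1 without a separate "hit" flag; the trade-off is one fewer usable ID.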