The process is:
Download the geometry & textures. You are actually downloading the geometry more than once. The downloads themselves do occur on a browser background thread, but the geometry is parsed & uploaded to the GPU on the UI thread. Downloading the smaller version first does still bring the scene up quicker, by the difference in download time.
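A minimal sketch of that ordering, kicking off both downloads at once and showing whichever finishes first. `loadLow`, `loadHigh`, and `show` are placeholder names of mine (e.g. wrappers around `THREE.GLTFLoader.loadAsync` and `scene.add`), not a real API:

```javascript
// Progressive load: display the small version as soon as it arrives,
// then swap in the large one when it finishes downloading & parsing.
// Both fetches start immediately; only the parse/GPU-upload step of
// each result lands on the UI thread.
async function progressiveLoad(loadLow, loadHigh, show) {
  const order = [];
  const highPromise = loadHigh();   // start the big download right away
  const low = await loadLow();      // small file should resolve first
  show(low);
  order.push('low');
  const high = await highPromise;   // big file resolves later
  show(high);
  order.push('high');
  return order;                     // records which version appeared when
}
```

The gap between the two `show` calls is roughly the extra download time you pay for the full-resolution asset.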
Once a texture file is downloaded, it is processed on both the CPU & the GPU (creating mipmaps & compiling the shader). Right now, in most browsers, the GPU work cannot be done on a separate thread, but a WebGL extension was added for compiling shaders off the main thread. I think Chrome has it implemented, so it may be worth checking whether there is a difference between browsers.
Also, it might be less work not to put a material on the larger versions at all, but instead assign the material from the small version during the swap. I do not know how much it will help, but it should be fairly easy to try. Measuring might be tough, though, since async work is pretty hostile to accurate timing.
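The swap itself can be as simple as replacing the geometry on the existing mesh, so the already-compiled material (and its shader program) is reused untouched. The function and parameter names here are illustrative, not an existing API:

```javascript
// Swap the high-res geometry into the mesh that is already on screen,
// keeping its material as-is so no new shader compile is triggered.
function swapInHighRes(lowMesh, highGeometry) {
  const old = lowMesh.geometry;
  lowMesh.geometry = highGeometry;          // lowMesh.material is untouched
  if (old && old.dispose) old.dispose();    // free the small version's GPU buffers
  return lowMesh;
}
```

Disposing the old geometry matters here: otherwise the low-res buffers stay resident on the GPU alongside the high-res ones.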
Finally, you might not want to hear this, but you may simply have too much detail in your geometry, and be trying to fix everything after the fact, right on the user’s machine.
I use Blender. When getting an asset ready, I keep a master copy (a .blend file) with everything at maximum resolution. From it I create a series of exports, applying what Blender calls a ‘Limited Dissolve’ with UV as a delimiter. This operation takes an angle limit that controls which faces are fair game to merge. I start with a small angle and keep increasing it until I can see real differences at the distance from the camera where the mesh is expected to sit, then use the setting just before that.