Thanks so much for diving deeper! Yes, that definitely helps me understand better what’s going on. The question actually reminds me of another question from quite a long while back, so it might be worth taking a look at that thread in case there’s any overlap.
One thing to consider is that in use cases like the one you’re describing, the usual recommendation is to move and rotate the camera rather than the model. Those operations are partly just inverses of one another, so if you can do one you can (mostly) do the other. But by rotating the camera, you only have to manipulate the state of one object, which avoids the issues you mentioned that could arise from trying to rotate a lot of different meshes in a scene individually (though there are other ways to get around that problem too, if necessary).
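To make the "inverses" point a bit more concrete, here is a minimal sketch in plain TypeScript with no engine involved. The `Vec3` type and all function names are made up for illustration, and it only covers the pure geometry of a rotation about the Y axis through the origin. It computes where a model point ends up relative to the camera in two ways: rotating the model, and orbiting the camera the opposite way.

```typescript
// Hypothetical minimal vector type, not tied to any engine.
type Vec3 = { x: number; y: number; z: number };

const sub = (a: Vec3, b: Vec3): Vec3 => ({ x: a.x - b.x, y: a.y - b.y, z: a.z - b.z });

// Rotate a vector about the Y axis by `angle` radians.
function rotateY(v: Vec3, angle: number): Vec3 {
  const c = Math.cos(angle);
  const s = Math.sin(angle);
  return { x: c * v.x + s * v.z, y: v.y, z: -s * v.x + c * v.z };
}

// Case A: rotate the model by `angle`; the camera stays at cameraPos with no rotation.
function viewAfterModelRotation(point: Vec3, cameraPos: Vec3, angle: number): Vec3 {
  return sub(rotateY(point, angle), cameraPos);
}

// Case B: leave the model alone; orbit the camera about the origin by -angle
// (both its position and its orientation).
function viewAfterCameraOrbit(point: Vec3, cameraPos: Vec3, angle: number): Vec3 {
  const orbitedPos = rotateY(cameraPos, -angle);
  // Going from world space to camera space undoes the camera's rotation,
  // and undoing a rotation by -angle is a rotation by +angle.
  return rotateY(sub(point, orbitedPos), angle);
}
```

Both functions return the same camera-relative position (up to floating-point rounding), which is why orbiting one camera can stand in for rotating every mesh.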
Probably the easiest way I can think of to get to the behavior you’re looking for is to surround your building with a simplified “hull” and make that pickable instead of the building itself, then use the pick information to determine where the camera should go. This will abstract away certain nuances of the behavior from being mesh- or math-specific and will allow you to control your intended behavior mostly by just changing the hulling mesh.
In the case of the screenshot you pasted above, for example, assuming you only wanted to view it from dominant axes (front, back, left, right, etc.), you could “hull” it with a simple box mesh. The box would be invisible, but when the user clicked, it would be the box that received the pick, not the building. You could then take the picked point and the box’s normal at that point, and move the camera to look at that point from the direction indicated by the normal. (See the linked topics above for examples of things like this being done.) In this case, using the box eliminates the potential for strange, unintended results from clicks hitting precise geometry – the sides of a window indentation, for example – which might have wildly different normals from the surrounding geometry and consequently send your camera logic to the wrong side of the building.
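As a rough sketch of the "picked point plus normal" step, something like the following could work. It is plain TypeScript with a hand-rolled `Vec3`; the helper name `cameraPoseFromPick` and the `distance` parameter are my own inventions for illustration, and in a real scene the picked point and normal would come from your engine's pick result on the hull mesh.

```typescript
// Hypothetical minimal vector type; in practice you would use your engine's vector class.
type Vec3 = { x: number; y: number; z: number };

const add = (a: Vec3, b: Vec3): Vec3 => ({ x: a.x + b.x, y: a.y + b.y, z: a.z + b.z });
const scale = (v: Vec3, s: number): Vec3 => ({ x: v.x * s, y: v.y * s, z: v.z * s });

function normalize(v: Vec3): Vec3 {
  const len = Math.hypot(v.x, v.y, v.z);
  if (len === 0) throw new Error("cannot normalize a zero-length vector");
  return scale(v, 1 / len);
}

// Given the point picked on the hull and the hull's normal there, return
// where the camera should sit and what it should look at.
function cameraPoseFromPick(
  pickedPoint: Vec3,
  hullNormal: Vec3,
  distance: number, // assumed: how far back from the hull the camera should be
): { position: Vec3; target: Vec3 } {
  const n = normalize(hullNormal);
  return {
    position: add(pickedPoint, scale(n, distance)), // back off along the normal
    target: pickedPoint, // look back at the picked point
  };
}

// Example: a click on the +X face of the box hull.
const pose = cameraPoseFromPick({ x: 5, y: 2, z: 1 }, { x: 2, y: 0, z: 0 }, 10);
```

Because the box's faces each have a single normal, every click on the same face produces the same viewing direction, no matter what building detail sits behind it.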
And if you wanted different behavior, all you’d have to do is change the hulling mesh. If, for example, you wanted to be able to look at the mesh from more angles than just the dominant axes, you’d just need to replace your box with a mesh that has more normals, like a capsule. This, without any further changes, would cause the same logic outlined above for dominant axes to power any viewing angle. In fact, almost any viewing behavior in this category could likely be enabled simply by changing the hulling mesh, which would allow you to cleanly reuse your code for any number of different kinds of buildings, or even other objects.
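To illustrate why the hull shape alone changes the behavior, here is a small sketch comparing the normals of a box hull and a capsule hull. It is plain TypeScript, the function names are hypothetical, and the capsule is assumed to have its axis along Y. Normally your engine hands you the normal with the pick result, so you would not compute it yourself; this is only to show what "more normals" means.

```typescript
// Hypothetical minimal vector type, not tied to any engine.
type Vec3 = { x: number; y: number; z: number };

// Box hull: the normal belongs to whichever face the point lies on,
// so only six directions are ever possible.
function boxHullNormal(point: Vec3, center: Vec3, halfExtents: Vec3): Vec3 {
  // Offset from the center, scaled so every face sits at +/-1.
  const d = {
    x: (point.x - center.x) / halfExtents.x,
    y: (point.y - center.y) / halfExtents.y,
    z: (point.z - center.z) / halfExtents.z,
  };
  const ax = Math.abs(d.x);
  const ay = Math.abs(d.y);
  const az = Math.abs(d.z);
  if (ax >= ay && ax >= az) return { x: Math.sign(d.x), y: 0, z: 0 };
  if (ay >= az) return { x: 0, y: Math.sign(d.y), z: 0 };
  return { x: 0, y: 0, z: Math.sign(d.z) };
}

// Capsule hull (axis along Y): the normal points from the nearest point on
// the capsule's core segment toward the surface point, so it varies smoothly.
function capsuleHullNormal(point: Vec3, center: Vec3, halfSegment: number): Vec3 {
  const dy = point.y - center.y;
  const clampedY = Math.max(-halfSegment, Math.min(halfSegment, dy));
  const v = { x: point.x - center.x, y: dy - clampedY, z: point.z - center.z };
  const len = Math.hypot(v.x, v.y, v.z);
  if (len === 0) throw new Error("point lies on the capsule axis");
  return { x: v.x / len, y: v.y / len, z: v.z / len };
}
```

Two nearby clicks on the same box face give an identical normal, while the same two clicks on the capsule give slightly different ones, so the unchanged camera logic ends up at slightly different viewing angles.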
Hope this helps, and best of luck!