Sounds like a pretty cool use case. Are your cameras articulated? Or do any of the things you want to add information about move around unpredictably? If not, you might not need real AR (or, more broadly, CV) capabilities, at least to get started. If the stuff you want to provide information about is always in the same part of the camera feed, you can just draw the information on top of the feed at fixed positions and not worry about tracking.
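To give a rough idea of how little the no-tracking case needs, here's a minimal sketch in plain Python. Everything in it is made up for illustration (the `Label` type, the `STATIC_LABELS` table, the camera ID and the pixel coordinates); the point is only that a static overlay boils down to a lookup table of fixed screen positions per camera.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    """A piece of text pinned to a fixed pixel position in a camera frame."""
    text: str
    x: int
    y: int


# Hypothetical table: for each fixed camera, where each label should be drawn.
# These coordinates would be picked by hand once, since nothing moves.
STATIC_LABELS = {
    "cam_lobby": [
        Label("Front door", 40, 60),
        Label("Reception desk", 300, 220),
    ],
}


def labels_for(camera_id, frame_width, frame_height):
    """Return the labels for this camera that fall inside the frame bounds."""
    return [
        label
        for label in STATIC_LABELS.get(camera_id, [])
        if 0 <= label.x < frame_width and 0 <= label.y < frame_height
    ]
```

Each frame, you'd call something like `labels_for("cam_lobby", 640, 480)` and draw the returned text at the given positions with whatever rendering library you already use.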
If there is something in the scene you want to provide information about that won’t be in a predictable location on the camera screen, however, the technique you use to track it will depend heavily on what it is you want to track. If, for example, you wanted to be able to see whether a door was open or closed, there are quite a few ways you could tackle that. One would be to put a marker (of basically any recognizable kind) on the door and visually check the state of that marker in the camera feed. If you want to overlay information about people, though, that’s trickier, because you can’t put a marker (or any other predictable visual indicator) on people, so you’d need more sophisticated recognizers to attempt something like that. So, in sum, what kind of CV technique you should use really depends on three things:
- Your subject (that is, what your system is trying to perceive in the world).
- How you expect your subject to behave. (Will it be mobile in the frame? Will it move between cameras? Will it always look the same?)
- What constraints your system needs to function under. (Do you need results in real time? How many cameras do you have? How much compute power? What other information do you have about the scene apart from the camera feeds?)
Sorry if this seems like a more burdensome line of questioning than you might have hoped for. The good news is that if your scene contains only predictably placed subjects, you don’t have to worry about any of this, because you already know where each subject will be. But if a subject’s location is unpredictable, it quickly becomes a trickier question, so I really do need to know more about the use case in order to provide a helpful answer.
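Going back to the door example, here's a very rough sketch of the "check the marker" idea, just to show the shape of it. It makes a lot of assumptions: the camera is fixed, the frame is a grayscale image given as a list of rows of 0-255 values, the marker is a bright patch that is only visible in a known region when the door is closed, and the threshold values are placeholders you'd have to tune. The function names are made up too. A real setup would more likely use a proper marker detector from a CV library, but the logic is the same.

```python
def marker_fraction(frame, region, threshold=200):
    """Fraction of pixels in `region` that are at least `threshold` bright.

    `frame` is a list of rows of grayscale values (0-255).
    `region` is (x0, y0, x1, y1), with x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = region
    total = (x1 - x0) * (y1 - y0)
    bright = sum(
        1
        for y in range(y0, y1)
        for x in range(x0, x1)
        if frame[y][x] >= threshold
    )
    return bright / total


def door_state(frame, region, threshold=200, min_fraction=0.6):
    """Guess the door state from whether the marker fills its expected region.

    Assumption: the marker sits in `region` only when the door is closed.
    """
    if marker_fraction(frame, region, threshold) >= min_fraction:
        return "closed"
    return "open"
```

You'd run `door_state` on each frame (or every few frames) and overlay the result next to the door. Something this naive will be fooled by lighting changes or anything bright passing in front of the door, which is where a real marker detector earns its keep.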