Glad to hear it’s going well! For controller-movement-based locomotion, I don’t know of any plans to build something like this into Babylon itself, so you’d probably need to build it into a Babylon Utility yourself. (Stay tuned for more on that term, coming soon.)
Motion recognition is not a super easy problem, so I’d just make sure you go in with the expectation that this is going to be tricky. The heart of the problem is describing motion in a way that will allow you to determine what is and isn’t the motion you’re looking for. The best general solution to this problem that I’ve come across is the Jackknife recognizer, but adapting that to your use case will probably be at least as difficult as building a bespoke system from scratch.
Really, the most important thing to do will be to describe to yourself in words — in exhaustive detail — exactly what motion it is that you’re trying to recognize. This will be crucial in preventing false positives, which can be nauseating to a prohibitive degree if they happen too often in an XR scenario. For example, do you only want to recognize the walking motion if the user’s arms are in a “relaxed position”? If so, you’ll need to reject hand motions that vary too much along the Y axis, but you’ll also have to deal with the fact that for inside-out HMDs (Oculus Quest, Windows Mixed Reality, etc.), your device may not be able to see the controllers very well when they’re hanging by your sides, so tracking may be poor and you may get noise in the detected movement. (Note that the guy in the video is using what appears to be a v1 Oculus Rift, an outside-in tracker that will not experience this problem.) Similarly, the hands of the guy in the video aren’t actually moving parallel to each other; they move on semi-linear trajectories that would cross somewhere out in front of him. Is that non-parallel motion acceptable? Is it necessary, such that you should reject motions that aren’t on crossing semi-linear trajectories?
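To make the “relaxed arms” constraint concrete, here’s a minimal sketch of a Y-variation check. Everything here is an assumption for illustration: the `Vec3` shape, the function name, and the 0.15 m threshold are all made up, not Babylon API — you’d swap in your controller’s actual position samples and tune the threshold empirically.

```typescript
// Hypothetical check: accept a window of controller position samples
// only if its total vertical (Y) travel is small enough to count as
// "arms hanging in a relaxed position". The 0.15 m default is a guess.
interface Vec3 { x: number; y: number; z: number; }

function isArmRelaxed(samples: Vec3[], maxYRange = 0.15): boolean {
  if (samples.length === 0) return false;
  let minY = Infinity;
  let maxY = -Infinity;
  for (const s of samples) {
    minY = Math.min(minY, s.y);
    maxY = Math.max(maxY, s.y);
  }
  // Reject the window if the hand bobbed more than maxYRange meters.
  return maxY - minY <= maxYRange;
}
```

Note that with noisy inside-out tracking you’d likely want to smooth the samples first, or require the condition to hold over several consecutive windows, so a single tracking glitch doesn’t flip the result.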
Once you have a deeply detailed verbal description like this, you can set about turning it into a quantitative description that can then be used to build a recognizer. For example, if your description says that the motion involves the controller moving “forward and then backward,” that means you’re looking for a time series where the data first increases, then decreases along the “forward” axis. (Controller forward? User forward?) I wrote a blog post a few years ago describing this process, then another more recently that mentions recognizers in passing.
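As a sketch of what that “increases, then decreases” test might look like in code: suppose you’ve already projected each controller position onto your chosen forward axis, giving a plain array of numbers. The function name, the peak-finding approach, and the 0.1 m minimum swing are all illustrative assumptions, not anything from Babylon or the posts mentioned above.

```typescript
// Hedged sketch: does this 1-D time series (controller position
// projected onto the forward axis) rise to a peak and then fall back?
// minSwing (in meters) filters out tiny jitters that aren't real swings.
function isForwardThenBack(forward: number[], minSwing = 0.1): boolean {
  if (forward.length < 3) return false;
  // Locate the maximum of the series.
  let peak = 0;
  for (let i = 1; i < forward.length; i++) {
    if (forward[i] > forward[peak]) peak = i;
  }
  const rise = forward[peak] - forward[0];
  const fall = forward[peak] - forward[forward.length - 1];
  // The peak must be interior (so the series both rises and falls),
  // and each leg of the swing must be large enough to matter.
  return peak > 0 && peak < forward.length - 1 &&
         rise >= minSwing && fall >= minSwing;
}
```

In practice you’d run this over a sliding window of recent samples and probably smooth the data first; raw controller positions are noisy enough that a single-sample “peak” can be pure jitter.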
As a final note, I should mention that you can try to use machine learning techniques to avoid having to formally describe the motion you’re trying to recognize, in favor of just doing it a bunch of times and letting the computer try to figure it out. I don’t really recommend this approach because (1) it won’t be much simpler unless you’re already more familiar with machine learning tools than you are with sensor fusion, and (2) you won’t come out of it with as good an understanding of your own system as you would by diving deep, so when things break they’ll be harder to fix because you won’t really know what made them work in the first place. Nevertheless, it can be done, and it doesn’t have to be done with the most complicated tools: in one of my college classes way back when, we had to build decision-tree-based recognizers for detecting walking/sitting/standing movements from smartphone IMU data, so at the very least it’s possible.
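To show how simple a decision-tree-style recognizer can be, here’s a toy sketch in the spirit of that class project. Every feature, threshold, and label here is a made-up illustration (not a trained model and not what we actually built): it splits on acceleration variance first, then on device height.

```typescript
// Toy decision-tree-style classifier for IMU windows. The thresholds
// are invented for illustration; a real version would learn them from
// labeled data (or at least tune them against recordings).
function classifyImuWindow(
  accelMagnitudes: number[],  // accelerometer magnitudes over one window
  deviceHeight: number        // estimated device height in meters
): string {
  const n = accelMagnitudes.length;
  const mean = accelMagnitudes.reduce((a, b) => a + b, 0) / n;
  const variance =
    accelMagnitudes.reduce((a, b) => a + (b - mean) ** 2, 0) / n;
  // Root split: high acceleration variance suggests repetitive stepping.
  if (variance > 0.5) return "walking";
  // Second split: still, but low to the ground? Probably sitting.
  return deviceHeight < 0.9 ? "sitting" : "standing";
}
```

The point isn’t that these particular splits are right — it’s that a recognizer built this way is just a handful of readable if-statements, which keeps it debuggable in exactly the way a black-box model isn’t.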
Hope this helps, and best of luck!