Realistic spatial audio (inspired by highfidelity.com)

Hi there!

We’re developing a work-from-home solution running on Babylon and WebRTC. We are using spatial audio, which works fine, but we would like to go beyond plain stereo and try to mimic a more realistic audio experience. Some solutions exist (like highfidelity.com), but they are not compatible with our system (they do the processing on their server). We are looking for a library that does the audio processing locally in the browser.

Does anyone have an input?

This is an amazing topic where I’ll happily learn from the answers :slight_smile: as I am really clueless in this area, and maybe we could enhance our audio stack to allow an integration with any of the proposed solutions.


I know there’s at least been some investigation into it, but I’ve never gone down the rabbit hole to see exactly what’s been done and how.

Implementing Binaural (HRTF) Panner Node with Web Audio API | Code & Sound (wordpress.com)

+1 to what sebavan said, this sort of thing would be an awesome Babylon feature!


I am doing a lot with audio. Right now it is FFT-based pitch correction for elements of a voice font.

I have noticed there’s a way to do spatial in the API, but Babylon.Sound is not using it.

One other thing I have done is add reverb to my sound graph, which I am using in place of spatial. This is done with a convolution node (ConvolverNode). There are many impulse response files out there. Think I am going with one called concert hall.
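
For anyone wanting to try the same thing, the reverb part is only a handful of Web Audio calls. A minimal sketch (the impulse-response URL and the oscillator source are just placeholders):

const audioCtx = new AudioContext();
const convolver = audioCtx.createConvolver();

// Any source works; an oscillator keeps the example self-contained.
const sourceNode = audioCtx.createOscillator();

// Fetch and decode an impulse response file (URL is a placeholder).
fetch("impulses/concert-hall.wav")
    .then((res) => res.arrayBuffer())
    .then((buf) => audioCtx.decodeAudioData(buf))
    .then((impulse) => {
        convolver.buffer = impulse;
        // source -> convolver (reverb) -> speakers
        sourceNode.connect(convolver);
        convolver.connect(audioCtx.destination);
        sourceNode.start();
    });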

Ok, looked at your question more, and went to your link. I am familiar with most Web Audio node types, but never really did anything with WebRTC. I do that navigator thing (getUserMedia) to get a microphone stream, but immediately hook it into Web Audio. I collect all the samples that constitute each phoneme for the font.
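
In case it helps, the microphone hookup is roughly this:

// Ask for the microphone and feed it straight into a Web Audio graph.
navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
    const audioCtx = new AudioContext();
    const micSource = audioCtx.createMediaStreamSource(stream);
    // From here you can connect to an AnalyserNode, an AudioWorklet, etc.
    const analyser = audioCtx.createAnalyser();
    micSource.connect(analyser);
});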

I am not sure what “realistic” technically means. Your link uses the term as well. That might be adequate for a sales pitch, but not if you are going to implement it.

I am sure @saitogroup was asking about conference call stuff in the last week. He may have found something.

Let me ask:

  • Are ever more than 2 people connected?
  • If so, how many audio streams does each get?

@syntheticmagus, I am actually on a desktop now (as opposed to a couch tablet). I checked your link. It is from 2015 (see last line). Here is what I found that is now in the current Web Audio API.

HRTF is now a panning model, PannerNode.panningModel - Web APIs | MDN. I double checked for HRTF in current code, and it is, in fact, conditionally being used. https://github.com/BabylonJS/Babylon.js/blob/master/src/Audio/sound.ts#L483

You just need to:

scene.headphone = true;
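
Untested on my end, but in context that would look something like this (scene is your Babylon scene; the sound file is a placeholder):

// Flag the scene for headphone output early, before creating spatial sounds.
scene.headphone = true;

// A spatial sound should then pick up the HRTF panning model.
const voice = new BABYLON.Sound("voice", "sounds/voice.mp3", scene, null, {
    spatialSound: true,
    autoplay: true,
    loop: true,
});
voice.setPosition(new BABYLON.Vector3(2, 0, 0));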

I have no idea what that is. @metafred, have you tried this?

@JCPalmer could you point me at the new way of doing spatial? Could be fun to add in :slight_smile:

Currently, we simply rely on the HRTF model and the PannerNode.

@JCPalmer Thanks for your inputs!

In our platform, we can have more than 2 people at the same time. There’s actually no limit on simultaneous users; it just depends on how many users walk close to you (maybe check our project, bublr.co, or the attached pic below to get a better idea).
Each user receives the streams of all the other users around them (within a certain proximity range).
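
To make that concrete, each incoming WebRTC stream needs to end up in the audio graph more or less like this (a simplified raw Web Audio sketch rather than our actual code; addRemoteVoice is just an illustrative name):

// One panner per remote participant; its position is updated as the avatar moves.
function addRemoteVoice(audioCtx, remoteStream) {
    const source = audioCtx.createMediaStreamSource(remoteStream);
    const panner = audioCtx.createPanner();
    panner.panningModel = "HRTF"; // instead of the default "equalpower"
    source.connect(panner);
    panner.connect(audioCtx.destination);
    return panner;
}

The panner’s positionX / positionY / positionZ would then be updated from the avatar’s position relative to the listener as people move around.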

For now we use Audio | Babylon.js Documentation
I had the chance to test highfidelity with one of their salespeople and the experience was really good. When the person is very close to you (in their 2D space), you can really feel that they are whispering in your ear. When they turn their back to you, the sound has a sort-of realistic effect that makes you understand that something (the human body in that situation) is muffling the audio.

[attached image: brooklyn-office-D]

@sebavan I can not really find what I was looking at. It was in Dec 2019, based on file stamps of what I was doing at the time. Assume I was mistaken.

I do think that there is a BUG in the Sound constructor. If you pass panningModel as an option, it never gets transferred to the private _panningModel. That means the only ways to get HRTF are the undocumented options below (sketched in code after the list):

  • setting scene.headphone = true, fairly early
  • calling switchPanningModelToHRTF() on a sound instance
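
The first one was sketched earlier in the thread; the second looks like this on a sound instance (sketch only, the sound file path is a placeholder):

// Works on any spatial sound after it has been created.
const voice = new BABYLON.Sound("voice", "sounds/voice.mp3", scene, null, { spatialSound: true });
voice.switchPanningModelToHRTF();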

@metafred, your description gets us a little closer. I think just walking through how you are currently doing things is probably useful. First, what panningModel are you specifying in the Sound constructor args, or are you defaulting? I am going to assume that panningModel: "HRTF" is the better option.

For the person turned away from you to sound different from when they are facing you, the sound has to be directional. That only happens if setDirectionalCone(coneInnerAngle: number, coneOuterAngle: number, coneOuterGain: number) is called. Directional sound also only works if you attach the sound to a mesh.

Here is where a problem may be. Attaching to a mesh turns on spatial if it is not already, but then the panningModel will be the default. Turning on spatial with constructor options should let you specify the panningModel, but as I told Sebavan, that does not work.


Bottom line: first, it needs to be absolutely verified that you are using spatial audio with both HRTF and directional sound.
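
A rough sketch of that checklist, assuming the voice is attached to an avatar mesh (avatarMesh, the file path, and the angles are all placeholders to adjust):

const voice = new BABYLON.Sound("voice", "sounds/voice.mp3", scene, null, {
    autoplay: true,
    loop: true,
});

// Attaching to a mesh turns spatial on, but leaves the default panning model,
// so switch to HRTF explicitly afterwards.
voice.attachToMesh(avatarMesh);
voice.switchPanningModelToHRTF();

// Make the sound directional: full gain inside the inner cone,
// coneOuterGain (0.3 here) beyond the outer cone.
voice.setDirectionalCone(90, 180, 0.3);
voice.setLocalDirectionToMesh(new BABYLON.Vector3(0, 0, 1));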

@JCPalmer the panningModel is actually never exposed in the Sound class; you can only access it through:

sound.switchPanningModelToHRTF();
// or
sound.switchPanningModelToEqualPower();

and finally it is automatically set to HRTF if spatial audio is on with headphones.

My mistake. The only mention of HRTF in the docs is in a list of properties, the others of which are options. Perhaps explicitly mentioning the ways to turn it on would be a good addition.

@metafred, it is not likely that you were testing with HRTF. Changing to that is probably your best option, at least for those wearing headphones / earbuds, which is a pretty safe bet for work at home. That, plus getting good angles for setDirectionalCone(), will let you see how close you can get for as little effort as possible.


Great, I’m not sure what we are using. I’ll check HRTF then. Thank you!