Realistic spatial audio (inspired by highfidelity.com)

Hi there!

We’re developing a work-from-home solution running on Babylon and WebRTC. We are using spatial audio, which works fine, but we would like to go beyond plain stereo and try to mimic a more realistic audio experience. Some solutions exist (like highfidelity.com), but they are not compatible with our system (they do the processing on their server). We are looking for a library that does the audio processing locally in the browser.

Does anyone have an input?

This is an amazing topic where I’ll happily learn from the answers :slight_smile: as I am really clueless in this area, and maybe we could enhance our audio stack to allow an integration with any of the proposed solutions.


I know there’s at least been some investigation into it, but I’ve never gone down the rabbit hole to see exactly what’s been done and how.

Implementing Binaural (HRTF) Panner Node with Web Audio API | Code & Sound (wordpress.com)

+1 to what sebavan said, this sort of thing would be an awesome Babylon feature!


I am doing a lot with audio. Right now it is FFT-based pitch correction for elements of a voice font.

I have noticed there’s a way to do spatial in the API, but Babylon.Sound is not using it.

One other thing I have done is add reverb to my sound graph, which I am using in place of spatial. This is done with a convolution node (ConvolverNode). There are many impulse response files out there. Think I am going with one called concert hall.
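
For anyone wanting to try the same thing, the reverb part is only a handful of Web Audio calls. A minimal sketch (the impulse-response URL and the oscillator source are just placeholders):

const audioCtx = new AudioContext();
const convolver = audioCtx.createConvolver();

// Any source works; an oscillator keeps the example self-contained.
const sourceNode = audioCtx.createOscillator();

// Fetch and decode an impulse response file (URL is a placeholder).
fetch("impulses/concert-hall.wav")
    .then((res) => res.arrayBuffer())
    .then((buf) => audioCtx.decodeAudioData(buf))
    .then((impulse) => {
        convolver.buffer = impulse;
        // source -> convolver (reverb) -> speakers
        sourceNode.connect(convolver);
        convolver.connect(audioCtx.destination);
        sourceNode.start();
    });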

Ok, looked at your question more, and went to your link. I am familiar with most Web Audio node types, but never really did anything with WebRTC. I do that navigator thing (getUserMedia) to get a microphone stream, but immediately hook it into Web Audio. I collect all the samples that constitute each phoneme for the font.
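
In case it helps, the microphone hookup is roughly this:

// Ask for the microphone and feed it straight into a Web Audio graph.
navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
    const audioCtx = new AudioContext();
    const micSource = audioCtx.createMediaStreamSource(stream);
    // From here you can connect to an AnalyserNode, an AudioWorklet, etc.
    const analyser = audioCtx.createAnalyser();
    micSource.connect(analyser);
});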

I am not sure what “realistic” technically means. Your link uses the term as well. That might be adequate for a sales pitch, but not if you are going to implement it.

I am sure @saitogroup was asking about conference call stuff in the last week. He may have found something.

Let me ask:

  • Are ever more than 2 people connected?
  • If so, how many audio streams does each get?

@syntheticmagus, I am actually on a desktop now (as opposed to a couch tablet). I checked your link. It is from 2015 (see last line). Here is what I found that is now in the current Web Audio API.

HRTF is now a panning model, PannerNode.panningModel - Web APIs | MDN. I double checked for HRTF in current code, and it is, in fact, conditionally being used. https://github.com/BabylonJS/Babylon.js/blob/master/src/Audio/sound.ts#L483

You just need to:

scene.headphone = true;
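
Untested on my end, but in context that would look something like this (scene is your Babylon scene; the sound file is a placeholder):

// Flag the scene for headphone output early, before creating spatial sounds.
scene.headphone = true;

// A spatial sound should then pick up the HRTF panning model.
const voice = new BABYLON.Sound("voice", "sounds/voice.mp3", scene, null, {
    spatialSound: true,
    autoplay: true,
    loop: true,
});
voice.setPosition(new BABYLON.Vector3(2, 0, 0));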

I have no idea what that is. @metafred, have you tried this?

@JCPalmer could you point me at the new way of doing spatial? Could be fun to add in :slight_smile:

Currently, we simply rely on the HRTF model and the PannerNode.

@JCPalmer Thanks for your inputs!

In our platform, we can have more than 2 people at the same time. There’s actually no limit on simultaneous users; it just depends on how many users walk close to you (maybe check our project, bublr.co, or the attached pic below to get a better idea).
Each user receives the streams of all the other users around them (within a certain proximity range).
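
To make that concrete, each incoming WebRTC stream needs to end up in the audio graph more or less like this (a simplified raw Web Audio sketch rather than our actual code; addRemoteVoice is just an illustrative name):

// One panner per remote participant; its position is updated as the avatar moves.
function addRemoteVoice(audioCtx, remoteStream) {
    const source = audioCtx.createMediaStreamSource(remoteStream);
    const panner = audioCtx.createPanner();
    panner.panningModel = "HRTF"; // instead of the default "equalpower"
    source.connect(panner);
    panner.connect(audioCtx.destination);
    return panner;
}

The panner’s positionX / positionY / positionZ would then be updated from the avatar’s position relative to the listener as people move around.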

For now we use Audio | Babylon.js Documentation
I had the chance to test highfidelity with one of their salespeople and the experience was really good. When the person is very close to you (in their 2D space), you can really feel that they are whispering in your ear. When they turn their back to you, the sound has a sort-of realistic effect that makes you understand that something (the human body in that situation) is muffling the audio.

[attached image: brooklyn-office-D]

@sebavan I can not really find what I was looking at. It was in Dec 2019, based on file stamps of what I was doing at the time. Assume I was mistaken.

I do think that there is a BUG in the Sound constructor. If you pass panningModel as an option, it never gets transferred to the private _panningModel. That means the only ways to get HRTF are the undocumented options below (sketched in code after the list):

  • setting scene.headphone = true, fairly early
  • calling switchPanningModelToHRTF() on a sound instance
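
The first one was sketched earlier in the thread; the second looks like this on a sound instance (sketch only, the sound file path is a placeholder):

// Works on any spatial sound after it has been created.
const voice = new BABYLON.Sound("voice", "sounds/voice.mp3", scene, null, { spatialSound: true });
voice.switchPanningModelToHRTF();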

@metafred, your description gets us a little closer. I think just walking through how you are currently doing things is probably useful. First, what panningModel are you specifying in the Sound constructor args, or are you defaulting? I am going to assume that panningModel: "HRTF" is the better option.

For the person turned away from you to sound different from when they are facing you, the sound has to be directional. That only happens if setDirectionalCone(coneInnerAngle: number, coneOuterAngle: number, coneOuterGain: number) is called. Directional sound also only works if you attach the sound to a mesh.

Here is where a problem may be. Attaching to a mesh turns on spatial if it is not already, but then the panningModel will be the default. Turning on spatial with constructor options should let you specify the panningModel, but as I told Sebavan, that does not work.


Bottom line: first, it needs to be absolutely verified that you are using spatial audio with both HRTF and directional sound.
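
A rough sketch of that checklist, assuming the voice is attached to an avatar mesh (avatarMesh, the file path, and the angles are all placeholders to adjust):

const voice = new BABYLON.Sound("voice", "sounds/voice.mp3", scene, null, {
    autoplay: true,
    loop: true,
});

// Attaching to a mesh turns spatial on, but leaves the default panning model,
// so switch to HRTF explicitly afterwards.
voice.attachToMesh(avatarMesh);
voice.switchPanningModelToHRTF();

// Make the sound directional: full gain inside the inner cone,
// coneOuterGain (0.3 here) beyond the outer cone.
voice.setDirectionalCone(90, 180, 0.3);
voice.setLocalDirectionToMesh(new BABYLON.Vector3(0, 0, 1));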

@JCPalmer the panningModel is actually never exposed in the Sound class; you can only access it through:

sound.switchPanningModelToHRTF();
// or
sound.switchPanningModelToEqualPower();

and finally it is automatically set to HRTF if spatial audio is on with headphones.

My mistake. The only mention of HRTF in the docs is in a list of properties, the others of which are options. Perhaps explicitly mentioning the ways to turn it on would be a good addition.

@metafred, it is not likely that you were testing with HRTF. Changing to that is probably your best option, at least for those wearing headphones / earbuds, which is a pretty safe bet for work at home. That, plus getting good angles for setDirectionalCone(), will let you see how close you can get for as little effort as possible.


Great, I’m not sure what we are using. I’ll check HRTF then. Thank you!