Normal-Rust, a test of Rust / WebAssembly

I found a 300-line C routine in early 2020, written by an audio professional, which can raise or lower the pitch of an audio buffer by up to 2 octaves without changing its length. I tried back then to translate it into TypeScript, but it does a lot of heavy math and bit manipulation. It is just not going to work right without integer types.

I need this to work. It does not have to be fast, at all. It is for my voice font builder tool.

Rust / WebAssembly seemed like the best option, but I had never done anything with it before. I decided 2 days ago to do a small project, to replicate VertexData.ComputeNormals() in Rust. ComputeNormals also needs to pass Float32Arrays, and I could easily check if it worked by comparing output with the JS version.

A small project to figure out the process with verification would mean that I was not trying to do too many things for the first time all at the same time.

This went faster and easier than I expected. It is published on GitHub as Palmer-JC/normal-rust, a small Rust / WebAssembly project to calculate normals.

The only downside is it is massively slower. Here is a screen shot of the little test page:

I do perform ComputeNormals a lot, so it would have been nice if it were actually faster. I am just passing along what I found out for anyone else thinking about using Rust / WebAssembly.

I am stupid with Rust and Wasm in general, but wasm should be faster for this. The 2 things I can think of:

  1. the compilation options are not set for the fastest output (for example, it is a debug build)
  2. you include the time spent transferring the memory in your measurement, which might be the bottleneck.

@syntheticmagus is amazing with WASM things as is @Cedric

Have you tried running it native? It looks like you’re using some fancier integration features than I’ve used, so maybe the tools have improved, but my general experience is that trying to do real perf testing and debugging on WASM is miserable. Usually, when I’m making a small WASM utility, I build a test harness on native first so that I can debug and analyze it with all my nice native tools, then I just build that library with Emscripten so that I can package it into a WASM. I don’t know if that’d help, but it might make it easier to look more closely at why the code is running slow because, as seb said, it’d be weird if WASM itself was the source of the slowness here.
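A native harness along these lines might be enough to start with. This is only a minimal sketch: `time_it` and `sum_of_squares` are hypothetical names, and the workload is a stand-in for the real normals routine, not code from the repo.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `work` `iterations` times natively and returns the last result
/// together with the total elapsed wall-clock time.
/// `time_it` is a hypothetical helper, not part of any library.
pub fn time_it<T>(iterations: u32, mut work: impl FnMut() -> T) -> (T, Duration) {
    assert!(iterations > 0, "need at least one iteration");
    let start = Instant::now();
    // black_box discourages the optimizer from discarding the result.
    let mut last = black_box(work());
    for _ in 1..iterations {
        last = black_box(work());
    }
    (last, start.elapsed())
}

/// Stand-in workload (an assumption for illustration only);
/// replace it with the routine you actually want to measure.
pub fn sum_of_squares(values: &[f32]) -> f32 {
    values.iter().map(|v| v * v).sum()
}
```

Built with `cargo build --release` and run as an ordinary binary or test, this lets you use native profilers and debuggers before packaging the same function for WASM.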

@syntheticmagus , yes your process is good for someone who took the time to learn Rust & Emscripten. 2 days from zero to validated in a browser is really good, though.

Actually, the -22.2x speed “improvement” is so bad that it is kind of good. You can easily do brutal tests & get meaningful results. I was just pleased that it worked, so my real purpose could advance, but since you weighed in, I can dissect till I get to the problem.


Pretty sure I got it on the first slice. I removed almost all of the code inside the Rust function, except the local variable declarations & the zeroing out of the normals array, which is a single method call. Measuring the performance now would show whether there was a massive transfer bottleneck.

Running the test again now yielded a +33.4x improvement. The transfer might still be a minor issue, but that big a swing is a strong indicator that it is not the place to be looking.


I am almost positive the issue is with how js_sys::Float32Array is defined. It is great for dealing with the bi-directional transfer, or update without even knowing how to blow your nose in Rust or WASM.

The probable issue is that there is not a Float array which you can just index via [index]. It is a structure which just has a backing byte buffer. It does have 2 methods which I am calling: get_index(), set_index().

For 26,756 faces, that is going to be a lot of function call / stack overhead to pay for:

get_index: 682,284 calls
set_index: 280,944 calls

Or a total of 963,228 calls. That is a lot of overhead that JS isn’t paying for. The chances that this is the problem are very high.


On Monday, I am starting the pitch shifting work for a back office tool that is never going to see the light of day. Speed is a non-factor. If someone can look at that structure doc and write a macro or something which goes directly at the buffer without the call, I will try it & report. No hurry. I do not need this.

Small thing, but did you use the --release flag when you compiled?

You can also add a .cargo/config file (as in directory/filename-no-extension) with the following to enable simd.

[build]
rustflags = ["-C", "target-feature=+simd128"]

Also using chocolatey to install emscripten is the only way i could ever get it to work, maybe you had given up on that, as i did too before using chocolatey (i use scoop for everything else)

Thanks. I’ll look at this early next week & report.

I just forked it and updated it with the suggested changes. I didn’t send you a PR, but I can if you want. The repo is jeremy-coleman/normal-rust on GitHub. I also removed pkg (the build output) from the .gitignore, so you can just install it from GitHub if you want to see if it’s any better.

Thanks. I just cloned yours, then copied the cargo.toml & the .config directory. I get 16x slower pretty consistently. Still way slower than JS.

I clicked on the source button on the Float32Array doc page and got a file for the entire js-sys library. There may be something I can glean from it.

https://rustwasm.github.io/wasm-bindgen/api/src/js_sys/lib.rs.html#5081

Maybe js-sys isn’t needed and it could just use [f32] and [u32] types? Vec is indexable (see Vec in std::vec in the Rust docs).

If there is a next move for this little technical evaluation demo / project, it would be actually to still pass the positions as Float32Arrays, a Float32Array to assign results to, and a Uint32Array for indices.

Read the positions & indices into [f32] & [u32], respectively. Do all the work with those and write into a [f32]. When done, copy the results. Not a solution, so much as really trying to put my thumb on the problem.
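The slice-only part of that plan could be sketched roughly as below. This is a simplified illustration, not the code in the repo and not a line-for-line port of VertexData.ComputeNormals(): the name `compute_normals`, the counter-clockwise winding, and the area-weighted accumulation are all assumptions.

```rust
/// Computes per-vertex normals using only plain slices, so every element
/// access is an ordinary index rather than a call across the JS boundary.
/// Assumes `positions` and `normals` hold x, y, z triples and that
/// `indices` holds three vertex indices per face (hypothetical layout).
pub fn compute_normals(positions: &[f32], indices: &[u32], normals: &mut [f32]) {
    assert_eq!(positions.len(), normals.len());
    normals.fill(0.0);

    for face in indices.chunks_exact(3) {
        // Offsets of the three vertices into the flat positions array.
        let a = face[0] as usize * 3;
        let b = face[1] as usize * 3;
        let c = face[2] as usize * 3;

        // Edge vectors a->b and a->c.
        let e1 = [
            positions[b] - positions[a],
            positions[b + 1] - positions[a + 1],
            positions[b + 2] - positions[a + 2],
        ];
        let e2 = [
            positions[c] - positions[a],
            positions[c + 1] - positions[a + 1],
            positions[c + 2] - positions[a + 2],
        ];

        // Face normal = e1 x e2 (left unnormalized, so larger faces weigh more).
        let n = [
            e1[1] * e2[2] - e1[2] * e2[1],
            e1[2] * e2[0] - e1[0] * e2[2],
            e1[0] * e2[1] - e1[1] * e2[0],
        ];

        // Accumulate the face normal onto each of its three vertices.
        for &v in &[a, b, c] {
            normals[v] += n[0];
            normals[v + 1] += n[1];
            normals[v + 2] += n[2];
        }
    }

    // Normalize each accumulated vertex normal, skipping zero-length ones.
    for n in normals.chunks_exact_mut(3) {
        let len = (n[0] * n[0] + n[1] * n[1] + n[2] * n[2]).sqrt();
        if len > 0.0 {
            n[0] /= len;
            n[1] /= len;
            n[2] /= len;
        }
    }
}
```

For a single triangle with positions (0,0,0), (1,0,0), (0,1,0) and indices 0, 1, 2, every vertex normal comes out as (0, 0, 1). Comparing against the JS output on a real mesh would still be needed, since the weighting and handedness conventions may differ.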

Crude, but that is the one advantage of really sucking. It is kind of easy to rule things in & out.

I updated the build again in my fork (jeremy-coleman/normal-rust on GitHub). It now uses React and Vite, so both the JS and Rust code reload. No changes to the Rust code yet.

In my actual Rust project, I did something which sped up using Float32Arrays: declaring things as [f32] & [u32], as you suggested, @jeremy-coleman. I tested that change here as well, and it is now actually faster, by 1.3x.

I did not originally think something like that would work directly, but the Rust wasm-pack utility also generates a JavaScript file. If a parameter is declared as [f32] in the Rust source file, then the functions getFloat32Memory0() & passArrayF32ToWasm0() are generated in that file. If it is declared as js_sys::Float32Array, an addHeapObject() function is generated instead.

When called, the little bit of JavaScript code that does the binding will call either passArrayF32ToWasm0 or addHeapObject, depending on how the parameter is defined in the Rust source file.

This demo repo has been updated. The amount of work in the call to calculate normals is pretty substantial. I can imagine that uses which are not quite so intensive would not really be worth doing.

In my actual use case, I have a number of bit wise operations and need to control the type of number in the code I ported from C, but I am starting to understand / modify that, and may look to do it in a way where Rust may not be required.

@JCPalmer Hey, I updated my repo too and got 1.4-1.6x as well. Pretty good! I also made sure simd128 was turned on, but it’s not auto-vectorizing. I put in this snippet to check it’s on, then use the VS Code wat extension to save the .wasm as text and check for SIMD intrinsics.

// #[cfg(not(target_feature = "simd128"))]
// compile_error!("Simd not enabled, please check your configuration");

I have literally no idea what im doing, but im messing around on godbolt.org to check asm

(here is the compute normal code)

I was doing some research on ways to give compiler hints. This was good: “Taking Advantage of Auto-Vectorization in Rust” on Nick Wilcox’s Coding Blog. Some other random places around the interwebs suggested giving compiler hints with alignment attributes.

These were 2 examples from this reddit thread https://www.reddit.com/r/rust/comments/8uccla/does_rustc_have_autovectorization/

pub fn f(a: &[f32;64], b: &[f32;64], c: &mut [f32;64]) {
    for ((a, b), c) in a.iter().zip(b.iter()).zip(c.iter_mut()) {
        *c = a + b;
    }
}

and

#[repr(align(64))]
pub struct Aligned<T>(T);

pub fn max_array_stable(x: &mut Aligned<[f64; 65536]>, y: &Aligned<[f64; 65536]>) {
    let x = &mut x.0;
    let y = &y.0;
    for i in 0..65536 {
        x[i] = if y[i] > x[i] { y[i] } else { x[i] };
    }
}

not sure those type of hints can be used here bc of unknown input length?

but the example from the first article is promising.

// starting point
pub fn mix_mono_to_stereo_1(dst: &mut [f32], src: &[f32], gain_l: f32, gain_r: f32) {
    for i in 0..src.len() {
        dst[i * 2 + 0] = src[i] * gain_l;
        dst[i * 2 + 1] = src[i] * gain_r;
    }
}

improved to:


#[repr(C)]
pub struct StereoSample {
    l: f32,
    r: f32,
}

#[repr(transparent)]
pub struct MonoSample(f32);

pub fn mix_mono_to_stereo_3(dst: &mut [StereoSample], src: &[MonoSample], gain_l: f32, gain_r: f32) {
    let dst_known_bounds = &mut dst[0..src.len()];
    for i in 0..src.len() {
        dst_known_bounds[i].l = src[i].0 * gain_l;
        dst_known_bounds[i].r = src[i].0 * gain_r;
    }
}

it still has the “for i in 0..src.len()”, and is just giving inline hints.
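For comparison, here is one way the same loop might be written without indexing at all, using iterators. This is only a sketch: `mix_mono_to_stereo_iter` is a hypothetical name, it is not from the article, and I have not checked whether it actually vectorizes for the simd128 target.

```rust
/// Iterator-based variant of the mono-to-stereo mix. Each two-element
/// chunk of `dst` is one interleaved left/right frame.
/// Unlike the indexed versions, this does not panic when `dst` is too
/// short; it simply stops at the shorter of the two inputs.
pub fn mix_mono_to_stereo_iter(dst: &mut [f32], src: &[f32], gain_l: f32, gain_r: f32) {
    for (frame, &sample) in dst.chunks_exact_mut(2).zip(src) {
        frame[0] = sample * gain_l;
        frame[1] = sample * gain_r;
    }
}
```

Because chunks_exact_mut and zip carry the lengths with them, there are no per-element bounds checks written in the source, which is the same idea the known-bounds reslice in the article is getting at.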
