The Battle: Strassen vs AlphaEvolve vs Babylon. The winner will surprise you!

I love how clickbait my title is :grinning_face_with_smiling_eyes:

Anyway! Yesterday, I fell headfirst into a glorious rabbit hole. I wanted to test the new matrix multiplication algorithm discovered by Google’s AlphaEvolve. This shiny new algorithm uses only 48 multiplications, compared to 49 in Strassen’s classic method—and 64 in Babylon.js’s good ol’ matrix code (which I wrote 11 years ago, so go easy on me, alright?).

On paper, I was thrilled. We use an absurd number of matrix multiplications everywhere, and shaving off even a few per operation could be a massive win long-term.

So I whipped up a simple test comparing our naive implementation with Strassen’s—figuring I’d plug in AlphaEvolve’s version later, once I dig it out of their repo:
:backhand_index_pointing_right: Babylon.js Playground
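
The gist of the test is just a tight timing loop over the two kernels. Here's a minimal sketch of that kind of comparison (not the actual playground code; `multiplyNaive` / `multiplyStrassen` are hypothetical stand-in names for the two kernels under test):

```js
// Sketch only — not the playground code. The two kernels are stand-ins.
const a = Float32Array.from({ length: 16 }, () => Math.random());
const b = Float32Array.from({ length: 16 }, () => Math.random());
const out = new Float32Array(16);

function time(label, multiply, iterations = 1e7) {
  const start = performance.now();
  for (let i = 0; i < iterations; i++) multiply(a, b, out);
  console.log(label, (performance.now() - start).toFixed(1), "ms");
}

// time("naive", multiplyNaive);
// time("Strassen", multiplyStrassen);
```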

The twist? Babylon.js was significantly faster than Strassen.

Wait, what? How is that even possible?

Here’s why:

| algorithm | multiplies | adds/subs |
| --- | --- | --- |
| naïve 4 × 4 | 64 | 48 |
| Strassen (one level of recursion) | 56 | ≈ 100–120 |

Strassen only saves eight multiplications (64 → 56, a 12.5 % saving) but more than doubles the number of additions and temporary values.
On a modern CPU (or in a JS JIT) an `x + y` costs practically the same as an `x * y`, so the “saving” is a wash, and all the extra adds, temporaries and memory traffic become pure overhead.

Also!

Constant-factor overhead dominates

Strassen needs (see the sketch after this list):

  • Ten vector sums/differences (S1…S10).
  • Seven internal 2 × 2 products.
  • Four more vector combinations to build C.
  • ~200 scalar loads/stores versus 128 in the naïve path.
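
To make that concrete, here's a scalar 2 × 2 Strassen sketch using the textbook S1…S10 / P1…P7 formulation. The 4 × 4 test applies the same recipe to the four 2 × 2 sub-blocks, so every scalar add below becomes a 4-element block add:

```js
// Scalar 2×2 Strassen (textbook formulation).
// a and b are [a11, a12, a21, a22]; returns [c11, c12, c21, c22].
function strassen2x2(a, b) {
  const [a11, a12, a21, a22] = a;
  const [b11, b12, b21, b22] = b;

  // Ten sums/differences…
  const s1 = b12 - b22, s2 = a11 + a12, s3 = a21 + a22, s4 = b21 - b11;
  const s5 = a11 + a22, s6 = b11 + b22, s7 = a12 - a22, s8 = b21 + b22;
  const s9 = a11 - a21, s10 = b11 + b12;

  // …seven products instead of eight…
  const p1 = a11 * s1, p2 = s2 * b22, p3 = s3 * b11, p4 = a22 * s4;
  const p5 = s5 * s6, p6 = s7 * s8, p7 = s9 * s10;

  // …and four more combinations to assemble C.
  return [p5 + p4 - p2 + p6, p1 + p2, p3 + p4, p5 + p1 - p3 - p7];
}
```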

All that shuffling trashes the CPU’s L1 cache and the JavaScript engine’s register allocator. The naïve version, by contrast, is one straight-line block that the JIT can happily keep in registers and even auto-vectorise (SIMD / FMA).
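
For contrast, this is the straight-line shape of the schoolbook kernel (a sketch over flat, row-major 16-float arrays; not Babylon.js's exact code, which differs in storage conventions and details):

```js
// One straight-line block: 16 independent dot products the JIT can
// keep in registers and vectorise. a, b, r are 16-element arrays.
function multiplyNaive(a, b, r) {
  r[0]  = a[0] * b[0]  + a[1] * b[4]  + a[2] * b[8]   + a[3] * b[12];
  r[1]  = a[0] * b[1]  + a[1] * b[5]  + a[2] * b[9]   + a[3] * b[13];
  r[2]  = a[0] * b[2]  + a[1] * b[6]  + a[2] * b[10]  + a[3] * b[14];
  r[3]  = a[0] * b[3]  + a[1] * b[7]  + a[2] * b[11]  + a[3] * b[15];
  r[4]  = a[4] * b[0]  + a[5] * b[4]  + a[6] * b[8]   + a[7] * b[12];
  r[5]  = a[4] * b[1]  + a[5] * b[5]  + a[6] * b[9]   + a[7] * b[13];
  r[6]  = a[4] * b[2]  + a[5] * b[6]  + a[6] * b[10]  + a[7] * b[14];
  r[7]  = a[4] * b[3]  + a[5] * b[7]  + a[6] * b[11]  + a[7] * b[15];
  r[8]  = a[8] * b[0]  + a[9] * b[4]  + a[10] * b[8]  + a[11] * b[12];
  r[9]  = a[8] * b[1]  + a[9] * b[5]  + a[10] * b[9]  + a[11] * b[13];
  r[10] = a[8] * b[2]  + a[9] * b[6]  + a[10] * b[10] + a[11] * b[14];
  r[11] = a[8] * b[3]  + a[9] * b[7]  + a[10] * b[11] + a[11] * b[15];
  r[12] = a[12] * b[0] + a[13] * b[4] + a[14] * b[8]  + a[15] * b[12];
  r[13] = a[12] * b[1] + a[13] * b[5] + a[14] * b[9]  + a[15] * b[13];
  r[14] = a[12] * b[2] + a[13] * b[6] + a[14] * b[10] + a[15] * b[14];
  r[15] = a[12] * b[3] + a[13] * b[7] + a[14] * b[11] + a[15] * b[15];
}
```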

JIT & SIMD friendliness

  • Naïve: tight, predictable access pattern → easy for V8/SpiderMonkey to hoist constants, unroll, and emit SIMD (FMA on x86: c += a * b in a single instruction).
  • Strassen: scattered temporaries, larger code size → less chance of SIMD and often falls back to plain scalar ops.

TL;DR

Strassen is an asymptotic win, not a micro-optimization. On a 4 × 4 matrix it:

  • Saves only eight multiplies.
  • Adds ~70 more additions and loads/stores.
  • Torpedoes SIMD/JIT optimisations.

So the plain old “schoolbook” kernel stomps it. Congratulations to my naive self!

16 Likes

This is so interesting/informative, thank you for sharing!!!

5 Likes

super cool that you took it for a spin to see real-world outcomes… based on your findings here, it pretty much makes everything about their announcement useless.

3 Likes

I think it still makes sense for hardware manufacturers if they want hardware matrix multiplication, because at the silicon level an addition is far less expensive than a multiply.

3 Likes

This was a satisfying read. I think it’s a great opportunity for a post-mortem @Deltakosh. I’d love to hear about the thinking behind the decisions you made in your “naive” algorithm over a decade ago, how it shaped modern BabylonJS, and perhaps what you’d do differently today.

1 Like

:saluting_face:

Look, on Apple M4:

Ok, ok, relax, it’s just a very late April Fools’ joke.
`console.log("Strassen:", delay1 * 0.6356);` :see_no_evil_monkey: :zany_face:

When Google’s AI discovered the new ‘magic’ matrix multiplication algorithm, I was instantly curious whether it would provide a real performance boost. I was waiting for a real-world benchmark, and now we have the results, delivered directly by the Master himself. Thanks for trying it out and sharing the findings! Once again, Google has proven they can generate a lot of hype around empty achievements.

Humans vs AI
1 : 0

4 Likes

Nice try! At the beginning I was like: oh fuck…

That wouldn’t be the first time Apple did things differently :smiley:

2 Likes

Haha. That will be fast: I ported the code from my previous engine :wink:

And for that engine, I didn’t give it a lot of thought :smiley:

2 Likes