Babylon.js + Ollama + Gemma 4 - Open Source 3D generation app

Hi everyone!

I’ve been experimenting with bridging Babylon.js and local AI. I set up a project where a local LLM (Gemma via Ollama) dynamically generates and configures 3D characters based on text prompts (e.g., “Make an angry fire robot”). It took me less than 20 minutes.

Why run the AI locally? It’s completely free (no API costs), entirely private (no data leaves your machine), and works 100% offline. Now with Gemma 4, even mid-tier laptops can run some of the smaller models.

How it works: I pass a blueprints.json (containing allowed classes, elements, and my actual texture filenames) to Gemma. It strictly returns a JSON object. Babylon reads this to spawn the right procedural shapes and apply the chosen materials.
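For anyone who wants to try the same flow, here is a minimal sketch of the two pure steps around the model call: building the request and checking the reply against the blueprint. The `Blueprint` and `CharacterSpec` shapes, the key names, and the helper names are assumptions for illustration, not the actual contents of blueprints.json or the app's code. The request body follows Ollama's chat endpoint (`POST http://localhost:11434/api/chat`), where `format: "json"` asks for a JSON-only reply.

```typescript
// Hypothetical blueprint shape; the real blueprints.json may look different.
interface Blueprint {
  classes: string[];
  elements: string[];
  textures: string[];
}

// Hypothetical shape of the object the model is asked to return.
interface CharacterSpec {
  class: string;
  element: string;
  texture: string;
}

// Build the body to POST to Ollama's /api/chat endpoint.
// stream: false requests a single complete reply instead of chunks.
function buildChatRequest(model: string, blueprint: Blueprint, prompt: string) {
  return {
    model: model,
    stream: false,
    format: "json",
    messages: [
      {
        role: "system",
        content:
          "Reply with one JSON object with the keys class, element, texture. " +
          "Only use values from this blueprint: " + JSON.stringify(blueprint),
      },
      { role: "user", content: prompt },
    ],
  };
}

// Return the value only if it is a string listed in `allowed`.
function pick(value: unknown, allowed: string[]): string | null {
  return typeof value === "string" && allowed.indexOf(value) !== -1 ? value : null;
}

// Parse the model's reply text and reject anything outside the blueprint,
// so the scene code never sees an unknown class or a missing texture file.
function parseCharacterSpec(raw: string, blueprint: Blueprint): CharacterSpec | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch (e) {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const d = data as Record<string, unknown>;
  const cls = pick(d["class"], blueprint.classes);
  const element = pick(d["element"], blueprint.elements);
  const texture = pick(d["texture"], blueprint.textures);
  if (cls === null || element === null || texture === null) return null;
  return { class: cls, element: element, texture: texture };
}
```

In Ollama's non-streaming chat response the reply text is in `message.content`, so that string is what would be passed to `parseCharacterSpec` before Babylon spawns anything.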

Where could this go next? Having an LLM spit out structured JSON directly into a Babylon scene opens up some awesome ideas:

  • Procedural Node Geometry: Prompting “cyberpunk city block” to generate a JSON array of building heights and densities for Instanced Meshes.

  • Babylon.js Editor Plugin: Selecting a mesh, typing “make this a weathered brick wall,” and letting the local AI automatically fetch and configure the right PBR materials.

  • Dynamic Loadouts: Letting the AI pick and attach specific .glb weapons or hats to a rig based on a character’s generated class.
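To make the city block idea a bit more concrete, here is a rough sketch of turning a model-generated grid of building heights into per-instance transforms. The `CityBlockSpec` format, the function name, and the parameters are all made up for illustration. It assumes the source mesh is a unit box, so scaling on Y equals the building height and lifting the centre by half the height puts the base on the ground.

```typescript
// Hypothetical JSON the model would return for "cyberpunk city block".
interface CityBlockSpec {
  // heights[row][col] in scene units; 0 (or less) means an empty lot.
  heights: number[][];
}

interface InstanceTransform {
  position: { x: number; y: number; z: number };
  scaling: { x: number; y: number; z: number };
}

// Convert the height grid into one transform per building.
// cellSize is the spacing between grid cells; maxHeight clamps
// whatever the model returns so one bad value cannot break the scene.
function layoutCityBlock(
  spec: CityBlockSpec,
  cellSize: number,
  maxHeight: number
): InstanceTransform[] {
  const out: InstanceTransform[] = [];
  spec.heights.forEach((row, r) => {
    row.forEach((h, c) => {
      // Skip empty lots and anything that is not a usable number.
      if (typeof h !== "number" || !isFinite(h) || h <= 0) return;
      const height = Math.min(h, maxHeight);
      out.push({
        position: { x: c * cellSize, y: height / 2, z: r * cellSize },
        scaling: { x: 1, y: height, z: 1 },
      });
    });
  });
  return out;
}
```

Each transform could then be copied onto an instance created with `mesh.createInstance(name)`, setting its `position` and `scaling`.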

Has anyone else been experimenting with local LLMs driving procedural generation in Babylon? I’d love to hear how you’re using it!

Edit:

Just to add, the app is very basic, slow, and doesn’t generate great characters right now. This was just to see if I could actually set this up and if it works on my laptop. Once I have more time, I will definitely experiment a bit more with this!


cc @RaananW @ryantrem

Super cool! In your case, is Gemma creating the output json (no interaction with your Babylon app), and then you just feed the json output into your custom Babylon app and it has code to load that json? Or is Gemma either interacting with your app, or producing the code of your app?

It just outputs a JSON for now. Super simple, but I think it has some potential.

I will share a repo tomorrow you can check out, it’s a pretty simple set up :slight_smile:

Many ideas to use it for, not enough time :sweat_smile:

So I have simplified the app, and now it creates simple shapes based on commands.

It still makes silly mistakes. For example, here I asked Gemma to create a red box first, no problems there. Then I asked it to create 4 spheres with different sizes and colours. It created 7 :joy:

Here is the link to the repo:

If anyone has any questions feel free to ask!

P.S. I’m using the smallest Gemma model. If you have a good PC, you can simply download a larger one and replace this line of code in the main.ts file:

  `model: 'gemma4:e2b',`

Hi!

I have been tinkering with it for quite a while now. A while ago, I tried generating models with an online service (now defunct), like this:

The issues with this are:

  1. Content generated purely by AI cannot be copyrighted in the US, according to the US Copyright Office, so it is effectively public domain. I don’t mind, but businesses do.
  2. There are already a million open source models available on Sketchfab, so generating them for personal use kinda defeats the purpose. At least I’ve been able to find every model I ever wanted.

So I went for a different approach: use existing models to build virtual environments, with local Ollama. Not using Gemma though.
And I have some mixed results. The AI-powered search PoC is just fine, but the PoC of the world-building part is a failure - currently available models are just bad at spatial reasoning. Which I read in a paper only after I had tried everything :slight_smile:

There:

The way forward seems to be to let the LLM handle the natural language and call hand-written spatial tools. In the video, the chatbot moves towards me, looks at me, etc. because it calls a lookat(user) tool rather than lookat(coordinates). The LLM can figure out which tool to use fairly well, and fast, but it can’t calculate coordinates.
And as you have seen, it can barely count :wink:
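Roughly, the split could look like the sketch below: the model only emits a tool name plus symbolic names, and hand-written code does the geometry. All names here (`ToolCall`, `runTool`, `moveTowards`, the entity map) are hypothetical and not taken from my project. The yaw formula assumes a Y-up scene where a character's forward direction is +Z, as is common in Babylon.js.

```typescript
interface Vec3 { x: number; y: number; z: number }

// Positions are tracked by the app, keyed by names the model may use.
interface SceneState { entities: { [name: string]: Vec3 } }

// What the LLM is asked to emit: a tool and symbolic names, never coordinates.
interface ToolCall { tool: string; agent: string; target: string }

interface ToolResult { ok: boolean; error?: string; yaw?: number; position?: Vec3 }

// Yaw in radians around the Y axis that turns `from` to face `to`.
function yawTowards(from: Vec3, to: Vec3): number {
  return Math.atan2(to.x - from.x, to.z - from.z);
}

// Move up to `distance` units along the ground plane towards `to`.
function stepTowards(from: Vec3, to: Vec3, distance: number): Vec3 {
  const dx = to.x - from.x;
  const dz = to.z - from.z;
  const len = Math.sqrt(dx * dx + dz * dz);
  if (len <= distance) return { x: to.x, y: from.y, z: to.z };
  const k = distance / len;
  return { x: from.x + dx * k, y: from.y, z: from.z + dz * k };
}

// Look up an entity by name without picking up inherited object keys.
function lookup(scene: SceneState, name: string): Vec3 | null {
  return Object.prototype.hasOwnProperty.call(scene.entities, name)
    ? scene.entities[name]
    : null;
}

// Resolve names to positions and run the requested spatial tool.
function runTool(call: ToolCall, scene: SceneState): ToolResult {
  const agent = lookup(scene, call.agent);
  const target = lookup(scene, call.target);
  if (agent === null || target === null) return { ok: false, error: "unknown entity" };
  switch (call.tool) {
    case "lookAt":
      return { ok: true, yaw: yawTowards(agent, target) };
    case "moveTowards":
      return { ok: true, position: stepTowards(agent, target, 1) };
    default:
      return { ok: false, error: "unknown tool: " + call.tool };
  }
}
```

A reply such as `{"tool": "lookAt", "agent": "bot", "target": "user"}` is then enough, and an unknown tool or entity name comes back as an error instead of a wrong move.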


Cool! In my VR world project the user can talk to a disembodied LLM-based agent for voice chatting, as well as ask the agent to bring up menus and create objects (from available templates). I plan on adding context-aware actions next, as many objects like lights, radios, and TVs have actions that the LLM can then execute for me when I issue simple voice commands. It has been pretty good at adhering to prompts, but many smaller models have a problem of overfitting, trying to repeat what they just did, so it could spawn objects on a whim even when I did not ask it to lol.

The user can also take photos using a polaroid camera. The image is sent to the LLM, which is multimodal, so it comments almost instantly on the picture you just took, just for fun.

I have yet to play with real generation of meshes, or indeed more complex tasks like decorating a room, but some day I plan to play with that too and see what a locally running Gemma 4 is capable of doing with a good context and constraints.

I do not have a good working avatar system yet, but once I do I will definitely play with avatar interaction, like Josip seems to have done here. I think spatial awareness can be a problem, as he mentions. A smaller model needs a seriously constrained context window so it can only see what you want it to see, which often means removing earlier entries from the context window (chat log), or it may assume those are things it can also interact with. I think these models also have no concept of time passing between chat entries unless you make that explicitly part of the context text.
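As a rough sketch of what I mean by constraining the context and making time explicit, something like the following could build the message list for each request. The `ChatEntry` shape, the function names, and the "[5s ago]" prefix format are just assumptions for illustration, not what my project actually does.

```typescript
interface ChatEntry {
  role: "user" | "assistant";
  content: string;
  timestampMs: number; // when the entry was logged
}

interface Message { role: string; content: string }

// Coarse, human-readable elapsed time for the model to read.
function formatElapsed(ms: number): string {
  const s = Math.max(0, Math.round(ms / 1000));
  if (s < 60) return s + "s";
  const m = Math.round(s / 60);
  if (m < 60) return m + "min";
  return Math.round(m / 60) + "h";
}

// Keep only the last `maxEntries` chat entries and prefix each one
// with how long ago it happened, so time passing is part of the text.
function buildContext(
  systemPrompt: string,
  log: ChatEntry[],
  maxEntries: number,
  nowMs: number
): Message[] {
  // slice(-0) would return the whole log, so handle 0 explicitly.
  const recent = maxEntries > 0 ? log.slice(-maxEntries) : [];
  const messages: Message[] = [{ role: "system", content: systemPrompt }];
  for (const entry of recent) {
    const age = formatElapsed(nowMs - entry.timestampMs);
    messages.push({ role: entry.role, content: "[" + age + " ago] " + entry.content });
  }
  return messages;
}
```

Dropping older entries this way also keeps objects mentioned long ago out of the model's view, which is the point of constraining the window.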

Fun times for sure, there are so many things I want to play with using these new tools. I just got Trellis2 working locally through ComfyUI, and it is rather fun to create objects from images, although you need to either generate them with a low poly count or send the GLB through some automatic optimization pipeline. My dream was to be able to go to a 3D printer, just say what object it should make, and have it come out after a while. But on my 4070 Ti Super it still takes around 5 minutes to generate one textured model, and I would have to dedicate a full separate machine to that for it to actually be useful. I guess integration with paid online tools could also simplify that process, but at the moment I have only played a bit with Tencent's Hunyuan model online through their free tier, not through APIs. They still take a long time to generate.

(photo of recent slot system so my polaroid images end up in a “slot” in front of the camera so the user can grab them there after - my arcade server was not running so the machine is just stuck on connecting there)


Awesome projects, thanks for including them and telling me about your experience.

I think you are correct, and I have been thinking about ways to make good use of the capabilities of LLMs based on this.

Wow, this looks super cool :star_struck:
