MCP for Babylon — let AI agents control your scene

Hey everyone,

I’ve been working on something I’m pretty excited about and wanted to share it with the community.

MCP for Babylon is an open-source framework that exposes your Babylon.js scene through the Model Context Protocol. In practice, this means an LLM like Claude or GPT, or any other MCP-compatible agent, can inspect and manipulate cameras, lights, and meshes in your running scene in real time.

A few things it can do today:

  • Fly a camera to a position, orbit around a target, follow a path with easing

  • Create, remove, and tweak lights (point, spot, directional, hemispheric)

  • Show/hide meshes, change materials, animate transforms

  • Take snapshots, pick objects from the scene

  • Works with both Babylon.js and CesiumJS
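To make the camera behaviors a bit more concrete, here's a rough sketch of "follow a path with easing": interpolating a position along waypoints with ease-in-out-cubic timing. The function names and the choice of easing curve are my assumptions for illustration, not the framework's actual API.

```typescript
// Minimal sketch of "follow a path with easing": interpolate a camera
// position along a polyline of waypoints using ease-in-out-cubic timing.
// All names here are illustrative, not the framework's real API.

type Vec3 = { x: number; y: number; z: number };

// Standard ease-in-out-cubic: slow start, slow finish.
function easeInOutCubic(t: number): number {
  return t < 0.5 ? 4 * t * t * t : 1 - Math.pow(-2 * t + 2, 3) / 2;
}

function lerp(a: Vec3, b: Vec3, t: number): Vec3 {
  return {
    x: a.x + (b.x - a.x) * t,
    y: a.y + (b.y - a.y) * t,
    z: a.z + (b.z - a.z) * t,
  };
}

// Camera position along the waypoints at eased progress t in [0, 1].
function cameraAlongPath(waypoints: Vec3[], t: number): Vec3 {
  const eased = easeInOutCubic(Math.min(Math.max(t, 0), 1));
  const scaled = eased * (waypoints.length - 1);
  const i = Math.min(Math.floor(scaled), waypoints.length - 2);
  return lerp(waypoints[i], waypoints[i + 1], scaled - i);
}
```

At t = 0 this returns the first waypoint and at t = 1 the last, with the easing curve controlling how fast the camera accelerates in between.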

It runs entirely in the browser (UMD bundles), talks to MCP clients through a small Node.js WebSocket tunnel, and supports both SSE and Streamable HTTP transports. You can test it with Claude Code or the MCP Inspector in minutes.
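For context, an SSE-style MCP server is typically registered in a client's config with just a URL. The server name, port, and path below are placeholders, not this project's actual defaults — check the repo for the real values:

```json
{
  "mcpServers": {
    "babylon-scene": {
      "url": "http://localhost:3000/sse"
    }
  }
}
```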

The architecture is layered: behaviors are engine-agnostic while adapters are engine-specific, so it should be straightforward to add new behaviors or plug the framework into your own setup.
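Under that split, an engine-agnostic behavior talks only to a small adapter interface. The interface and names below are my guess at the shape, not the repo's actual types:

```typescript
// Sketch of the behavior/adapter split: the behavior knows nothing about
// Babylon.js or CesiumJS; only the adapter does. All names are illustrative.

interface SceneAdapter {
  setLightIntensity(lightId: string, intensity: number): void;
  setMeshVisible(meshId: string, visible: boolean): void;
}

// Engine-agnostic behavior: a "blackout" that dims a light and hides a mesh.
function blackout(scene: SceneAdapter, lightId: string, meshId: string): void {
  scene.setLightIntensity(lightId, 0);
  scene.setMeshVisible(meshId, false);
}

// A mock adapter, standing in for a Babylon.js- or Cesium-specific one.
class MockAdapter implements SceneAdapter {
  intensities = new Map<string, number>();
  visibility = new Map<string, boolean>();
  setLightIntensity(id: string, v: number): void { this.intensities.set(id, v); }
  setMeshVisible(id: string, v: boolean): void { this.visibility.set(id, v); }
}
```

The payoff of this layout is that the same behavior runs unchanged against a Babylon.js adapter, a Cesium adapter, or a mock in tests.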

Everything is Apache licensed and contributions are very welcome.

Repo: https://github.com/pandaGaume/mcp-for-babylon

I’m also writing a companion book called Vue Spatiale that covers the broader vision: spatial computing, geographic cameras, 3D Tiles, and how MCP fits into all of this. It’s a work in progress but might give some context on where this is heading.

Book: https://github.com/pandaGaume/vue-spatiale

Would love to hear your thoughts, feedback, or ideas. And if you give it a try, let me know how it goes!

Cheers

PS: I’ve been collaborating with Claude Code throughout the development process and honestly it rocks. Highly recommend giving it a shot if you haven’t already.

Update:

Just added multiplexing support, so several MCP servers can now run side by side. A local MCP client is also provided. The use case comes from the University of Houston, which needs to simulate a swarm of robots for lunar and Mars VR experimentation (the snapshot tools already provide VLA support). So maybe in Babylon.js soon.
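Multiplexing several servers over one channel essentially comes down to tagging each message with a server id and routing on it. Here's a toy version; the message shape and names are mine, not the project's actual protocol:

```typescript
// Toy multiplexer: route incoming messages to per-server handlers by id.
// The message shape and names are illustrative only.

type Handler = (payload: string) => void;

class Multiplexer {
  private handlers = new Map<string, Handler>();

  register(serverId: string, handler: Handler): void {
    this.handlers.set(serverId, handler);
  }

  // Expects messages like '{"serverId":"rover-1","payload":"..."}'.
  dispatch(raw: string): boolean {
    const msg = JSON.parse(raw) as { serverId: string; payload: string };
    const handler = this.handlers.get(msg.serverId);
    if (!handler) return false; // unknown server id: drop (or queue)
    handler(msg.payload);
    return true;
  }
}
```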

A swarm of rovers communicating within a Babylon.js scene → World Model


cc @amoebachant @alexchuber and @RaananW

I’ve been tinkering with Ollama myself, but it turned out that LLMs are pretty bad at spatial reasoning, e.g. they fail at basic tasks like “what am I looking at”. Then they’ll happily walk you through all the sin, cos, and atan of how they came to mix up front and back.
Are the big, expensive models and services any better? I suppose you’re using Claude, right?
I’ve been using different Mistral variants myself, but according to https://arxiv.org/pdf/2504.05786 they’re all just different shades of bad, so I didn’t really bother looking for better one(s).
… so I gave up on scene manipulation, at least for the time being.

Video of my tinkering: https://www.youtube.com/watch?v=qg5NCsW35NQ
All open source, with a live demo available on vrspace.org if anyone wants to tinker with it.
