Sending Base64 images from createScreenShot() to a backend Python service

I am using Babylon as the front-end simulation environment to initially train some AI models for robotics applications. Right now, images are captured from Babylon cameras using createScreenShot(), encoded to Base64, and sent via AJAX calls to Python. Then, in Python, the AI processing and ultimately the training are done. Finally, data is sent back to Babylon to update the simulation state. Both the front end (Babylon) and the back end (Python) are hosted on the same local server.

Base64 strings work for sending the frame data between BabylonJS and Python, but it feels suboptimal to send such a big string. Maybe I am just being fussy. I am wondering whether there is a more performant way of encoding, or otherwise optimizing, the data flow out of Babylon into a backend running something like Python?

You could send the picture as .png or .jpg binary data, as I assume there are packages on the Python side that can decode these formats? However, decoding these formats will take more time than decoding a Base64 string, so I don’t know if it will save perf in the end…
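
Something like this could work on the Babylon side, as a rough sketch (the /frame endpoint and the renderCanvas id are assumptions, not part of your setup); HTMLCanvasElement.toBlob() produces the encoded PNG/JPEG bytes directly, so no Base64 step is involved:

```ts
// Hypothetical sketch: grab the canvas Babylon renders into and POST the frame
// as binary JPEG instead of a Base64 string. The "/frame" endpoint and the
// "renderCanvas" id are made up for illustration.
const canvas = document.getElementById("renderCanvas") as HTMLCanvasElement;

function sendFrameAsJpeg(): void {
  canvas.toBlob(
    (blob) => {
      if (!blob) return;
      // fetch() accepts a Blob body, so the encoded bytes go over the wire as-is.
      fetch("/frame", {
        method: "POST",
        headers: { "Content-Type": "image/jpeg" },
        body: blob,
      });
    },
    "image/jpeg",
    0.9 // quality: lower it to trade fidelity for a smaller payload
  );
}
```

Note that reading back from a WebGL canvas generally has to happen right after a render (or with the engine created with preserveDrawingBuffer enabled), otherwise the buffer may come back blank.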

Yes, that is about the same conclusion I had reached. I just did not know if there was a more efficient way to compress the Base64 before passing it to the backend. With the delay caused by writing out an image file, that option is off the table. This may be the fastest way, as anything else would require re-encoding the Base64.

Wild speculation question here:
Is there no solution that exists, or could be created, where Python could just “see” the image in the front-end computer’s Babylon frame buffer, or through a virtual screen, and work with it directly? That way the image would not need to be captured on the front-end device, encoded to Base64, transmitted to Python, and decoded back into an image before finally being fed to the models for processing.

cc @RaananW, but I don’t think there’s any other way than marshalling/unmarshalling the data to communicate between Babylon and Python.

WebRTC could allow real-time video sharing, but it might be quite involved to put in place :slight_smile:
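
For reference, the browser half of that would look roughly like this (a sketch; the signalling channel for exchanging the offer/answer with the Python peer is omitted and would need to be built separately):

```ts
// Hypothetical sketch: turn the Babylon canvas into a real-time video track.
// The "renderCanvas" id is an assumption; signalling with the backend is not shown.
const canvas = document.getElementById("renderCanvas") as HTMLCanvasElement;

async function startWebRtcStream(): Promise<RTCPeerConnection> {
  const stream = canvas.captureStream(30); // 30 fps MediaStream captured from the canvas
  const pc = new RTCPeerConnection();
  for (const track of stream.getVideoTracks()) {
    pc.addTrack(track, stream);
  }
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  // ...send pc.localDescription to the backend over your signalling channel and
  // apply its answer with pc.setRemoteDescription(...) once it arrives.
  return pc;
}
```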


Yes, I don’t see a way around streaming the binary data or the Base64 string. Base64 will increase the payload size (by roughly a third) compared to the binary data, but it all depends on what you consume on the other side. And I guess this is the main question: what is the final form of the data you need in order to consume it correctly? Your frontend can, technically, provide you with a section of the image, or even just a few pixels, saving the (perhaps) unneeded data. Or (and this might take it a bit too far, depending on your use case), you could take all of the generated images and encode a video out of them directly in the user’s browser (ffmpeg.wasm is a thing, of course: https://github.com/ffmpegwasm/ffmpeg.wasm, FFmpeg for the browser, powered by WebAssembly).
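
For the “section of the image” idea, a rough sketch (the region coordinates and canvas id are placeholders) is to copy just that rectangle into an offscreen canvas before encoding it, so only the pixels you actually need get transmitted:

```ts
// Hypothetical sketch: crop a region of interest out of the render canvas so only
// the needed pixels are encoded and sent. The "renderCanvas" id and the rectangle
// you pass in are placeholders.
const renderCanvas = document.getElementById("renderCanvas") as HTMLCanvasElement;

function cropRegion(x: number, y: number, width: number, height: number): Promise<Blob> {
  const crop = document.createElement("canvas");
  crop.width = width;
  crop.height = height;
  const ctx = crop.getContext("2d")!;
  // Copy just the (x, y, width, height) rectangle from the render canvas.
  ctx.drawImage(renderCanvas, x, y, width, height, 0, 0, width, height);
  return new Promise<Blob>((resolve, reject) =>
    crop.toBlob((blob) => (blob ? resolve(blob) : reject(new Error("toBlob failed"))), "image/png")
  );
}
```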


Once in Python, the frame gets passed through OpenCV, PyTorch, and some other tools. So, once the (current) Base64 string arrives, it gets decoded into a temporary frame image held directly in Python’s memory to feed those tools.

The reason for doing this rather than sending a screenshot (binary?) is that, so far as I know, screenshots have to be saved to a file on the host machine by Babylon (which takes time), re-read for transmission (which takes time), saved on the receiving end (which takes time), and then opened up and used (which takes time).

If Babylon and the Python server are running on the same machine, the delay is not awful, but it is still noticeable at each step (especially the capture in Babylon), and I would love to reach real-time or faster-than-real-time simulation speeds. I wish I could just reach into Babylon’s memory allocation of the frame buffer and read it directly with Python. Sadly, that is impossible. So that leaves exploring how to build the most performant bridge to push image data between them.

Thinking toward a future where I may want to extend this platform to run multiple simulation machines, each with its own Babylon instance, to train AI models more quickly or to do collaborative multi-agent robot simulation, the most performant way of encoding and sending image data out of a Babylon instance into a remote Python server becomes even more important.

Both of these situations got me curious about how I can minimize the delay in capturing and passing frame data between Babylon and Python.
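
One idea I want to explore for that bridge is replacing the per-frame AJAX calls with a single persistent WebSocket and pushing each frame as a binary message. A rough sketch (the ws://localhost:8765 endpoint is made up; any Python WebSocket server could sit on the other end):

```ts
// Hypothetical sketch: stream frames over one persistent WebSocket connection
// instead of issuing a new HTTP/AJAX request per frame. The endpoint is made up.
const socket = new WebSocket("ws://localhost:8765");
socket.binaryType = "arraybuffer";

function sendFrame(frame: ArrayBuffer | Blob): void {
  if (socket.readyState === WebSocket.OPEN) {
    socket.send(frame); // WebSocket.send() accepts Blob and ArrayBuffer payloads directly
  }
}
```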

I’m not sure why you need to save the file in order to pass it to the server, but it really depends on the application’s architecture. The Python server should be able to process a stream of binary data. But that’s just an uneducated opinion, since I have no idea how the application was developed :slight_smile:

Babylon/the browser should be able to provide you with an ArrayBuffer of the data.
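
For example, something along these lines would give you raw RGBA bytes with no PNG/JPEG encoding or Base64 at all (a sketch, assuming Babylon created a WebGL2 context on the canvas; calling getContext with the same context type returns that existing context rather than a new one):

```ts
// Hypothetical sketch: read the current frame back as raw RGBA bytes. The resulting
// Uint8Array (or its underlying ArrayBuffer) can be sent as-is and reshaped on the
// Python side into a (height, width, 4) array. Note that WebGL returns rows
// bottom-to-top, and the read generally has to happen right after a render (or with
// preserveDrawingBuffer enabled) to avoid getting a cleared buffer.
const canvas = document.getElementById("renderCanvas") as HTMLCanvasElement;
const gl = canvas.getContext("webgl2") as WebGL2RenderingContext;

function readFrameRGBA(): Uint8Array {
  const width = gl.drawingBufferWidth;
  const height = gl.drawingBufferHeight;
  const pixels = new Uint8Array(width * height * 4); // 4 bytes per pixel (RGBA)
  gl.readPixels(0, 0, width, height, gl.RGBA, gl.UNSIGNED_BYTE, pixels);
  return pixels;
}
```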

It’s also possible I am misunderstanding the feature. How do I get a binary stream of data out instead of a file or Base64? I must have misread the docs, as those were the only two options I saw noted.