How do I use deterministic lockstep to help sync clients?

Hi animation experts,

I’ve read the explanation about deterministic lockstep here:

Sometimes it is important to make sure animations, physics and game logic code are in sync and decoupled by frame-rate variance. This might be useful to be able to replay how a scene evolved, given the same initial condition and inputs, or to minimize differences on multiple clients in a multi-user environment

But I fail to understand concretely how I can use it to sync multiple clients.

What I understand is:

  1. The engine in each client needs to be started with the deterministic lockstep option turned on, which is easy enough.
  2. There are onBeforeStepObservable and onAfterStepObservable callbacks that are available to do whatever game logic we want.
  3. In the documentation playground the callback can console.log theScene.getStepId()
  4. Subsequent runs of the playground will console.log a bouncing sphere stopping at the same stepId every time.

Suppose I have multiple clients and I want them to see the same (or close enough) physics animations as they each knock over boxes and shoot bouncing balls etc.

  1. Since each client may have started their own engine a little sooner or a little later, is it important to set each client to the same stepId all at the same time in order to have each client experience the same animations at the same time?
  2. If so, is there a way to set all clients’ engines to the same initial stepId? I couldn’t find a single PG example that uses setStepId.
  3. Suppose client A knocks over a box at stepId X (on client A’s clock) and we send that event to client B over a web socket. How can we tell client B that the physics impulse should have been applied back at stepId X, when by the time client B receives the message it is already slightly past that point in time? Is there a way for client B to “catch up” to the proper animation of the box being knocked over? Like saying “this happened 10 frames ago, but don’t bother rendering them, just render where you think the box is now”. Or like saying to your bank account, “these are transactions you didn’t know about, now update your balance”.
  4. Can you ‘rewind’ the engine to an earlier point in time at stepID X, add an impulse that happened, then ‘fast forward’ the engine to the current point in time X + 20 frames and continue rendering the aftermath?

In summary, my intuition is telling me:

  1. Each client is running their own independent physics simulation that happens to be starting around the same time.
  2. Clients’ actions can be logged and shared as an array of facts: physics impulses, forces, etc., each tagged with the stepId at which it occurred.
  3. … that’s sort of where my intuition ends. In what ways can we sync (add facts, that may have already happened) to each client’s engine and expect them to adjust dynamically and smoothly? And does deterministic lock-step help us accomplish this?
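To make the “array of facts” idea concrete, here is a minimal sketch in plain TypeScript with no Babylon.js calls. Everything in it is hypothetical: `Fact`, `simulateTo`, and the toy 1D physics are illustrations only. It assumes the simulation is fully deterministic and naively replays from step 0 every time; a real game would more likely replay from a stored snapshot.

```typescript
// A "fact": an impulse that was applied at a specific simulation step.
interface Fact {
  stepId: number;
  impulse: number;
}

// Toy deterministic simulation of a 1D box with damped velocity.
// Replays every step from 0 up to (but not including) targetStep.
function simulateTo(targetStep: number, facts: Fact[]): number {
  // Sum the impulses per step so the order of arrival does not matter.
  const impulseByStep = new Map<number, number>();
  for (const f of facts) {
    impulseByStep.set(f.stepId, (impulseByStep.get(f.stepId) ?? 0) + f.impulse);
  }

  let x = 0;
  let v = 0;
  for (let step = 0; step < targetStep; step++) {
    v += impulseByStep.get(step) ?? 0; // apply facts recorded for this step
    x += v;                            // integrate position
    v *= 0.5;                          // simple damping
  }
  return x;
}

// Client A applied the impulse at step 2 and is now at step 5.
const knownToA: Fact[] = [{ stepId: 2, impulse: 4 }];
const boxOnA = simulateTo(5, knownToA);

// Client B reaches step 5 without the fact, then receives it late and
// re-runs the skipped steps without rendering them.
const boxOnBBefore = simulateTo(5, []);
const boxOnBAfter = simulateTo(5, knownToA);
```

This is the “update your balance” case from question 3: once client B has the late fact, recomputing gives it the same box position as client A, but only as long as both simulations really are deterministic.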

Any hints would be appreciated! Thanks!

That is a great question and I am sure @Cedric could help when he is back next week :slight_smile:

Also @gbz or @MackeyK24 might have already encountered this?

I’d also like to recommend @yannick (creator of geckos and snapshot-interpolation) and @timetocode (creator of nengi) who are very experienced in these areas

I’m still exploring this area and my current thinking is:

Initializing an engine with deterministicLockstep should ideally provide the “same” physics results for clients running at any FPS (e.g. 60, 144). The same physics simulation should take the same number of steps (stepId as you mentioned) for all clients, but the time duration might have a variance of plus or minus a few milliseconds
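As a rough illustration of why the step count can be independent of FPS, here is a fixed-timestep accumulator sketch. This is not Babylon.js’s actual implementation: `FixedStepClock`, `countSteps`, and the cap of 4 steps per frame are all hypothetical names and values chosen for the example.

```typescript
// Accumulates real frame time and consumes it in fixed-size physics steps.
class FixedStepClock {
  stepId = 0;
  private accumulatorMs = 0;
  private readonly stepMs: number;
  private readonly maxStepsPerFrame: number;

  constructor(stepMs: number = 1000 / 60, maxStepsPerFrame: number = 4) {
    this.stepMs = stepMs;
    this.maxStepsPerFrame = maxStepsPerFrame;
  }

  // Call once per rendered frame with the elapsed real time.
  // Returns how many fixed physics steps ran during this frame.
  advance(frameDeltaMs: number, step: (stepId: number) => void): number {
    this.accumulatorMs += frameDeltaMs;
    let steps = 0;
    while (this.accumulatorMs >= this.stepMs && steps < this.maxStepsPerFrame) {
      step(this.stepId);
      this.stepId++;
      this.accumulatorMs -= this.stepMs;
      steps++;
    }
    return steps;
  }
}

// Helper for the example: run `frames` frames of `frameMs` each and
// report how many physics steps were executed in total.
function countSteps(frameMs: number, frames: number, stepMs: number): number {
  const clock = new FixedStepClock(stepMs);
  for (let i = 0; i < frames; i++) {
    clock.advance(frameMs, () => { /* physics step would go here */ });
  }
  return clock.stepId;
}

// One second of real time at 200 FPS and at 50 FPS, with a 10 ms step.
const fastClientSteps = countSteps(5, 200, 10);
const slowClientSteps = countSteps(20, 50, 10);
```

Both clients end up having run the same number of steps for the same amount of real time, even though they rendered a very different number of frames. The cap on steps per frame is there so that a slow frame cannot trigger an unbounded catch-up loop.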

Your idea of setting all clients to the same stepId seems like the correct approach, but I haven’t found a way myself to do this in a way that’s consistent. As you mentioned, you will always be in the future relative to the server and other clients, and you will always see other clients in the past

I think that the best sources for an overview of multiplayer netcode are Source Multiplayer Networking - Valve Developer Community and Client-Server Game Architecture - Gabriel Gambetta

There’s also Glenn Fiedler’s Deterministic Lockstep | Gaffer On Games and his corresponding GDC talk: GDC - Networking for Physics Programmers by Glenn Fiedler - YouTube

Based on these resources, I’m not sure if we should be trying to synchronize the stepId on all clients and server. However, we still need to find a way to make time consistent on all clients and server, but without allowing clients to hack by faking times (for speed hacks :slight_smile:)

I think we can accomplish the above by allowing all client and server physics simulations to run “independently” of each other (without manipulating stepId). This will allow clients to experience a smooth, responsive game (as mentioned in the Valve resource under Input Prediction)

Clients would send their keystrokes to the server, and the server would run its own physics simulation based on these. The server will calculate delta time between client keystrokes itself without relying on a timestamp from clients (I’m experimenting with this idea now. The pro is that clients can’t hack time, and the con is that clients and the server may have slightly different delta times, breaking determinism)

As you mentioned, there will be a need to “rewind”, as mentioned in the Valve resource under Lag Compensation. The resource also shows the formula:

Command Execution Time = Current Server Time - Packet Latency - Client View Interpolation

A critical piece of information that you can calculate server side is the latency, which you can do via WebSocket or WebRTC (i.e. server sends a packet to the client and calculates how long it takes for it to receive the client’s response). With this, you can rewind the server to approximately what the client was seeing in the past, execute any game logic (e.g. reward the client with points for aiming correctly), then undo the rewind and continue the server game loop
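Here is a small sketch of how the formula and the rewind lookup might fit together on the server. It is plain TypeScript with hypothetical names (`oneWayLatencyMs`, `commandExecutionTimeMs`, `StateHistory`), tracks a single 1D position, and assumes one-way latency is simply half the measured round trip, which is only an approximation.

```typescript
interface Snapshot {
  timeMs: number; // server time when the state was recorded
  x: number;      // toy 1D position of some entity
}

// Assumption: one-way latency is half of the ping/pong round trip.
function oneWayLatencyMs(pingSentAtMs: number, pongReceivedAtMs: number): number {
  return (pongReceivedAtMs - pingSentAtMs) / 2;
}

// Command Execution Time = Current Server Time - Packet Latency - Client View Interpolation
function commandExecutionTimeMs(
  serverNowMs: number,
  packetLatencyMs: number,
  viewInterpolationMs: number
): number {
  return serverNowMs - packetLatencyMs - viewInterpolationMs;
}

// Keeps a short window of past states so the server can look back in time.
class StateHistory {
  private snapshots: Snapshot[] = [];
  private readonly maxAgeMs: number;

  constructor(maxAgeMs: number = 1000) {
    this.maxAgeMs = maxAgeMs;
  }

  record(timeMs: number, x: number): void {
    this.snapshots.push({ timeMs, x });
    // Drop snapshots that are older than the history window.
    while (this.snapshots.length > 0 && this.snapshots[0].timeMs < timeMs - this.maxAgeMs) {
      this.snapshots.shift();
    }
  }

  // Returns the latest snapshot recorded at or before timeMs, if any.
  at(timeMs: number): Snapshot | undefined {
    let found: Snapshot | undefined;
    for (const s of this.snapshots) {
      if (s.timeMs > timeMs) break;
      found = s;
    }
    return found;
  }
}

// Usage: record a state every 50 ms, then look up what the client was seeing.
const history = new StateHistory();
for (let t = 0; t <= 500; t += 50) {
  history.record(t, t / 10);
}
const latencyMs = oneWayLatencyMs(1000, 1100);            // 100 ms round trip
const execTimeMs = commandExecutionTimeMs(500, latencyMs, 100);
const rewound = history.at(execTimeMs);
```

The server would run its hit test or other game logic against `rewound` and then carry on from the present state, which is the “undo the rewind” part. With a real physics engine the history would need to hold every impostor’s position, rotation, and velocities rather than a single number.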

For a smooth, responsive game, I’m not sure if it is possible to have all clients running the same stepId at the same time. As you mentioned, if Client A knocks over a box, Client A should see the box move immediately. However, other clients cannot possibly see the box move until Client A’s keystrokes arrive at the server, which then broadcasts the physics result (snapshot) to all clients. Furthermore, as mentioned in the Valve resource under Entity Interpolation, there will be an additional delay on top of latency until other clients see that Client A knocked over the box. This is due to client machines “buffering” a few server snapshots, since you need at least 2 of them to smoothly interpolate between (for smooth movement of other players, otherwise players would be constantly teleporting)
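To illustrate the buffering described under Entity Interpolation, here is a minimal sketch. The names (`Snapshot`, `interpolateAt`, `renderTimeMs`) and the 100 ms delay are assumptions for the example and do not come from any particular library.

```typescript
interface Snapshot {
  timeMs: number; // server time of the snapshot
  x: number;      // toy 1D position of a remote player
}

// Hypothetical interpolation delay: remote entities are rendered this far in the past.
const INTERPOLATION_DELAY_MS = 100;

// The time at which remote entities should be drawn, given the client's
// estimate of the current server time.
function renderTimeMs(estimatedServerNowMs: number): number {
  return estimatedServerNowMs - INTERPOLATION_DELAY_MS;
}

// Linearly interpolates between the two buffered snapshots that bracket
// the render time. Returns undefined if no such pair is buffered yet.
function interpolateAt(buffer: Snapshot[], atMs: number): number | undefined {
  for (let i = 0; i + 1 < buffer.length; i++) {
    const a = buffer[i];
    const b = buffer[i + 1];
    if (a.timeMs <= atMs && atMs <= b.timeMs) {
      const span = b.timeMs - a.timeMs;
      const t = span === 0 ? 0 : (atMs - a.timeMs) / span;
      return a.x + (b.x - a.x) * t;
    }
  }
  return undefined;
}

// Usage: snapshots arrive every 50 ms; the client estimates server time as 175 ms.
const buffer: Snapshot[] = [
  { timeMs: 0, x: 0 },
  { timeMs: 50, x: 10 },
  { timeMs: 100, x: 20 },
];
const drawnX = interpolateAt(buffer, renderTimeMs(175)); // drawn at t = 75 ms
```

Because the render time always trails the newest snapshot, there is normally a pair to interpolate between, and the cost is exactly the extra delay mentioned above.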

I have not fully implemented this myself yet, so there may be unforeseen issues with this approach

Edit: I wasn’t able to answer @owen’s question below, but now I understand how important this is. Without this capability, I’m unsure how to do server-side rewinds and client-side corrections. Previously, I was not using a physics engine, so doing rewinds and corrections was straightforward

1. Can you ‘rewind’ the engine to an earlier point in time at stepID X, add an impulse that happened, then ‘fast forward’ the engine to the current point in time X + 20 frames and continue rendering the aftermath?

In the case of client Input Prediction and Correction, maybe this can be done in three steps: keep a record of past positions for each set of input keystrokes (referred to as a Command in the Valve resource), force all impostors back to those past positions (the rewind for correction), and then run executeStep on your AmmoJSPlugin with _useDeltaForWorldStep = false multiple times until you reach the present time

For example, let’s imagine that the server has processed Command #5 from Client A and Client A receives the snapshot result from the server corresponding to Command #5. At this time, Client A is already past Command #5 (for example, Command #10). Assuming that the server’s result from Command #5 is different from Client A’s result from Command #5, Client A will be forced to take and rewind to the server’s result (e.g. player’s and box’s position) or at least a fraction towards it (as mentioned in exponential correction smoothing in Networked Physics (2004) | Gaffer On Games). Client A will then run calculations for Command #6 to #10 (present time for client) with the server’s result for Command #5 as the starting point
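The Command #5 to #10 example can be sketched roughly as follows. This is a toy 1D simulation in plain TypeScript, not Babylon.js or Ammo.js code, and `Command`, `State`, `step`, and `PredictingClient` are hypothetical names. It snaps straight to the server’s result rather than moving a fraction towards it, and it assumes the client and server run the exact same step function.

```typescript
interface Command {
  id: number;    // sequence number, e.g. Command #5
  moveX: number; // input: acceleration along x for this step
}

interface State {
  x: number;
  vx: number;
}

const DT = 1 / 60; // fixed timestep in seconds

// Deterministic step shared by the client and the server.
function step(state: State, cmd: Command, dt: number): State {
  const vx = state.vx + cmd.moveX * dt;
  return { x: state.x + vx * dt, vx };
}

class PredictingClient {
  state: State = { x: 0, vx: 0 };
  private pending: Command[] = []; // commands the server has not acknowledged yet
  private nextId = 1;

  // Apply the input immediately (prediction) and remember it for later replay.
  applyInput(moveX: number): Command {
    const cmd: Command = { id: this.nextId++, moveX };
    this.pending.push(cmd);
    this.state = step(this.state, cmd, DT);
    return cmd;
  }

  // The server reports its result for command `ackedId`: rewind to that
  // result, then replay every later command to get back to the present.
  reconcile(ackedId: number, serverState: State): void {
    this.pending = this.pending.filter((c) => c.id > ackedId);
    let s = serverState;
    for (const c of this.pending) {
      s = step(s, c, DT);
    }
    this.state = s;
  }
}
```

If the server’s result for Command #5 matches what the client predicted, the replay lands exactly where the client already was and nothing visibly changes. If it differs, the client ends up at the corrected position, and that jump is what a smoothing scheme would then hide.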

Great post @gbz, I think you covered most of the important points.

There are a couple of things that I wanted to add/stress:

  1. You absolutely need a server that is the source of truth for the state of the game. You mentioned web socket messages between clients, but in fact they must go via the server as each client has a slightly wrong version of the actual game state.

  2. Games like this always have glitches; they are impossible to avoid. If player A and player B simultaneously hit the same box but in opposite directions, each will initially see it start to fall in the direction they hit it, and then there has to be a correction. Commercial multiplayer games design the gameplay to minimize or eliminate these glitches, i.e. they avoid designs where two players can interact with the same object at the same time in ways that cause very different effects.

Good luck with your game, it sounds like fun!

Hi @owen

I second what @gbz said in his great post. Even with a locked time step, a physics engine will not be 100% deterministic. You have to make the server the owner of truth and sync the impostors’ positions, rotations, and velocities from the server to the clients. Otherwise the physics simulations will diverge after some time.

Along the same lines, player positions/velocities and other events will have to be created and disposed of on the server, and the clients will interpolate if necessary.

:slight_smile: Something that is kinda stumping me:

According to the Babylon Inspector, it takes ~5 ms to simulate a physics step.

Each physics step simulates a delta timestep of ~16 ms by default (60 FPS)

So all seems fine if it only takes ~5 ms of real time to simulate a physics step with delta timestep of ~16 ms

However, suppose the client disagrees with a server result. The client will have to “rewind back in time”, accept the server’s result, and re-simulate multiple physics steps to get back to the present. The amount of time to rewind is at least the ping + the time the server takes to compute the result

Thus, if it takes ~5 ms of real time to simulate a physics step, we can re-simulate at most ~16 / 5 = ~3 frames without interrupting the game’s smooth flow

3 frames corresponds to a total delta timestep of ~48 ms. A ping of 48 ms is already considered very good, so there is no hope of smooth rewinding for all users with an expensive ~5 ms step simulation time
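The arithmetic above can be written down as a tiny calculator. The function names are hypothetical, and like the estimate above it ignores the cost of the regular step and of rendering, so it is an upper bound at best.

```typescript
// How many physics steps can be re-simulated within one frame's time budget.
function maxResimSteps(frameBudgetMs: number, stepCostMs: number): number {
  return Math.floor(frameBudgetMs / stepCostMs);
}

// How far back in simulated time we can rewind and still catch up in one frame.
function maxSmoothRewindMs(frameBudgetMs: number, stepCostMs: number, stepMs: number): number {
  return maxResimSteps(frameBudgetMs, stepCostMs) * stepMs;
}

// Numbers from the post: ~16 ms frame budget, ~16 ms simulated per step.
const expensiveSteps = maxResimSteps(16, 5);          // ~5 ms per step
const expensiveRewind = maxSmoothRewindMs(16, 5, 16);
const cheapSteps = maxResimSteps(16, 1);              // ~1 ms per step
const cheapRewind = maxSmoothRewindMs(16, 1, 16);
```

With a ~5 ms step this gives 3 steps, or about 48 ms of rewind. With a ~1 ms step it gives 16 steps, or about 256 ms, which would cover far more pings.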

Glenn Fiedler refers to this as the Spiral of Death (when your physics simulation falls more and more behind the present)

Fortunately, I had to add 200 dummy yellow box physics impostors (with collision masks and filters) just to reach that expensive ~5 ms step simulation time. So it may be possible for a decent game to have a <1 ms step simulation time

However, a <1 ms step simulation time on my laptop can easily become multiple ms on a weaker desktop or mobile device

It’s clear that we want to keep the number of physics impostors as low as possible. I think that using compound bodies to merge physics impostors could help with performance, but I have not confirmed this yet

Perhaps there is a shortcut we could take rather than rewinding and re-simulating all physics steps that could still ensure good user experience and prevent hacking