Transient errors where an HTTP response is truncated can't be retried

This is very difficult to repro so I’m afraid I don’t have a playground link, but I’ve encountered some intermittent cases where using the gltfFileLoader with SceneLoader.LoadAssetContainer fails with an exception as follows:

LoadAssetContainer failed for priv/model/Arrow/valkyrie_mesh.glb: Unable to load from http://localhost:8080/priv/model/Arrow/valkyrie_mesh.glb: OK
RuntimeError: Unable to load from http://localhost:8080/priv/model/Arrow/valkyrie_mesh.glb: OK
    at errorHandler (/Users/djn/Code/real-space/node/webpack:/real-space/node_modules/@babylonjs/core/Loading/sceneLoader.js:524:1)
    at errorCallback (/Users/djn/Code/real-space/node/webpack:/real-space/node_modules/@babylonjs/core/Loading/sceneLoader.js:188:1)
    at /Users/djn/Code/real-space/node/webpack:/real-space/node_modules/@babylonjs/core/Misc/fileTools.js:415:1
    at handleError (/Users/djn/Code/real-space/node/webpack:/real-space/node_modules/@babylonjs/core/Misc/fileTools.js:486:1)
    at XMLHttpRequest.onReadyStateChange (/Users/djn/Code/real-space/node/webpack:/real-space/node_modules/@babylonjs/core/Misc/fileTools.js:537:1)
    at XMLHttpRequest.dispatchEvent (/Users/djn/Code/real-space/node/webpack:/real-space/node_modules/xhr2/lib/xhr2.js:72:1)
    at XMLHttpRequest._setReadyState (/Users/djn/Code/real-space/node/webpack:/real-space/node_modules/xhr2/lib/xhr2.js:422:1)
    at XMLHttpRequest._onHttpResponseClose (/Users/djn/Code/real-space/node/webpack:/real-space/node_modules/xhr2/lib/xhr2.js:637:1)
    at IncomingMessage.<anonymous> (/Users/djn/Code/real-space/node/webpack:/real-space/node_modules/xhr2/lib/xhr2.js:571:1)
    at IncomingMessage.emit (node:events:514:28) {
  errorCode: 3000,
  innerError: LoadFileError: Cannot read properties of null (reading 'byteLength')
      at /Users/djn/Code/real-space/node/webpack:/real-space/node_modules/@babylonjs/core/Misc/fileTools.js:415:1
      at handleError (/Users/djn/Code/real-space/node/webpack:/real-space/node_modules/@babylonjs/core/Misc/fileTools.js:486:1)
      at XMLHttpRequest.onReadyStateChange (/Users/djn/Code/real-space/node/webpack:/real-space/node_modules/@babylonjs/core/Misc/fileTools.js:537:1)
      at XMLHttpRequest.dispatchEvent (/Users/djn/Code/real-space/node/webpack:/real-space/node_modules/xhr2/lib/xhr2.js:72:1)
      at XMLHttpRequest._setReadyState (/Users/djn/Code/real-space/node/webpack:/real-space/node_modules/xhr2/lib/xhr2.js:422:1)
      at XMLHttpRequest._onHttpResponseClose (/Users/djn/Code/real-space/node/webpack:/real-space/node_modules/xhr2/lib/xhr2.js:637:1)
      at IncomingMessage.<anonymous> (/Users/djn/Code/real-space/node/webpack:/real-space/node_modules/xhr2/lib/xhr2.js:571:1)
      at IncomingMessage.emit (node:events:514:28)
      at emitCloseNT (node:internal/streams/destroy:132:10)
      at processTicksAndRejections (node:internal/process/task_queues:81:21) {
    errorCode: 4000,
    innerError: undefined,
    request: WebRequest {
      _xhr: [XMLHttpRequest],
      _requestURL: 'http://localhost:8080/priv/model/Arrow/valkyrie_mesh.glb'
    }
  }
}

I added a bit of extra logging to get the stack trace where that “Cannot read properties of null (reading ‘byteLength’)” is coming from, and found this:

TypeError: Cannot read properties of null (reading 'byteLength')
    at /Users/djn/Code/real-space/node/webpack:/real-space/node_modules/@babylonjs/loaders/glTF/glTFFileLoader.js:522:1
    at /Users/djn/Code/real-space/node/webpack:/real-space/node_modules/@babylonjs/core/Misc/fileTools.js:412:1
    at XMLHttpRequest.onReadyStateChange (/Users/djn/Code/real-space/node/webpack:/real-space/node_modules/@babylonjs/core/Misc/fileTools.js:532:1)
    at XMLHttpRequest.dispatchEvent (/Users/djn/Code/real-space/node/webpack:/real-space/node_modules/xhr2/lib/xhr2.js:72:1)
    at XMLHttpRequest._setReadyState (/Users/djn/Code/real-space/node/webpack:/real-space/node_modules/xhr2/lib/xhr2.js:422:1)
    at XMLHttpRequest._onHttpResponseClose (/Users/djn/Code/real-space/node/webpack:/real-space/node_modules/xhr2/lib/xhr2.js:637:1)
    at IncomingMessage.<anonymous> (/Users/djn/Code/real-space/node/webpack:/real-space/node_modules/xhr2/lib/xhr2.js:571:1)
    at IncomingMessage.emit (node:events:514:28)
    at emitCloseNT (node:internal/streams/destroy:132:10)
    at processTicksAndRejections (node:internal/process/task_queues:81:21)

I think I understand what’s happening: The errors are occurring when Babylon is run under node, using xhr2 for HTTP. It’s possible for xhr2 to return a response code of 200 with response=null if the server attempts to send a 200 but the HTTP connection is closed before all the content is downloaded (see _onHttpResponseClose in xhr2.js). Babylon’s FileTools currently calls through to the onSuccess handler with data set to null when this happens. At least in the case of the gltfFileLoader that isn’t expected and so the code throws an exception “Cannot read properties of null (reading ‘byteLength’)”. FileTools does have code whereby a retry handler can choose to retry transient errors, but it doesn’t do so for cases where the onSuccess handler has thrown an exception (which makes sense - that would usually be a code error and not good to retry).

I have a suggested fix which is to change FileTools to check for null response and, in that case, don’t call onSuccess (which might throw) but instead fall through to the retry logic which can decide whether or not to retry. If it’s OK I’ll open a pull request with that suggestion.

Here’s a PR with a possible fix: If HTTP connection closed prematurely, consider retry by djn24 · Pull Request #16025 · BabylonJS/Babylon.js · GitHub

By itself that doesn’t solve the problem I encountered, because the default retry handler won’t retry if there is a 200 status code. With this change in combination with making my code use its own retry handler which will retries on response=null though, I’ve logged cases where the underlying closed connection problem happened and Babylon retried and continued successfully.

2 Likes

Thanks for the report and the PR!

I will let @sebavan review it (on next week), as I’m not sure what to do (should we have a flag to opt for the new behavior, to avoid a breaking change, or do we consider it’s a bug in the first place?).

I wondered about that too - could any user of FileTools.RequestFile be relying on null being passed through rather than considered an error and (optionally) retried? I think probably not - the method’s intent is to fetch a file so presumably that’s what the caller wants, and therefore null isn’t going to be useful. I did a quick check to confirm that if the server returns a 200 with no content, xhr2 passes that through as an empty string, not null, so this shouldn’t change the behaviour for a zero-length file for example. That said, happy to adapt the PR to include a switch if that’s preferred.

1 Like