Ah thanks @imerso, and I appreciate your own contribs.
I was thinking of directly proposing changes here anyway, so here goes:
With reference to the 32-bit question:
When I began this investigation, graphics hardware was pretty much limited to 32-bit, but that was starting to change. Now one can run doubles on cards like NVIDIA's (see Ampere and CUDA-X). I have not used these APIs myself so cannot say much; however, one scientist reported to me that he used all doubles on the GPU for a simulation and it was slower than single precision.
My emphasis on single precision (SP) now relates more to performance, and to the fact that floating origin can maintain high accuracy at the same time:
- fp32 can yield double the vector bandwidth of fp64, both between the CPU/compute unit and between memory and cache.
- fp32 data takes less space and therefore yields greater cache persistence, which in turn delivers better performance.
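To make the accuracy side of that claim concrete, here is a minimal sketch (not from the original discussion; the 10,000 km distance and 0.25 m step are illustrative values I chose) of why fp32 positions snap at large world coordinates, and why recentring coordinates on the viewer, the core of floating origin, recovers full resolution:

```python
import struct

def f32(x):
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack('f', struct.pack('f', x))[0]

# Far from the origin, fp32 spacing is coarse: at 10,000,000 m the
# representable step is 1.0 m, so a 0.25 m move is lost entirely.
far = f32(10_000_000.0)
moved = f32(far + 0.25)
print(moved - far)       # 0.0 -- the small move snapped away

# Floating origin: keep coordinates relative to the viewer, so the
# same 0.25 m move happens near zero, where fp32 spacing is ~6e-8 m.
rel = f32(0.0)
rel_moved = f32(rel + 0.25)
print(rel_moved - rel)   # 0.25 -- full resolution preserved
```

The same arithmetic done near zero is exact, which is why SP plus floating origin can match the accuracy of naive fp64 world coordinates.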
edit: Please change “The original article by Chris Thorne uses single-precision floats” to
“The original implementations by Chris Thorne used single-precision floats”
I originally planned and designed for a hybrid of double and single precision, but I want to completely exhaust all fp32 solutions before adding fp64 where it can enhance the lower-precision code. There is a very important motivation for this: throwing fp64 at jitter problems early produces good results but hides the underlying causes of the jitter. If I had taken that approach, I would not have gained an early understanding of distant relative jitter and loss of degrees of freedom (demonstrated by the interaction “sliding” effect; see the Dimensional collapse bookmark) and speed issues (see the Speed perception bookmark), and I would therefore not have developed general-purpose solutions to these by now.
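A small illustration of the distant-relative-jitter effect mentioned above (my own hypothetical numbers, not the original demos: two objects 0.6 m apart, both about 10,000 km from the world origin):

```python
import struct

def f32(x):
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack('f', struct.pack('f', x))[0]

# At ~10,000 km from the origin the fp32 grid spacing is 1.0 m, so
# each position snaps to the nearest metre and the objects' relative
# separation is misreported; as they move smoothly it jumps in 1 m
# steps, producing visible relative jitter.
a = f32(10_000_000.0)
b = f32(10_000_000.6)
print(b - a)           # 1.0, not 0.6

# Relative to a local (floating) origin the same pair is represented
# at full fp32 resolution.
a_rel = f32(0.0)
b_rel = f32(0.6)
print(b_rel - a_rel)   # ~0.6
```

Masking this with fp64 world coordinates would have produced smooth results immediately, but it would have hidden the mechanism, which is the point of the paragraph above.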
Feel free to include any of my text in the article yourself: this should be a collaborative effort, after all.