Forced shader unrolling


Hi there, I am trying to use a shader to output to an array using the index of a for loop. However, at compile time the length of the loop is undefined. This forces the shader to unroll the loop, which promptly crashes as it does not know the length.
The example given is deliberately contrived to show the issue. Even though the loop counter ensures data integrity for the output to outArray, it still unrolls.

Is there any other way I can achieve this kind of loop and store the results without it unrolling? I am actually trying to create a data-driven SDF tool, and the content of the loop is very long, so I definitely do not want it unrolled.

Commenting the indicated line in the loop will prevent the crash.

WebGPU might solve the issue, will try converting the shader and see how it works.

I am not familiar with shader coding and asked my buddy…

What is wrong with my code: … code …

The code you provided seems to have a few issues. Here are the problems I noticed:

  1. Shader Definitions: In the createScene function, you define the vertex and fragment shader names as "basic" when creating the BABYLON.ShaderMaterial. However, the shader code you provided is named "_vs" and "_fs". You should update the shader names to match the shader code names:


var shader = new BABYLON.ShaderMaterial("basicShader", scene, {
  vertex: "_vs",
  fragment: "_fs",
}, {
  // ...
});
  2. Array Initialization: In the createScene function, you set the array uniform in the shader using the setArray3 method. However, setArray3 expects a flat array as input, and you are providing a multidimensional array. You need to flatten the array before passing it to setArray3. Here's the corrected code:


shader.setArray3('array', [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0]);
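For illustration, the flattening step can be done like this; the triplet values here are placeholders, and the uniform name "array" is taken from the snippet above:

```javascript
// Each inner array is one vec3; setArray3 wants a single flat number array.
const triplets = [
  [1.0, 0.0, 0.0],
  [0.0, 1.0, 0.0],
  [0.0, 0.0, 1.0],
];
const flat = triplets.flat(); // [1, 0, 0, 0, 1, 0, 0, 0, 1]
// shader.setArray3("array", flat);
```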
  3. Engine Initialization: It seems that you haven't initialized the engine variable before creating the scene. Make sure you have the following code to create the engine:


var engine = new BABYLON.Engine(canvas, true);

With these fixes, the code should work correctly.

Interesting, I will apply these changes, but actually it works fine just by commenting out the code in the map function that sets the array data inside the loop.

In regards to the name "basic", it works because that is what is registered in the ShadersStore:
BABYLON.Effect.ShadersStore["basicVertexShader"] = _vs;
BABYLON.Effect.ShadersStore["basicFragmentShader"] = _fs;

Though if I were not using the ShadersStore, I would have to use the change above.

Regarding the engine, it has not been initialised by me because it is already initialised by the Babylon.js Playground. I agree that the code would not work without initialising the engine under normal circumstances.

The problem is essentially caused by having a loop with an unknown number of iterations that all access the same data variable: this forces the shader to unroll the loop, and it cannot, because when does it stop?

So even a simple loop like:

int total = 0;
for (int i = 0; i < max(0, someUniformInt); i++) {
	total += 1; // any write to data shared across iterations
}

provided total was then used further on in the shader, would cause it to crash because it is unable to unroll the loop. That is, it tries but gives up after 60+ iterations, as there is no end in sight. However, it should not unroll the loop at all, because the end value is undefined. If the commands inside the loop only touch local instances of data, i.e. are totally independent regardless of the loop iteration, then it will NOT unroll it. But because every iteration tries to SET the same variable, they cannot run simultaneously, so it tries to unroll. It only applies when you try to SET shared data.
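A minimal sketch of the two patterns described above (outArray and someUniformInt are stand-ins from this thread, and the behaviour is as reported here for WebGL/ANGLE, not a general guarantee):

```glsl
uniform int someUniformInt;
float outArray[100];

// Pattern that triggers the forced unroll attempt: every iteration
// writes into shared storage via a dynamic index, and the trip count
// comes from a uniform, so it is unknown at compile time.
void unrollForced() {
    for (int i = 0; i < max(0, someUniformInt); i++) {
        outArray[i] = 0.1; // write to shared data
    }
}

// By contrast, iterations that only touch their own locals
// reportedly compile without unrolling.
void unrollNotForced() {
    for (int i = 0; i < max(0, someUniformInt); i++) {
        float local = 0.1 * float(i); // purely local, independent per iteration
    }
}
```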

What frustrates me is that I was trying to use the iteration counter (i) to prevent exactly this kind of conflict. But as the array is shared across iterations and is being set, it forces the unroll.

In regards to the example, if you go to line 58 (outArray[i] = 0.1;) and comment it out, the shader works fine and a sphere pops up on the screen.

By the way, this is pretty much a nonsense shader :-). The stuff it is doing has no purpose other than to demonstrate that saving data to an array inside a loop forces the unroll. Interestingly, a similar issue does not seem to occur with compute shaders, at least not in the editor and tester I found. I think it was called CSOY but I can't find it again.

Forgot to say thank you for your reply.

The bug is the i=i++ part on line 57.

By doing so, you don't update i: it is essentially doing i=i, and the post-increment of i is lost. You could do i=++i or simply i++ (or ++i).

Well, that was very definitely a duh moment. You are totally correct, and sorting that out does prevent the loop from blowing up.

My actual code, though, is more along the lines of:

struct sShaderData {
	sFormatData formats[256];
	sShapeData shapes[80];
	sCombineData combines[200];
	sRE re[1024];
	sJointData joints[1024];
	int numFormats;
	int numJoints;
	int numShapes;
	int numCombines;
};

float da(float aPos) {int ap = int(floor(aPos/4.0)); int cp=int(floor(mod(aPos,4.0))); return data[ap][cp];}

sShaderData uncompress() {
	sShaderData shaderData;
	shaderData.numJoints = int(data[1].y);
	shaderData.numFormats = int(data[1].z); 
	shaderData.numShapes = int(data[1].w);
	float offset = 8.0;
	for (int h=min(0,shaderData.numFormats); h<shaderData.numFormats;h++) {
		// plus 8 means formats start at position 9 (index 8)  in data floatArray.
		sFormatData format;
		float forCode = da(offset); //fStyle*64.0+fType
		float col0 = da(offset+1.0); //color.rg
		float col1 = da(offset+2.0); //
		int forType=int(mod(forCode,32.0));
		int forStyle=int(floor(mod(forCode/32.0,32.0)));
		float texID=floor(forCode/2048.0);
		format.forType = forType;
		format.forStyle = forStyle;
		format.texID = texID;
		format.color.x = fract(floor(col0/2048.0)/256.0);
		format.color.y = fract(col0/256.0);
		format.color.z = fract(floor(col1/2048.0)/256.0);
		format.color.w = fract(col1/256.0);
		format.color = format.color * colAdjustment; // 255->1.0
		if (forType>0) { // this is surface displacement
			// get displacementData
			// is it a texture or a displacement style?
			switch (forType) {
				case 2: //worley (2) worType is
					format.forData.x =  da(offset); // clamp
					format.forData.y =  da(offset+1.0); // bumpiness
					format.forData.z =  0.0; // ?
					format.forData.w = 0.0; // ?
					offset+=2.0; break;
				case 3: //texture (3)
					format.forData.x =  da(offset); // u
					format.forData.y =  da(offset+1.0); // v
					format.forData.z =  da(offset+2.0); // bumpiness
					format.forData.w = 0.0;  // ?
					offset+=3.0; break;
				case 4: // other function
				// etc..
			}
		}
		//shaderData.formats[h] = format;
	}
	return shaderData;
}

and turning the comment

//shaderData.formats[h] = format;

on and off is the difference between the loop compiling or failing with "unable to unroll".

You can try to rewrite the loop to something like this:

for (int h=min(0,shaderData.numFormats); h<100; h++) {
  if (h >= shaderData.numFormats) break;
  // ...
}

Change the 100 value to something that would be the max allowed value for shaderData.numFormats.
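Spelled out as a complete loop, the suggestion looks like this; MAX_FORMATS is an illustrative compile-time cap, not a name from the original code:

```glsl
// The loop bound is now a compile-time constant, so the compiler can
// size the loop without knowing shaderData.numFormats; the break
// provides the early exit at the real count.
#define MAX_FORMATS 100

for (int h = 0; h < MAX_FORMATS; h++) {
    if (h >= shaderData.numFormats) break;
    // ... per-format work ...
}
```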

Trouble is, this is a minor part of the shader; the rest is commented out. The above would get it to unroll 100 times with an early cutout, but the full shader is massive, as it includes all the SDF constructors, deformation possibilities, SDF texture calculations etc. Unrolling it 100 times would take hours to compile.
Currently the program actually writes the proper shader with the objects written into the SDF map, with only their parameters coming from the data stream. This is fine when the shape is small, but as it gets larger, the time taken to add another shape to a model goes from seconds to minutes to tens of minutes, which is unacceptable.
Hence trying to make a data-driven shader. But unrolling it is unfeasible, as the code inside the loop is very large for even a single iteration, let alone 100. Hence I need to try and force it NOT to unroll, but I am having a very hard time achieving this. I know that not unrolling would have a performance hit, but it should be quite minimal compared to waiting 20+ minutes for your finalised shape to appear, and every time you add another shape you have to wait again. Not fun.

The full map function (without all the function definitions) looks like:

param mapColor(vec3 p) {
	sShaderData shaderData;
	// Initialise re[0]. STILL NEED param as colours blend!! So format index as a result is wrong!
	[0].r = param(vec4(1.0,0.0,0.0,1.0), 0.0, 1e5); // color, groupID, dist
	[0].e = vec3(0.0, 0.0, 0.0); // E# value for shape

	// calculate all joints, in relation to p.
	for (int i=0; i<shaderData.numJoints; i++) {
		int i2 = 2*i;
		shaderData.joints[i].position = opRotate(p - transform[i2].xyz, transform[i2+1]); // J# in prev code
		// transform[i2].w, i.e. position.w, holds the joint Worley factor.
		if (transform[i2].w != 0.0) {
			// calculate joint worley in relation to p if necessary
			shaderData.joints[i].worley = worley(p, transform[i2].w);
		}
	}

	vec4 size = vec4(0.0,0.0,0.0,0.0); // w is perimeter
	// process shapes
	for (int j=0; j<shaderData.numShapes; j++) {
		// what do we get?
		int sJ = j; // this is just in case the shader gets far enough ahead in parallel that it is doing another loop
		sShapeData s = shaderData.shapes[sJ];
		[sJ].e = shaderData.joints[s.jointID].position;

		// evaluate positioning, with parent joint.
		// if shape position away from joint
		if ((s.flags&1)==0)[sJ].e =[sJ].e - s.position;
		// if shape rotation (check flags)
		if ((s.flags&2)==0)[sJ].e = opRotate([sJ].e, s.rotation);
		// store position before distortions
		// apply shape distortions if applicable
		if (s.distType>0) {
			vec3 v = s.distParams;
			switch (s.distType) {
				case 1:  //opPushX
					[sJ].e = opPushX([sJ].e, v.x); break;
				case 2:  //opPushY
					[sJ].e = opPushY([sJ].e, v.x); break;
				case 3:  //opPushZ
					[sJ].e = opPushZ([sJ].e, v.x); break;
				case 4:  //opElongateX = vec3(v.x,0.0,0.0);
					[sJ].e = opElongateX([sJ].e, v.x); break;
				case 5:  //opElongateY = vec3(0.0,v.x,0.0);
					[sJ].e = opElongateY([sJ].e, v.x); break;
				case 6:  //opElongateZ = vec3(0.0,0.0,v.x);
					[sJ].e = opElongateZ([sJ].e, v.x); break;
				case 7:  //opElongateXY = vec3(v.xy,0.0);
					[sJ].e = opElongateXY([sJ].e, v.xy); break;
				case 8:  //opElongateXZ = vec3(v.x,0.0,v.y);
					[sJ].e = opElongateXZ([sJ].e, v.xy); break;
				case 9:  //opElongateYZ = vec3(0.0,v.xy);
					[sJ].e = opElongateYZ([sJ].e, v.xy); break;
				case 10: //opElongateXYZ = v;
					[sJ].e = opElongate([sJ].e, v); break;
				case 11: //opTwistX
					[sJ].e = opTwistX([sJ].e, v.x); break;
				case 12: //opTwistY
					[sJ].e = opTwistY([sJ].e, v.x); break;
				case 13: //opTwistZ
					[sJ].e = opTwistZ([sJ].e, v.x); break;
				case 14: //opRound?
			}
		}

		// evaluate shape
		vec4 w = s.sdfParams; // assignment causes a copy to be made; is accessing the copy quicker than via the full path object?
		float r;
		switch (s.sdfType) {
			case 0:	// roundedConeAtoB for skeleton. Capsule AtoB not included as only in tool shaders!
				r = sdConeAtoB([sJ].e, shaderData.joints[int(w.x)].position, shaderData.joints[int(w.y)].position, w.x, w.y); break;
			case 1:	r = sdSphere([sJ].e, w.x); break;
			case 2:	r = sdBox([sJ].e,; break;
			case 3:	r = sdEllipsoid([sJ].e,; break;
			case 4:	r = sdCone([sJ].e, w.x, w.y, w.z, w.w); break;
			case 5:	r = sdPlane([sJ].e,; break;
			case 6:	r = sdCapsule([sJ].e, w.x, w.y, w.z); break;
			case 7:	r = sdPyramid([sJ].e, w.x, w.y); break;
			case 8:	r = sdSteele([sJ].e, w.x, w.y); break;
			case 9:	r = sdRhombus([sJ].e, w.x, w.y, w.z, w.w); break;
			case 10: r = sdBoundingBox([sJ].e,, w.w); break;
			case 11: r = sdTorus([sJ].e, w.xy); break;
			case 12: r = sdLink([sJ].e, w.x, w.y, w.z); break;
			case 13: r = sdHexPrism([sJ].e, w.xy); break;
			case 14: r = sdCylinder([sJ].e, w.x, w.y); break;
			case 15: r = sdCappedTorus([sJ].e, w.xy, w.z, w.w); break;
			case 16: r = sdConeCapped([sJ].e, w.x, w.y, w.z); break;
			case 17: r = sdSolidAngle([sJ].e, w.x, w.y); break;
			case 18: r = sdTriPrism([sJ].e, w.xy); break;
			case 19: r = sdRoundedCylinder([sJ].e, w.x, w.y, w.z); break;
			case 20: r = sdRoundBox([sJ].e,, w.w); break;
			case 21: r = sdRoundCone([sJ].e, w.x, w.y, w.z, w.w); break;
			case 22: { vec3 p0 = shaderData.joints[int(w.x)].position; vec3 p1 = shaderData.joints[int(w.y)].position;
					vec3 p2 = shaderData.joints[int(w.z)].position; r = sdTriangle(p, p0, p1, p2); } break;
			case 23: r = sdDodecahedron([sJ].e, w.x); break;
			case 24: r = sdIsohedron([sJ].e, w.x); break;
		}
		// store param data for shape
		[sJ].r.d = r;
		[sJ].r.groupID = 0.0;
		[sJ].r.color = shaderData.formats[s.formatID].color;

		// apply surface displacement if applicable (inc textures)
		if (shaderData.formats[s.formatID].forType==2) {
			[sJ].r.d -= sdDisplacementWorley(shaderData.joints[s.jointID].worley,
					shaderData.formats[s.formatID].forData.x, shaderData.formats[s.formatID].forData.y);
		} else {
			if (shaderData.formats[s.formatID].forType==3) {
				// texturing: use vec3 size (contains elongation), p as orgPos.
				// only calculate perimeter and add shape dimensions if needed.
				if (shaderData.formats[s.formatID].texID > 0.0) {
					if ([sJ].r.d < 0.25) { // check against a minimum distance? Add this to data passed in?
						switch (shaderData.formats[s.formatID].forStyle) {
							case 0:
								vec3 n = abs(normalize(orgP));
								float bA = tex3D(orgP, shaderData.formats[s.formatID].forData.xy, n, shaderData.formats[s.formatID].texID).x;
								[sJ].r.d -= (bA-1.0)*shaderData.formats[s.formatID].forData.z; break;
						}
					}
				}
			}
		}
	}

	// process combines
	for (int k=0; k<shaderData.numShapes; k++) {
		sCombineData c = shaderData.combines[k];
		float p0 = c.combParams.x;
		float p1 = c.combParams.y;
		switch (c.combType) {
			case 0:[c.rOut].r = opUnion([c.rOut].r,[c.rIn].r); break;
			case 1:[c.rOut].r = opIntersect([c.rOut].r,[c.rIn].r); break;
			case 2:[c.rOut].r = opSubtract([c.rOut].r,[c.rIn].r); break;
			case 3:[c.rOut].r = opSmoothUnion(p0,[c.rOut].r,[c.rIn].r); break;
			case 4:[c.rOut].r = opSmoothIntersect(p0,[c.rOut].r,[c.rIn].r); break;
			case 5:[c.rOut].r = opSmoothSubtract(p0,[c.rOut].r,[c.rIn].r); break;
			case 6:[c.rOut].r = opUnionChamfer(p0,[c.rOut].r,[c.rIn].r); break;
			case 7:[c.rOut].r = opIntersectChamfer(p0,[c.rOut].r,[c.rIn].r); break;
			case 8:[c.rOut].r = opSubtractChamfer(p0,[c.rOut].r,[c.rIn].r); break;
			case 9:[c.rOut].r = opUnionColumns(vec2(p0,p1),[c.rOut].r,[c.rIn].r); break;
			case 10:[c.rOut].r = opIntersectColumns(vec2(p0,p1),[c.rOut].r,[c.rIn].r); break;
			case 11:[c.rOut].r = opSubtractColumns(vec2(p0,p1),[c.rOut].r,[c.rIn].r); break;
			case 12:[c.rOut].r = opUnionStairs(vec2(p0,p1),[c.rOut].r,[c.rIn].r); break;
			case 13:[c.rOut].r = opIntersectStairs(vec2(p0,p1),[c.rOut].r,[c.rIn].r); break;
			case 14:[c.rOut].r = opSubtractStairs(vec2(p0,p1),[c.rOut].r,[c.rIn].r); break;
			case 20:[c.rOut].r = copUnion(p,[c.rOut].r,[c.rIn].r); break;
			case 21:[c.rOut].r = copIntersect(p,[c.rOut].r,[c.rIn].r); break;
			case 22:[c.rOut].r = copSubtract(p,[c.rOut].r,[c.rIn].r); break;
			case 23:[c.rOut].r = copSmoothUnion(p, p0,[c.rOut].r,[c.rIn].r); break;
			case 24:[c.rOut].r = copSmoothIntersect(p, p0,[c.rOut].r,[c.rIn].r); break;
			case 25:[c.rOut].r = copSmoothSubtract(p, p0,[c.rOut].r,[c.rIn].r); break;
			case 26:[c.rOut].r = copUnionChamfer(p, p0,[c.rOut].r,[c.rIn].r); break;
			case 27:[c.rOut].r = copIntersectChamfer(p, p0,[c.rOut].r,[c.rIn].r); break;
			case 28:[c.rOut].r = copSubtractChamfer(p, p0,[c.rOut].r,[c.rIn].r); break;
			case 29:[c.rOut].r = copUnionColumns(p, vec2(p0,p1),[c.rOut].r,[c.rIn].r); break;
			case 30:[c.rOut].r = copIntersectColumns(p, vec2(p0,p1),[c.rOut].r,[c.rIn].r); break;
			case 31:[c.rOut].r = copSubtractColumns(p, vec2(p0,p1),[c.rOut].r,[c.rIn].r); break;
			case 32:[c.rOut].r = copUnionStairs(p, vec2(p0,p1),[c.rOut].r,[c.rIn].r); break;
			case 33:[c.rOut].r = copIntersectStairs(p, vec2(p0,p1),[c.rOut].r,[c.rIn].r); break;
			case 34:[c.rOut].r = copSubtractStairs(p, vec2(p0,p1),[c.rOut].r,[c.rIn].r); break;
		}
	}
	// etc..
}


And unfortunately many of the SDF functions are not small, especially when texturing is included; hence trying to prevent the unrolling. It is frustrating because I know graphics cards these days can do this: they have to, given the huge data expansion with buffers and the like. But when I am told unrolling is forced, it does not tell me why, so it is very difficult to find the source that triggers it.

While I know that not unrolling would give a performance hit in a finalised model, for an editor where the model is changing all the time this is not a problem; rather, the delay in writing the updated shader as the model is edited becomes the huge issue.

For some of my more complex shapes I could have a bath in the time it takes to compile the code every time I add a new shape, or change something that affects the hard-coded stuff rather than just the data stream.
This is why I am trying to change it so that everything is controlled by the data stream. The initial compile (with no unrolling) would take a little time, maybe a minute or two, but after that there would be no need for compiling at all. Thus editing and modifying the shape on the fly would be feasible.

Some of the problem is that I have not yet changed over to using a UBO. So the data is held in a compressed data stream, and the initial phase is about uncompressing it.

I will try removing that entire section and have uncompressed data. I will let you know how it goes; if that works, I will then move over to having the data in a UBO, as it is far more flexible in regards to size, and each SDF and its additional data take a fair bit of space.
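For reference, a WebGL2 / GLSL ES 3.00 uniform block might look something like this; the block name, member names, array sizes, and layout here are purely illustrative assumptions, not the project's actual data model:

```glsl
#version 300 es
precision highp float;

// Hypothetical uniform buffer holding the uncompressed SDF data.
// Under std140 rules every array element is padded to 16 bytes, so
// vec4 arrays are the safest element type; sizes should stay within
// the implementation's MAX_UNIFORM_BLOCK_SIZE.
layout(std140) uniform SdfData {
    vec4 transforms[512];  // e.g. joint position/rotation pairs
    vec4 shapeParams[256]; // e.g. per-shape SDF parameters
    int numJoints;
    int numShapes;
};

out vec4 fragColor;
void main() { fragColor = shapeParams[0]; }
```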

For the most part the project works very well, I would love to post it up when I can get this delay thing sorted.

The problem is that I don't think GLSL (in WebGL) provides any way to force the compiler to unroll or not unroll shader code… It is automatic, and you have no control over it.

cc @sebavan in case he would know a way.

Nope, it is all internal in ANGLE, with poor control over it :frowning:

I would suggest the babylon / three way of inlining with custom code.

Thanks for the advice. I have been doing some interesting experiments, with considerable frustration until I tracked down a shader initialisation issue. It seems the trouble with a data-driven shader is that the shader does not change as more data is added.
However, I had not padded out (set) all the possible array values when I compiled the shader. I assumed, erroneously, that because my shader said the array was 100 elements long, and I had a uniform to ensure that I did not exceed the amount of data I had actually put into the array, I could expand the array as more shapes were added.
Unfortunately no.
It uses the data you have set to decide the array size, and setting data further into the array after the compile does not work. So much frustration caused by this :stuck_out_tongue:
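A sketch of the fix in plain JavaScript (the element count and fill value are illustrative): pad the uniform array to its full declared length before the first compile, so later writes stay within the size the compiler saw.

```javascript
// Pad the data out to the full declared shader array length, so the
// compiled shader sizes the uniform for the maximum, not for however
// much data happened to be present at first compile.
const MAX_ELEMENTS = 100; // must match the length declared in the shader
const COMPONENTS = 3;     // one vec3 per element

function padForShader(values) {
  const out = values.slice(0, MAX_ELEMENTS * COMPONENTS);
  while (out.length < MAX_ELEMENTS * COMPONENTS) out.push(0.0);
  return out;
}

const padded = padForShader([1.0, 2.0, 3.0]);
// padded.length === 300; then e.g. shader.setArray3("array", padded);
```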