Creating and animating a skeleton from joint locations

Hi All,

I have a human figure mesh that I need to animate with a stream of data.

The issue is that the data I am receiving is 16 or so Vector3s for each frame, each corresponding to the position in space of a “joint” or point on the mesh - not a transformation matrix defining the location, rotation and scale of a bone, which is what Babylon Animations expect.

So, there is a point on the nose, and it is connected to points on each of the ears. I guess the positions of all three points combined will determine the rotation and position (and scale…) of the head - but I don’t have the information to transform a bone (or rather set its transformation). Similarly, the next point connects to both hips separately, rather than to a central spine.

So I’m looking for advice on how to handle this - I need to create the mesh and skeleton to receive this data and react to it, i.e. rig it, weight paint the mesh and so on.

Some more specific questions I have:

  • Can I transform a bone given its start and end point in Babylon, and generate the whole matrix?
  • Do I need to do this, or can I just treat the points individually, pulling the mesh about in space according to the weight paint?
  • If these conversions into matrices are needed, would they be very heavy performance-wise if run on every frame?
  • Can I even animate this way, streaming movement data to a skeleton, i.e. without using Babylon’s Animations?

Many thanks

cc @PatrickRyan

@Chrisor9, without spending some significant time digging into the best way to accomplish this, I can offer some thoughts that may help you take the next step.

  • I assume you are speaking about a head tracking solution and you are able to track feature points with nose and ears and this is how you are getting positional information about the points. I’m not sure if you are using an assumption for the delta in distance from the camera to the nose and the camera to each ear, or if there is some other way (depth camera for instance) that you are tracking world positions.
  • If you have those three points in world space, I would turn them into a triangle from which you can derive a normal vector. Based on the feature points you are tracking, there may be a slight difference in height of each point, but let’s assume the triangle is relatively parallel with the ground plane when the head is in rest position.
  • As your data streams in, you will need to update the vertex positions of your triangle and recalculate the normal vector, but since you only have three vertices, it probably isn’t a large overhead.
  • You can then take the delta between the normal vector and the world Y axis and pass that rotation to your neck bone for the X and Z world axes (your bone orientation will determine which axes are updated with these rotations).
  • For the Y axis rotation you will likely need to use the X position of the nose feature point, assuming the world axis is in the middle of the triangle, and create a vector from the nose point to the world axis. This should give you the delta for the Y world axis rotation to drive your bone with.
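
As a rough sketch of the steps above, here is the triangle-normal idea in plain TypeScript with no Babylon.js calls. The names `estimateHeadPose` and `HeadPose` are made up for illustration, and I am assuming a Y-up world where the nose sits at +Z and the left ear at +X in rest position. If your data uses a different handedness the normal may point down, in which case swap the two ears. For the yaw I use the vector from the midpoint between the ears to the nose, which is a small variation on the last bullet.

```typescript
type Vec3 = { x: number; y: number; z: number };

const sub = (a: Vec3, b: Vec3): Vec3 => ({ x: a.x - b.x, y: a.y - b.y, z: a.z - b.z });
const dot = (a: Vec3, b: Vec3): number => a.x * b.x + a.y * b.y + a.z * b.z;
const cross = (a: Vec3, b: Vec3): Vec3 => ({
  x: a.y * b.z - a.z * b.y,
  y: a.z * b.x - a.x * b.z,
  z: a.x * b.y - a.y * b.x,
});
const normalize = (v: Vec3): Vec3 => {
  const len = Math.sqrt(dot(v, v));
  return len > 1e-9 ? { x: v.x / len, y: v.y / len, z: v.z / len } : { x: 0, y: 0, z: 0 };
};

// Hypothetical result type: tilt (X/Z lean) as axis + angle, yaw as an angle about world Y.
interface HeadPose {
  tiltAxis: Vec3;    // axis to rotate about to lean from world up to the head normal
  tiltAngle: number; // radians between world up and the triangle normal
  yaw: number;       // radians about world Y, 0 when the nose points along +Z
}

function estimateHeadPose(nose: Vec3, leftEar: Vec3, rightEar: Vec3): HeadPose {
  const up: Vec3 = { x: 0, y: 1, z: 0 };

  // Normal of the nose/ear triangle. Winding order is an assumption (see text).
  const normal = normalize(cross(sub(leftEar, nose), sub(rightEar, nose)));

  // Angle between the normal and world Y, clamped to keep acos in range.
  const tiltAngle = Math.acos(Math.max(-1, Math.min(1, dot(up, normal))));
  const tiltAxis = normalize(cross(up, normal));

  // Facing direction: from the midpoint between the ears to the nose, projected on XZ.
  const mid: Vec3 = {
    x: (leftEar.x + rightEar.x) / 2,
    y: (leftEar.y + rightEar.y) / 2,
    z: (leftEar.z + rightEar.z) / 2,
  };
  const forward = sub(nose, mid);
  const yaw = Math.atan2(forward.x, forward.z);

  return { tiltAxis, tiltAngle, yaw };
}
```

How these three values map onto your neck bone's local axes depends on how the bone was oriented when it was rigged, so expect to swap or negate axes once you try it on a real skeleton.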

There are several assumptions made here without seeing the data being fed to the scene. You will likely need to quantize some of the data, depending on how fast the scene gets data, to reduce noise in the motion. You may also need to interpolate the pose data if updates arrive less often than once per rendered frame, so that a spike in the position of your feature points doesn’t snap the head around quickly.
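
One simple way to damp that kind of jitter is an exponential moving average over the incoming joint positions. To be clear, this is a smoothing filter rather than quantization, and it is only a sketch: `smoothJoints` and the `alpha` value are names and numbers I picked for illustration, not anything from Babylon.js.

```typescript
type Vec3 = { x: number; y: number; z: number };

// Blend each incoming joint position with the previous smoothed one.
// alpha = 1 means no smoothing, smaller values follow the data more slowly.
function smoothJoints(previous: Vec3[] | null, incoming: Vec3[], alpha: number): Vec3[] {
  // First frame (or a change in joint count): nothing to blend with yet.
  if (previous === null || previous.length !== incoming.length) {
    return incoming.map((p) => ({ x: p.x, y: p.y, z: p.z }));
  }
  return incoming.map((p, i) => {
    const q = previous[i];
    return {
      x: q.x + (p.x - q.x) * alpha,
      y: q.y + (p.y - q.y) * alpha,
      z: q.z + (p.z - q.z) * alpha,
    };
  });
}

// Usage: keep the last smoothed pose around and feed each new frame through it.
let smoothedPose: Vec3[] | null = null;
function onPoseFrame(rawJoints: Vec3[]): Vec3[] {
  smoothedPose = smoothJoints(smoothedPose, rawJoints, 0.3);
  return smoothedPose;
}
```

A lower `alpha` gives steadier motion at the cost of more lag, so the right value depends on how noisy and how frequent your updates are.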

Since you already have a skeleton with weights painted, you shouldn’t need to worry about the individual vertices, and you should just be able to drive rotation on the appropriate bone in the skeleton. This way, you don’t need to touch the matrices. You leave the skeleton to do its thing and, rather than driving the individual bones with rotational data from a file, you set that rotation each frame.

This was the first solution that came to mind, but maybe @bghgary has other thoughts on this as well.


Thanks a lot, I’m just processing your answer but first wanted to clarify a few things that may make my situation different from how it’s been understood:

  • It’s not just head tracking, it’s for an entire body, so there are 16 points of data to be updated (2 x ankle, wrist, knee, hip, elbow, shoulder, ear, plus the neck and nose), as well as an overall world position for the entire figure
  • I’m not in control of the data I’m receiving, so I don’t know about camera position or how exactly it is being tracked - I presume from quite far away, as it’s in fact multiple people at once

So yes, there are considerably more than 3 vertices to be updated every frame if I’m using this method, so I guess the overhead will be large.

Actually I don’t have this skeleton or weight painted mesh - I have a generic mesh which I will need to rig and weight paint, so I’ll be doing this in Blender.

The data is an array of 16 Vec3s, so it looks something like this:

  //Right Shoulder
  //Left Shoulder
  // .... and so on

So my question now - if I were to simply change the position of a bone that was connected to another one, would that update the rotation and scale automatically? For example with the hip, if I were to change the position of the left hip bone to be in front of the right, would the fact that the right hip is staying in position essentially rotate the ‘bone’ drawn between them?

(I think I am struggling with the overall conception of bones as being points in space, or things that have length and dimensions, or transform nodes that are just matrices, so my mental model is getting a bit cluttered…)

@Chrisor9, thank you for the extra context for the system you are working with. I assumed it was more a head and shoulders tracking solution since you were mentioning nose and ear feature points. With a whole skeleton, it does become a different problem.

In looking at your example for positional data, one thing stands out to me and that is the fact that all of the world positions are at the same z depth in the scene. Maybe this is not the actual data you are getting and just an example so this next assumption may not be accurate. But it appears the nose position and the shoulder position are coplanar to one another which will make solving for a skeleton a bit difficult without any further extrapolation of the data. If we were to look at a sample skinned avatar from the side:

You can see that the skeleton joints are not coplanar to one another. You may find the skeleton with joints coplanar to one another while in T-pose, but that isn’t a natural position for a body. And tracking the nose feature point as coplanar to the shoulders won’t help solve the pose.

Basically, each frame you need to determine the angle between all joints, starting from the root and moving outward, and then assign those rotations to each bone. You can’t just use the positional data each frame because a skeleton relies on rotation of each bone rather than translation. You would use translation or scale on individual bones to do cartoonish animation like squash and stretch, but anything realistic relies only on rotation of the joints in the body. And if you need to drive scale, you would scale at the root joint only, which will scale the whole skeleton in proportion.
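
To make "turn two positions into a rotation" a bit more concrete, here is a minimal sketch of the usual shortest-arc quaternion between a bone's rest direction and its tracked direction. It is plain TypeScript with no Babylon.js calls, and `rotationBetween` and `rotateVector` are names I made up. One caveat: two points only give you the direction of the bone, so any twist around the bone's own axis is not recoverable from this alone.

```typescript
type Vec3 = { x: number; y: number; z: number };
type Quat = { x: number; y: number; z: number; w: number };

const dot = (a: Vec3, b: Vec3): number => a.x * b.x + a.y * b.y + a.z * b.z;
const cross = (a: Vec3, b: Vec3): Vec3 => ({
  x: a.y * b.z - a.z * b.y,
  y: a.z * b.x - a.x * b.z,
  z: a.x * b.y - a.y * b.x,
});
const normalize = (v: Vec3): Vec3 => {
  const len = Math.sqrt(dot(v, v));
  return len > 1e-9 ? { x: v.x / len, y: v.y / len, z: v.z / len } : { x: 0, y: 0, z: 0 };
};

// Shortest-arc unit quaternion that rotates direction `from` onto direction `to`.
function rotationBetween(from: Vec3, to: Vec3): Quat {
  const f = normalize(from);
  const t = normalize(to);
  const d = dot(f, t);
  if (d < -0.999999) {
    // Opposite directions: any axis perpendicular to `f` gives a valid 180 degree turn.
    const helper: Vec3 = Math.abs(f.x) < 0.9 ? { x: 1, y: 0, z: 0 } : { x: 0, y: 1, z: 0 };
    const axis = normalize(cross(helper, f));
    return { x: axis.x, y: axis.y, z: axis.z, w: 0 };
  }
  const c = cross(f, t);
  const w = 1 + d;
  const len = Math.sqrt(c.x * c.x + c.y * c.y + c.z * c.z + w * w);
  return { x: c.x / len, y: c.y / len, z: c.z / len, w: w / len };
}

// Rotate a vector by a unit quaternion (used here only to check the result).
function rotateVector(q: Quat, v: Vec3): Vec3 {
  const u: Vec3 = { x: q.x, y: q.y, z: q.z };
  const uv = cross(u, v);
  const uuv = cross(u, uv);
  return {
    x: v.x + 2 * (q.w * uv.x + uuv.x),
    y: v.y + 2 * (q.w * uv.y + uuv.y),
    z: v.z + 2 * (q.w * uv.z + uuv.z),
  };
}

// Example: a thigh that hangs straight down at rest and now points forward.
const restThigh: Vec3 = { x: 0, y: -1, z: 0 };
const hip: Vec3 = { x: 0, y: 1, z: 0 };
const knee: Vec3 = { x: 0, y: 1, z: 0.5 };
const trackedThigh: Vec3 = { x: knee.x - hip.x, y: knee.y - hip.y, z: knee.z - hip.z };
const thighRotation = rotationBetween(restThigh, trackedThigh);
```

The resulting quaternion is a world-space delta from the rest pose. Before it can be applied to a bone in a hierarchy it still has to be expressed relative to the parent bone.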

Here’s a visual example of the difference using that same sample avatar. Knee Joint:

Translating the knee joint does “rotate” the parent relationship between the hip and knee, but the mesh deformation is wrong:

Instead, you would need to figure out the angle between the hip and knee positions on this frame and apply the delta angle rotation to the hip:

The important part here is that the orientation of the bone drives the mesh skinned to it realistically. Only having positional data requires a solve to determine the actual rotation of the bones in the skeleton. You will need to determine this from the root for each bone on each frame.
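
Here is one way the "from the root, for each bone, each frame" part could be sketched in plain TypeScript. The `BoneDef` layout, the joint indices and `solveLocalRotations` are all hypothetical, and I am making a big simplifying assumption: every bone's rest orientation is aligned with the world axes, so the world-space delta rotation can be converted to a local rotation just by removing the parent's delta. A real rig exported from Blender will have its own rest orientations that need to be factored in.

```typescript
type Vec3 = { x: number; y: number; z: number };
type Quat = { x: number; y: number; z: number; w: number };

const sub = (a: Vec3, b: Vec3): Vec3 => ({ x: a.x - b.x, y: a.y - b.y, z: a.z - b.z });
const dot = (a: Vec3, b: Vec3): number => a.x * b.x + a.y * b.y + a.z * b.z;
const cross = (a: Vec3, b: Vec3): Vec3 => ({
  x: a.y * b.z - a.z * b.y,
  y: a.z * b.x - a.x * b.z,
  z: a.x * b.y - a.y * b.x,
});
const normalize = (v: Vec3): Vec3 => {
  const len = Math.sqrt(dot(v, v));
  return len > 1e-9 ? { x: v.x / len, y: v.y / len, z: v.z / len } : { x: 0, y: 0, z: 0 };
};

// Shortest-arc rotation taking direction `from` onto direction `to`.
function rotationBetween(from: Vec3, to: Vec3): Quat {
  const f = normalize(from);
  const t = normalize(to);
  const d = dot(f, t);
  if (d < -0.999999) {
    // Opposite directions: pick any perpendicular axis for the 180 degree turn.
    const helper: Vec3 = Math.abs(f.x) < 0.9 ? { x: 1, y: 0, z: 0 } : { x: 0, y: 1, z: 0 };
    const axis = normalize(cross(helper, f));
    return { x: axis.x, y: axis.y, z: axis.z, w: 0 };
  }
  const c = cross(f, t);
  const w = 1 + d;
  const len = Math.sqrt(c.x * c.x + c.y * c.y + c.z * c.z + w * w);
  return { x: c.x / len, y: c.y / len, z: c.z / len, w: w / len };
}

const conjugate = (q: Quat): Quat => ({ x: -q.x, y: -q.y, z: -q.z, w: q.w });

// Hamilton product a * b (apply b first, then a).
const multiply = (a: Quat, b: Quat): Quat => ({
  x: a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
  y: a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
  z: a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w,
  w: a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z,
});

// Hypothetical bone description: which two tracked joints span the bone,
// which bone is its parent (-1 for a root), and where it points at rest.
interface BoneDef {
  name: string;
  parent: number;
  headJoint: number;
  tailJoint: number;
  restDirection: Vec3;
}

// `bones` must be ordered parent-first so a parent's result exists before its children.
function solveLocalRotations(bones: BoneDef[], joints: Vec3[]): Quat[] {
  const world: Quat[] = [];
  const local: Quat[] = [];
  bones.forEach((bone, i) => {
    const direction = sub(joints[bone.tailJoint], joints[bone.headJoint]);
    world[i] = rotationBetween(bone.restDirection, direction);
    // Assumes axis-aligned rest orientations (see text): local = parentWorld^-1 * world.
    local[i] = bone.parent < 0 ? world[i] : multiply(conjugate(world[bone.parent]), world[i]);
  });
  return local;
}

// Example: an arm that points along +X at rest, with joints 0 = shoulder, 1 = elbow, 2 = wrist.
const armBones: BoneDef[] = [
  { name: "upperArm", parent: -1, headJoint: 0, tailJoint: 1, restDirection: { x: 1, y: 0, z: 0 } },
  { name: "forearm", parent: 0, headJoint: 1, tailJoint: 2, restDirection: { x: 1, y: 0, z: 0 } },
];
```

With 16 joints this is a handful of cross products and quaternion multiplies per frame, which is small next to the skinning work the engine is already doing. In Babylon.js you would then copy each result into a `BABYLON.Quaternion` and assign it to the matching bone or its linked transform node, which is not shown here.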

I will say that we worked with a team that was using MediaPipe to drive real-time body tracking from a camera and render the result in Babylon.js. This is similar to what you describe, and reading through their process may give you some ideas of what your solution may need. I hope this helps, but feel free to keep asking questions as they arise.
