Migrated from http://blogs.msdn.com/b/rezanour
We saw the power and usefulness of transforms in the previous primer posts, and we’ll be using them quite a bit in our discussions of physics and graphics. One of the first tasks we’ll need to do is determine how to represent our transformations in code. We’re going to be creating, manipulating, and animating these transformations over time, so they must be easy and efficient to work with. Particularly, we’d like to be able to manipulate the individual components (translation, rotation, and scale) independently. One natural, but naïve, choice is to just use the transformation matrix directly, since that’s the form we’ve seen so far. However, using a matrix directly makes it nontrivial to manipulate and animate. It’s also difficult to retrieve and change independent components of the transform. It’s obvious that we want something better, so let’s take a look at some options.
The next natural choice is to represent translation, rotation, and scale separately. This makes it trivial to manipulate the values independently. However, since most graphics packages require the final transformation as a matrix, we’ll need to combine the elements later. This isn’t very difficult to do, and we’ve already seen how we can do that in previous posts. Translation and scale are trivially represented as 3 element vectors, which maps well to how we use them, including combining them with matrices. But what about rotation? How can we represent that?
One way to represent the rotation is to just leave it as a pure orthonormal rotation matrix. It is easy to combine this form with position and scale, since the rotation is already in matrix form. We can add more rotation to this matrix by concatenating other rotation matrices using matrix multiplication. This is somewhat expensive, but conceptually simple.
However, as more and more rotations are applied, our matrix could start to stray from being orthonormal (due to rounding errors, etc…). If we stray far enough away from being orthonormal, we can no longer be considered a rotation, and applying this matrix to an object may have unexpected side effects like shearing or deformation. Therefore, we need to regularly check and ensure that our matrix maintains the orthonormal property, which is another somewhat expensive task to do regularly.
Finally, we still have the problem of requiring 9 floats (16 if you only support 4×4 matrices) to store the data. Despite all this, rotation matrices are still a common form of storing rotations.
Another common way to represent rotation is by using 3 angles (called the Euler angles) which represent a rotation around the x, y, and z axes. This representation only requires 3 floating point numbers to store, and allows us to easily adjust rotations around each axis independently. When the axes of rotation are the local coordinate axes of an object, the rotations are sometimes called yaw (rotation about y axis), pitch (rotation about x-axis), and roll (rotation about z axis).
While this scheme sounds simple to implement, it has some serious draw backs. First of all, in which order do you apply the rotations? Do you first rotate about x, then about the y, then about the z axis? Or perhaps a different order? You’ll notice that the result is different for each combination. There is also a condition called gimbal lock which can occur when two axis are made collinear as you rotate. For instance, if we first rotate about the y axis, and this brings our z axis in line with where our x axis would have been, then the z rotation would look more like an x rotation, and would likely not give us the result we’re looking for.
Even with these issues, Euler angles are a common way of representing rotations. They are certainly intuitive, which is likely why they are one of the most common representations in 3D modeling tools, game and level editors, CAD programs, and many other software applications where reading and typing in rotation values is necessary. But for our internal representation in code, we can probably do better.
For any set of rotations, the final orientation can be represented as some final axis of rotation and an angle. This is called axis-angle representation. For example, imagine that your head is at the origin, and that you are looking forward along some axis. If you rotate your head about the y axis by 90 degrees to the left, then rotate about the z axis by 90 degrees so that you are facing upward, you end up looking up and to the left, with your chin held high to your left. This final orientation could have also been achieved by using a single axis, which is at a 45 degree angle going in the up & right direction from where you started. Return your head to the starting position, and imagine that there is an axis going up and to the right. Now rotate your head about that axis and you should see that you end up in the same final location as the first pair of rotations.
The axis and angle are normally stored as a single 3 element vector. The direction of the vector represents the axis of rotation, and the magnitude of the vector is equal to the angle of rotation. This keeps the storage requirements minimal, requiring only 3 floats. Unfortunately, adding two rotations in this form is not a simple task. You can’t just add the two vectors together, as rotations aren’t really vectors. This difficulty of combining them makes these less ideal for use in games, as one would normally convert them to another form, combine them, and then convert back.
The final representation for rotations that we’ll consider is the quaternion. Quaternions are a set of numbers (usually said to be in the space H) and are an extension of the complex numbers, and they have a lot of uses in mathematics. For our purposes, however, we are only concerned with unit quaternions. A unit quaternion can be used to represent rotations. For a complete discussion of how unit quaternions relate to rotation, and how you can visualize them using a hypersphere, refer to this wiki page.
Before we get into the mathematics and operations of quaternions, let’s examine why they are a better choice than the options we’ve previously discussed. Firstly, they require only 4 floats to store, which makes them much more compact than a rotation matrix, and pretty close in footprint to the other representations. Secondly, combining quaternions is far simpler than for axis angle, and they represent a continuous space so they have no risk of gimbal lock like the Euler angles. While it’s true that we have to normalize quaternions often to prevent rounding error, the normalization process is simple and much more efficient than orthonormalizing a rotation matrix. Lastly, converting to matrix representation is simple and requires no complex operations, so building our composite transform is trivial as well.
How exactly do we represent a quaternion? What do they look like? The quaternion space, H, is a 4 dimensional vector space and so we can represent quaternions using a 4 element vector. Quaternions are an extension of complex numbers, and have 1 real component and 3 imaginary components. They are written in the form w + xi, + yj + zk. To store them as a 4 element vector, we just put the exponents for i, j, and k into x, y, and z, and use w to represent the real portion.
The usual convention for writing quaternions is to write the 4 element vector with the real number first:
We can also write it as the real component and the vector component:
Unit quaternions can be thought of as a modified axis-angle (axis r, angle theta), where:
The identity quaternion is:
Quaternion addition is just like any 4 element vector:
Scalar multiplication also works as it does for vectors:
Normalizing a quaternion should also look familiar:
Quaternions have a conjugate, which is written as q*. The reciprocal or multiplicative inverse of the quaternion can be defined in terms of the conjugate, and is written as q-1:
Multiplying, or concatenating, two quaternions takes the form:
It is important to note that quaternion multiplication is not commutative, and written and performed like matrix multiplication, from right to left. Finally, rotating a vector by a quaternion can be achieved by using:
where vr is the resulting rotated vector, and vq is the initial vector represented as a quaternion (w of 0).
Converting to matrix form is also very important for us, since we’ll need to do this to build our transform. Additionally, creating quaternions from a given axis and angle will prove to be much more convenient than trying to determine rotation quaternions directly. For instance, if I want to rotate an object about the y axis by theta degrees, it would be convenient to specify my rotation in those terms.
Converting from an axis v and angle theta to a quaternion, which can then get applied to the orientation, is straightforward:
Creating a matrix, which represents the quaternion rotation, takes the form of:
With all of these operations, we can now handle all of our scenarios, and we’ve met our criteria for choosing a representation for rotations. To summarize:
- Requires only 4 floating point numbers to store
- Manipulation is simplified by using axis-angle rotations as input
- Vectors are trivially rotated using the formula above
- Easily converted into matrix form for building transform
- Normalizing to maintain unit quaternion qualities is trivially performed
Given all of these properties and benefits, it’s easy to see why quaternions have become a very popular choice for games to use. As we start to explore the physics engine more in the coming posts, we’ll be using quaternions as our representation of rotation as well, and will be referring back to many of these concepts and formulas for that discussion.