Math Primer Series: Matrices III: Affine Transformations and Matrices

Migrated from

We’ve seen matrices used as representations of a basis, and we’ve seen them used as linear transformations, but what else can we do with them? So far, we’ve only been able to use matrices to transform vectors from one vector space to another. What about points and vectors in an affine space? There is another class of transformation, called an affine transformation, which does exactly this.

Affine Transformations

An affine transform is a transformation which maps points and vectors in an affine space to another affine space. This allows us to transform positions now, in addition to just vectors. An affine transformation can be thought of as a combination of a basis matrix and a translation to an origin, which is analogous to affine spaces compared to vector spaces, since affine spaces add an origin. When we look at an object, we can define a right, up, and forward vector for it, relative to some frame of reference. But furthermore, we can describe it’s position in space. Not the position of a particular point on the object, but the position of the whole object. What we’re actually doing is saying that all the points on the object can be thought of relative to some origin on the object, and that origin has a location in space (relative to some frame of reference of course). This means that even though we move the object’s origin around, we’re not changing the orientation (the right, up, and forward vectors). The reverse is also true, we can rotate or scale our object all we want, but it won’t move the object’s origin. In this way, we see that the origin of the object, and it’s basis are independent, and the formal affine definition reflects this separation. Affine transformations are formally defined as multiplying a point by a matrix component (to define it’s orientation) and then adding a vector component (to define it’s origin’s position):


Here, M is the matrix portion of the transformation, and is sometimes referred to as the linear portion. This can consist of rotation, scaling, or any other linear transformation. The t in the equation is called the translation and is the vector defining where the object’s origin is compared to some frame of reference. Let’s take a look at an example to get a better grasp of this:


In Figure 1, we see a frame of reference shown (origin O, with right, up, and forward defining our frame). We also see an object (the square). If we were locally sitting inside that object, then x and y would have been our right and up vectors, and P our local origin. But since we’re examining the object from a different frame of reference, it appears rotated and located at some other location. This placement and orientation can be described as a single affine transformation, as show in the equation above, with M being the linear portion and t being the translation portion. Using our example above, and assuming only rotation (to simplify), our matrix M would likely look like:


Assuming that x, y, and z (which we can’t see in the image) are defined relative to our coordinate space and are unit length. Look back to this post if you’re unsure how we got that. We then would add our translation vector (dashed line t in the figure).

While this works out well enough, the extra addition must be remembered and there’s not a convenient way to store this information. Fortunately, there’s an alternate way to express and use affine transformations which is much more convenient and consistent with what we’ve already seen:


Note that this is now a 4×4 matrix, as apposed to the usual 3×3 we’ve seen so far. It contains the linear part of the affine transformation in the upper left 3×3 block, and the translation vector as the last column. The bottom row contains 0s with the exception of the last entry which is a 1. Here’s an example of the expanded matrix from the previous example:


Before we go on, I wanted to call out that I’ve been using column matrices so far in this post. The same concepts apply if you’re using row matrices, just remember the layout will be transposed from what I’m showing here, with the translation vector as the bottom row of the matrix, and the 0s as the last column, and transposing the upper left block.

Now, we claimed earlier that affine transformations could be used to transform both points and vectors from an affine space to another. Points and vectors are quite different. One is a location and one is a direction. It doesn’t make any sense to translate a direction, so how can we possibly use this matrix to transform vectors? The answer is in the fact that we have to augment a point or vector to be compatible with this matrix. This is a 4×4, and both points and vectors are 3 components. Without some augmentation, the points and vectors are not compatible with this new matrix. How we augment them depends on whether they are points or vectors, and dictates whether the translation component of the affine matrix is applied or not.

If you recall from this post, points are sometimes written as 4 component vectors with a 1 in the last component (sometimes called w). Vectors can likewise be written with 4 components, with the last component equal to 0. The reasons this worked for arithmetic were shown in the post, but we now see there is another critical reason for that distinction. When you multiply a 4 element vector with the affine matrix above, the math works out to:







Note the highlighted portion of the last one there. This is exactly what you’d get using a normal 3×3 transform M multiplied by p. This means that we get the exact same result as applying the linear transformation M and then adding t times p’s w component. If the w component is 0, as in the vector case, it cancels out the translation portion. If the w component is 1, then we translate by exactly t, which is what we expect for points. Notice also that the last element of the new vector is exactly equal to the w component of the starting vector. This means that vectors stay vectors under affine transformation, and points stay points. This is pretty important, it means that an affine space will never turn points into vectors or vice versa. The augmented points and vectors which contain the w component are said to be in homogenous space, and another term often used for the affine matrix (since it requires homogenous vectors) is a homogenous matrix.

In the next and final matrix-related math primer post, we’ll examine some other operations which can be performed on matrices. Specifically, we’ll take a look at the determinant, adjunct, and inverse. Afterwards, we’ll quickly look at some of the problems with rotations, and introduce quaternions as an alternative way to express rotations which overcomes many of the traditional problems. I’m going to be continuing the physics posts as well.

Posted in Math | Tagged , , , , | Leave a comment

Math Primer Series: Matrices II: Linear Transformations

Migrated from

In the last installment of the math primer series, we looked at the basics of matrices. Today, we’ll take a look at using matrices for linear transformations, which are one of the most common uses of matrices in games. But, before we dive into linear transformations, let’s take a quick look at what a transformation is, and why we represent them using a matrix in the first place.

A transformation is any operation that maps values from one space to another. In the context of standard algebra, we are usually introduced to transformations in the form of functions. Functions and transformations are really one and the same, though the term transformation is more common in linear algebra. In fact, sometimes you may even see transformations or functions referred to as mappings. Again, for our purposes these are all equivalent ways of conveying the same meaning. We take in a set of values, called the domain, and we produce another set of values, called the range.

When discussing transformations of affine and vector spaces, it is quite convenient to represent the transformations in the form of a matrix. This allows for succinct notation, as well as easily combining transformations through matrix concatenation (multiplication). For example, take the following transformation:


Here, we say that the matrix M is multiplied with the vector x, and is represented by the transformation T. We also know, from our discussion about matrices and vectors, that if we are using column vectors and column-major matrices, then the result of this transformation will be another vector. If we’re using row vectors and row-major matrices, then our result would be a matrix. If we want to use row vectors and row matrices, and still get a vector result, we would write the transformation as:


A linear transformation is a transformation that maps from vector space to vector space, maintaining the vector addition and scalar multiplication properties of the vectors. The first part of the definition, mapping from vector space to vector space, means that our transformation matrix must be exactly n x n in size, where n is the dimension of the vector space. The number of columns must be n in order to be compatible for multiplication with the vector. The number of rows must be n to give us an n-dimensional vector as a result. For 2 dimensional space, that means a 2×2 matrix. For three dimensional space, that means a 3×3 matrix.

The second part of the definition means that the addition of two vectors, and scalar multiplication of a vector, must be maintained after the transformation. Put in equation form:



The first equation says that the transformation of the sum of two vectors is equal to the sum of the transformation of the original vectors. This is what we mean when we say that the vector addition property of the vectors is maintained. The second equation says that the transformation of a scaled vector is equal to scaling the transformation of the original vector. This is what’s meant by maintaining the scalar multiplication property of vectors. Because of this, an easy way to determine if a transformation is linear is to combine the above equations and try and prove that they hold true for a particular transformation:


If you can prove that the above does not hold true for a given transformation, then that transformation is not linear.

Next, we’ll look at two of the most common linear transformations encountered in games: scaling and rotation. In addition to describing each transformation, I’ll show that they’re linear by using the equations above, and also use each one with some real numbers plugged in as an example. Hopefully, this will make them easier to understand.


The first linear transformation we’ll look at is scaling. Three dimensional scaling can be represented with the following matrix:


Where si represents the amount of scale along the i vector, sj along the j vector, and sk along the k vector. Solving the linear transformation equation from before gives us:




Which is what we’d expect. Now, let’s show that this is linear by plugging in the two sides of the equation for showing linearity from above and seeing that the results are equal in both cases:








We can see that in either case, we ended up with the same answer. This shows that the scaling transformation is indeed a linear one.

Now, let’s look at an example you may encounter in a game. For this example, let’s say that we are trying to scale up a triangle with vertices (-1, 0, 0), (0, 1, 0), and (1, 0, 0) by a factor of 3 in all directions. I’ve intentionally used simple numbers to make this easier to visualize and show in a blog post. The same concept applies regardless of the numbers. This gives us a matrix with 3 all along the diagonal:


To scale our triangle, we would need to multiply each vertex by the scale matrix:




This gives us the 3 scaled vertices we expect: (-3, 0, 0), (0, 3, 0), and (3, 0, 0).


The other linear transformation we’ll look at is rotation. Rotation consists of an axis of rotation, and the angle of rotation. When looking directly into the axis of rotation (the axis is pointing towards you), the rotation angle is measured counterclockwise. We’ll start by examining the matrix representations of rotations about the axes of the standard basis (i, j, and k). Then, we’ll  show linearity and solve an example.

Let’s start by looking at the matrix for rotation about the i, j, and k vectors (axes). These are sometimes referred to as pitch, yaw, and roll respectively.




We can then combine these through concatenation to form a general rotation matrix representing the combined rotations. However, we’ve seen that concatenation is order dependent, so we must be careful which order we combine them in. Combining pitch, then yaw, then roll may give a different result from combining as yaw, then pitch, then roll. There’s not really any standard for the order in which they’re combined, so be sure to stay consistent.

So, are rotations linear transformations? I’ve said that they are, but let’s show that this is the case. We’ll show this for one of the rotations above, but the same thing can be done for each of them:








As we can see, both ways lead to the same result, which confirms that rotations are in fact linear transformations.

Finally, let’s consider an example with real numbers. Let’s say we have a vector u = (3, 4, 5). We want to rotate this vector by 45o counterclockwise about the i vector, then 30o counterclockwise about the j vector:













And that gives us the new vector after the two rotations: (5.7795, –0.707, 2.3285).

To wrap up, we’ve discussed transformations in general and looked at a special class of transformations known as linear transformations. We’ve looked at the two most common linear transformations in video games, scale and rotation. In the next installment of the math primer series, we’ll discuss affine transformations. As always, please let me know if there’s anything that I can help to explain better.

Posted in Math | Tagged , , , | Leave a comment

Intro to Physics in Games

Migrated from

I’ve decided to try something a bit different and mix in physics posts while I wrap up the math primer. Many of you might be more interested in the physics, or may already be familiar with the math topics, so this gives me a chance to have something for everyone.

Today, we’ll look at some of the core concepts around physics simulations in video games. We’ll go over some terminology and approaches so that we’re all on the same page when discussing more involved topics in later posts. First off, we’ll discuss what it means to have physics in a video game, and how does that compare with physical simulation in other software fields. Then, we’ll discuss the basic steps in simulating physics in a game, and how it integrates with the rest of the game code.


Physics in Video Games

Video games have become incredibly complicated pieces of software over the past 20 years. I recall a time when PC games could fit on a single density 720KB 5.25” floppy disk. Several years later when CDs were initially becoming available, you could find a single CD with over 1000 games on it! These games ran at measly 320×240 resolutions with 8 or 16 colors.

Today, games easily weigh in at several gigabytes, with many being tens of gigabytes. They have nearly photo-realistic graphics with incredibly high resolutions. Along with that extra visual realism comes an expectation of immersion from the user. The point of having incredibly realistic graphics is to make the player feel  like they’re actually in the game. However, for the illusion to hold, the virtual world around them must also behave in a believable manner. The best graphics in the world wouldn’t make a player feel like they’re actually immersed if the characters move unnaturally, pass through walls or floors, or things don’t fall and topple as if they had realistic mass and material properties. These are just some of the challenges of making the world feel believable.

Physical simulation software is nothing new, and has been used for decades for scientific research and study. Even the use of physics in games isn’t entirely new, though it’s gotten a lot more attention in recent years. Even some of the earliest games, such as Pong, had very primitive physics simulations in the sense that the trajectory of the ball bouncing off the paddles was determined in a way that would seem intuitive and consistent with our understanding of physics. What is new, however, is the recent adoption of general purpose physics simulations in video games. The physics in early games was very much written for the specific scenarios of that game. For example, it could handle projecting the trajectory of a ball off of a paddle, but nothing else. This meant that for each new effect the game wanted to have, a new piece of physics code would have to be written to handle it. Additionally, a new game written would have to have new physics code written for it as well. As game worlds became more complicated, the need for more physical scenarios increased, and the cost of writing specialized functions for each scenario became too prohibitive. During this time, the concept of having a general purpose piece of code that, given some overall parameters, could simulate arbitrary physical conditions and scenarios came to be. These general purpose physics engines could allow game developers to create games with many more interesting scenarios, and do it much more quickly.

Before moving on to discussing the anatomy of these physics engines, there’s one more important point to consider. Physics simulations are just that, simulations. Physics, in general, is very complex and not possible to replicate in code. However, if we take certain core aspects or principles of physics, and simulate them with some acceptable level of accuracy, then performing this in code becomes much more reasonable. The most common class of physics simulated in games is called Rigid Body Dynamics. Rigid body simulations use Newtonian principles of movement and mass to simulate rigid objects (spheres, boxes, a tower, a boulder, a wall, etc…). In games, the physical accuracy of the simulation only needs to be high enough to be believable, and is often times reduced to improve performance. A smooth running, albeit slightly less accurate, game is much more immersive than a perfectly accurate game that runs poorly and constantly drops frames.

Basic Components of Physics Engines

There are many different physics engines out on the market today. Each one works somewhat differently than the next, and each uses different approaches to solving the general physics problem for games. However, in order to solve the problem for games, there are certain things that every engine must do. For the purposes of this post, we’ll be focusing on rigid body dynamics as it’s the most common type of physics in games. I’ll try and include posts later to discuss other types of simulations such as cloth, fluid, or gas. All of the components of the physics engine can be summarized in a single general operation called the simulation step.

Video games display many rendered frames per second. Comparing this to animation, many small in-between images are shown quickly to give the impression of movement. The de facto standard for most PC and console games is to target 60fps, which means that each frame represents about 0.01667s of time. For the physics simulation to match, only 0.01667s worth of physics should happen each frame. That means an object moving forward at a velocity of 1m/s should only move 0.01667 meters that frame, and then again the next frame, and then again, and finally after 60 frames have gone by, the object would have covered 1 meter of distance.

These intervals of moving objects along their trajectories is called the physics simulation step. Almost every physics simulation equation involves time, and the time used for solving these is this time slice determined by the game. Most engines will take in the time as a value each frame, as it could change from frame to frame. Some games use variable frame rates, and so a static number should never be assumed.

So, what exactly happens during the simulation step?


A typical physics engine normally tracks all the objects that it’s simulating in a data structure somewhere. For simplicity let’s call it a list, though this isn’t always the case. For each object, the physics engine needs to know some important information, such as it’s mass, current velocity, current position and orientation, and outside forces acting on the object. Each step, the integration portion of the physics code will solve for the new current positions and velocities for every object. This involves using equations of motion from physics to approximate the positions and velocities for an object in some future time (current time plus the time slice for the frame), using the current positions and velocities as starting points. In addition, outside forces such as gravity, springs, friction, or anything else relevant to the game are also considered to ensure the newly computed positions and velocities make sense.

Collision Detection

If there were only a single object moving in a vacuum, then we’d be done! However, most games involve more than one object, and these objects move in interesting environments. Therefore, there arises a situation where two objects are moving towards each other or end up running into each other. What happens? If we don’t do anything, these objects will just pass right through each other. The renderer certainly has no knowledge that these are separate, solid objects that shouldn’t overlap. It will happily draw them intersecting each other. In most games, however, you don’t want the objects to pass through each other. An asteroid colliding with a ship should crash into it, possibly destroying the ship. A character walking on top of a floor should stay above it, gravity shouldn’t pull the character down through the ground. In order to handle these scenarios, the game needs to know thattwo objects are overlapping. The process of identifying these scenarios is called collision detection, and is normally one of the other major tasks the physics code must do. Generally, the job of the collision detection code is to determine all such pairs of overlapping objects, possibly including some additional data such as how far they overlap and in what orientation, and providing this data to the game for further processing.

Collision Resolution

Once the physics engine has identified that a pair (or many pairs) of objects are overlapping, then what do we do? In many cases, this is something specific to the rules of the game. For instance, in a space shooter game, when an asteroid overlaps the player’s ship, the game may decide to stop drawing the player’s ship and instead draw an explosion animation. Following that, the game would probably start the level over and reduce the number of lives of the player. All of these reactions to the collision are driven entirely by the game itself, and not the physics engine. This is because they are very specific to the game in question. However, there are certainly cases where the game doesn’t care to be involved. For instance, in a first person shooter, if the player knocks over a chair in a room, the game doesn’t need to do anything specific for this case. There are no game rules invoked, and therefore the game just wants the motion of this chair and player to continue to be simulated in a realistic fashion. This likely means the chair falls over in response to being bumped by the player, and when it strikes the floor, it probably rolls or tumbles slightly. This class of reaction is another common, and arguably the most complex, job of the physics engine and is generally referred to as collision response or collision resolution.

We’ve covered the basics of all physics engines. Every simulation suitable for games will have these general components, though it’s common to include many more features such as joints, cloth simulations, physics-based animation, and other interesting things. I’ll be going into a lot more details for each component as we continue to discuss physics in games. I’m planning on presenting a sample architecture for a physics engine, and then drilling into each section piece by piece to discuss the design, why it was designed that way, what algorithms are used, what other alternate algorithms are there, and what optimizations where made and why. I’ll be providing sample code snippets and probably entire posts dedicated to certain algorithms and optimizations. Please note that the architecture used will be something of my own invention, and is not that of any other physics engine out there. While there will certainly be similarities due to the fact that we’re solving the same problem, all of the code and design presented will have been created specifically for the purpose of education on this blog.

Posted in Physics & Collision Detection | Tagged , , | Leave a comment

Math Primer Series: Matrices I: Introduction and Basics Operations

Migrated from

Few things are as ubiquitous in game and graphics programming as a matrix. In this installment of the math primer, we take a look at these structures, investigating not only their numerical significance, but also what they represent visually. Next time, we’ll see how to combine them with vectors and with other matrices to form complex transformations, which we rely heavily on in game code.

Matrices are the Basis of Everything

When we refer to a coordinate system, we represent the system with a central point (origin), and 2 (for 2D) or 3 (for 3D) linearly independent vectors, also called axes. Look back to the earlier primer articles for a refresher if you need it. To measure a point with respect to this coordinate system, we use an ordered 2- or 3-tuple, such as (3, 5, 1), which represents starting at the origin, then moving 3 units along the first vector, 5 units along the second vector’s direction, and finally 1 unit along the 3rd vector’s direction. The three vectors used in this way are called a basis. A basis is the set of linearly independent vectors used to define a coordinate system (or more accurately, a coordinate frame or frame of reference). If we label these vectors u, v, and w, then we can write the 3-tuple above as 3u + 5v + 1w. We can even define left or right handedness of the basis by looking at the way we orient the third vector with respect to the first two. Specifically, third here means the one that we write last when expressing the tuple. When someone mentions the standard basis, they’re referring to the set of vectors (normally labeled i, j, and k), where the values are (1, 0, 0), (0, 1, 0), and (0, 0, 1) respectively, normally centered around the origin O (0, 0, 0). Let’s take a look at Figure 1 for a more visual representation.


Something important to note here is that we can express any basis in terms of another basis. That means that we can express any basis in terms of the standard basis. If we look at Figure 1 a little more closely, we see that even though u, v, and w are vectors making up a basis, they’re still just vectors, and that means we can express them in terms of i, j, and k. For instance, if we use the basis consisting of u, v, and w as our frame of reference, then u would most likely be written as (1, 0, 0). However, using the standard basis as our point of reference, we might measure u to be some vector (0.6, 0.6, 0) or something of the like. This is a key concept to keep in mind: basis vectors can all be measured and represented in terms of another basis, including the standard one.

It’s getting a little tedious to constantly keep referring to a basis in a manner such as “the basis consisting of basis vectors u, v, and w”. Wouldn’t it be nice if there was a succinct notation to capture that? Well, there is! We can write the basis vectors in block form, like:


Where u, v, and w are expressed as vectors in terms of the standard basis. This is a matrix, and the expanded form looks like this:


Now, we need to clarify a few things here before continuing. The way I’ve laid out the matrix above is not the only correct way to lay out a basis. Just as coordinate systems have left handed and right handed variations, which are just a matter of preference and convention, matrices too have some convention considerations to make. Matrices may be used in what’s called row-major or column-major form, which directly correlates with how you wish to write vectors. Vectors can be written as row-vectors or column-vectors. See Figure 2 for a comparison of the two. It’s important to realize that just as picking one handedness of coordinate system over another had no bearing on the result, just the shape of the formulae, the same is true of how we express our vectors and matrices.

clip_image002[10]                                 clip_image004

Figure 2: The row-vector on the left is written with the 3 components horizontal. The column vector on the right is written with the 3 components laid out vertically.

How we write out matrices is directly related to what kind of vectors we use. If we use row vectors, then we lay down our basis vectors as rows of the matrix. If we are using column vectors, then we lay down the basis vectors as columns in the matrix. See Figure 3 below:

clip_image002[6]              clip_image004[1]

Figure 3: The row-major matrix on the left consists of the basis vectors written as rows of the matrix. The column-major matrix on the right consists of the basis vectors written as columns in the matrix.

Again, which we choose doesn’t make a difference as long as we are consistent with our convention and form our formulae appropriately for the form we’ve chosen. In mathematics, and most texts on mathematics and graphics, column form is more widespread. However, in the actual game and graphics world things are a little more divided. For instance, DirectX and XNA both use row form, but OpenGL uses column form. You could write an entire library in column form, but still use DirectX or XNA to render, you just need to convert between the forms at the appropriate places. The conversion between the two, called a transpose, will be covered a bit later in this post.

I will choose to use column vector form even though most of my coding examples will likely be either DirectX or XNA. You might think that seems counterintuitive, but I believe writing out mathematics in a manner which is more consistent with academia is more natural, and will match what you find in research papers, math books, and other references around the web. My coding examples are just that: examples. I don’t think it’s justified to form my mathematical explanations around my examples’ choice of graphics API.

NOTE: It is important to realize that this discussion of row versus column major matrix is only relevant when talking about vectors. As we’ll see in a moment, matrices can be used for many other things besides vectors, and in those cases there is only a single form of a matrix. In other words: row major and column major matrices are still the same matrix, we’ve just chosen to impose a convention on how we write vectors in terms of matrices.

Introduction to Matrices

While it’s convenient to write vectors as single row or column matrices, and basis vectors as the rows or columns of a 9 element square matrix, these are far from the only uses of a matrix. In fact, most formal definitions of a matrix are something along the lines of “a rectangular arrangement of numbers, organized into rows and columns”, which says nothing about vectors.

Let’s try and define matrices a little more generally now. We know that they look like blocks of numbers, and that they are considered to have rows and columns, so let’s add that we can refer to the number of rows and columns as “n by m”, or n x m, where n is the number of rows and m is the number of columns. This is summarized in Figure 4.

 clip_image002[8]        clip_image004[3]        clip_image006[4]        clip_image008[4]

Figure 4: Matrices are labeled using row x column. From left to right, the matrix sizes are: 2×3, 1×2, 3×1, and 3×3.

We can see from the first matrix in the figure that elements within the matrix are always labeled using the row, then the column number, and are 1-based. The subscripts in the first matrix show the row and column numbers of element aij. The second and third matrices could be interpreted as row and column vectors, respectively. The final matrix in the figure leads us to a few more definitions.

If the number of rows and columns in a matrix are the same, the matrix is said to be a square matrix. Since the number of rows and columns are the same, we can refer to square matrices with a single size dimension. For instance, we can say something is a dimension 3 square matrix, meaning it’s a 3×3. The set of all elements in the vector for which the row and column are the same (a11, a22, etc…) is called the diagonal of the matrix. For example, in the rightmost matrix above, the diagonal is the set of numbers (4, 3, 8). If the only non-0 elements in a matrix are within the diagonal, then the matrix is called a diagonal matrix. So, to summarize, our final matrix above can be called a square, diagonal matrix. Finally, as a matter of notation, I’ll use capital, bold, italic letters to represent a matrix. This should make it easier to tell them apart from vectors (bold lowercase) and points (italic capital). For example, a matrix M.

There is a special matrix, called the identity matrix, which is a square, diagonal matrix with only 1s in the diagonal. It can be any size, is normally written as I or In (where n is the size), and we’ll see in a moment that it’s used to ensure multiplicative identity holds (hence, the name).

Now that we have notation and terminology out of the way, let’s start looking at operations we can do with matrices!

Basic Matrix Operations


The most trivial operation we can perform on a matrix is taking it’s transpose. This swaps all the rows of the matrix with all of the columns. If the matrix begins as an n x m matrix, then the transpose is an m x n. The transpose of M is written as MT. Examples:




There are a few important observations to make. Firstly, if we look back to our discussion on row and column vectors and matrices, then we can see clearly now that to convert between the two we take the transpose. Secondly, it’s important to note that the diagonal of a matrix remains the same after taking it’s transpose. This will always be the case. In fact, many like to think of taking the transpose as taking the reflection of the non-diagonal elements across the diagonal.

Addition and Subtraction

Addition and subtraction of two matrices doesn’t come up quite as often in games, but no discussion of matrix operations would be complete without them. Matrix addition can only be done between two matrices of the same exact dimensions, and is a trivial operation of summing the elements at the same location in each:



Subtraction is done in the exact same way, again requiring the matrices be of the same dimension and shape.

Scalar Multiplication

Scalar multiplication is as straightforward as can be. Multiplying a matrix M by a scalar s just multiplies each element of the matrix by s.

Matrix Multiplication

Multiplying, also called concatenating, two matrices is by far the most common operation done on matrices in game code. We’ll explore why when we talk about transformations later in this post. Multiplying matrices isn’t as straightforward as adding or subtracting, but with a little help visualizing what it is we’re doing, it’s not too bad. Let’s take the following two matrices, A and B.



Multiplication of matrices requires that the number of columns of the first matrix match the number of rows of the second. We can refer to this dimension as d. We can see that our matrices A and B meet that requirement, with d = 3. The resulting matrix from multiplication will have dimensions of n x m, where n is the number of rows in A and m is the number of columns in B. The formal definition of matrix multiplication then is:


Using our example matrices A and B from above:




By this definition, matrix multiplication is not commutative, since the column and row requirements may not be met, and even if they were, the result would be different. The only matrices which meet the row and column requirements when reversed are square matrices, but again the result of the multiplication is different.


While we can certainly think of multiplication in this way, I find it far easier to visualize the multiplication in a more vector-oriented way. If we imagine the rows of the first matrix as vectors (like a row major basis), and the columns of the second matrix as vectors (like a column major basis), then what we’re really doing is taking the dot product of each possible pair of vectors, with the dot product of the ith row of the first matrix and the jth column of the second matrix making up the element mij in the resulting matrix. In other words:



We now can see what the purpose of the identity matrix, I, is. Multiplying any matrix by an identity matrix of the appropriate size, yields the original matrix:


Multiplying a Matrix and a Vector

Multiplying a vector by a matrix is called transforming the vector. We’ll look at transformations in the next post in this series, but let’s understand the math behind it first. This is where our discussion of column versus row vectors becomes most relevant. To multiply a vector and a matrix, we treat the vector as a single row or column matrix and multiply as usual. This implies that which side of the matrix the vector goes on is important, and must satisfy the row and column requirements of the multiplication. If our vector is a row vector, then we could write a 3-vector and 3×3 matrix multiplication as:


We could similarly write a column vector multiplied by the same matrix as:


Notice that the vector is on the other side of the matrix. This is required to make the multiplication work. Looking at the expressions above, the product of the multiplication would be different in each case. However, we know intuitively that transforming a vector by a matrix can only have a single answer, and that our choice to use row or column vectors shouldn’t impact the result. In order to move the vector from one side of the matrix to the other to satisfy the multiplication requirements, we must also transpose our matrix to ensure that the result remains the same. If we do that, the product of the multiplication will be the same regardless of whether we choose row vectors or column vectors. This is exactly what we were talking about up above in the basis section. So the correct form for the second equation becomes:


Which will ensure we get the same result as the first case.

There are many more operations we can do with matrices, and I’ll cover some as we go through the next few blog posts. But for now, we have enough covered that we can start to look at linear transformations. The next installment will start our exploration of transformations, beginning with linear transformations. I hope you enjoyed the introduction of matrices, and as always let me know if there’s anything you’d like to see me explain in more detail.

Posted in Math | Tagged , , | Leave a comment

Barycentric Coordinates and Point in Triangle Tests

Migrated from

I know it’s been a little while since my last post, and I apologize. I’ll try and keep the posts a little more frequent moving forward.

In the last post, we briefly encountered barycentric coordinates and loosely defined them as the coefficients of an affine combination. While that’s true, we can do better. We can define a more precise definition, and we can take a closer look at what they really mean, both numerically and geometrically. That’s the topic of this post, as well as taking a brief look at how we could use them in a real world game programming context.

First, let’s take a look at the formal definition of the coordinates, then we’ll consider a slightly refactored version that works better for our situations. Consider a triangle ABC (see Figure 1), and then imagine a little weight at each vertex. We can assign each weight as having 100% weight contribution from that vertex, and 0 contribution from the other vertices. So, for point A that’s (1, 0, 0), for point B it’s (0, 1, 0), and for point C it’s (0, 0, 1). This means that if you’re at the vertex, you only get the weight from that 1 vertex. Anywhere else within the triangle would be a combination of these weights. The barycenter of the triangle is the point inside the triangle where you could balance the weights. In other words, the weights would be contributing evenly to that point. Assuming that the weights at each vertex are equal (1), this would be the mean of the weights (or vertices):


The barycentric coordinates for the barycenter then become (1/3, 1/3, 1/3) given our equation above. Notice that this is exactly the equation of an affine combination, which is why we stated that the coefficients of an affine combination are also the barycentric coordinates. The dashed lines in the image represent the barycentric axes. A barycentric axis starts on a triangle edge, where the weight for the opposite vertex is 0. They then extend through the barycenter of the triangle to the opposite vertex, where the weight for that vertex is 1. Notice that the other coordinates in the base of each axis are 1/2. This is just a coincidence since we happen to be looking at an equilateral triangle. This won’t hold true for other kinds of triangles.

Now, what else can we observe? Well, for each axis, our values only extend from 0 to 1, since they are weights of that vertex. In other words, a value less than 0 or greater than 1 would be outside of our simplex (triangle in this case). Furthermore, the sum of the coordinates is necessarily equal to 1. Since these coordinates represent the amounts of each weight that you observe at that point, they are percentages, and therefore must sum up to the total. Otherwise, you’d be missing some of the weight from the system. These two observations will be extremely helpful as we examine uses in game code. The other thing I wanted to mention here is that if the restriction that coefficients of an affine combination must sum up to 1 didn’t make sense after my explanation before, I hope that looking at it from the perspective of barycentric coordinates helps to justify the restriction.


Now that we’ve examined the formal definition and use of barycentric coordinates, let’s take a slightly modified look at them, and see how we can make them more useful to us as game developers. We saw in the last post that any affine combination could be refactored and expressed as a single point, or origin, added to a linear combination. Let’s take our barycentric equation (which is an affine combination) and refactor it in this way now, using s, r, and t as our barycentric coordinates (coefficients):











We’ve substituted u and v as (B-A) and (C-A), respectively. Relating our result to the figure above, we see that we’ve picked A as our local origin, and two of the triangle’s edges as u and v. Our barycentric coordinates have been reduced to just r and t, and are now expressed relative to the local origin A. It’s important to note that mathematically, this ‘reduced’ form of the barycentric coordinates is equivalent to the formal version, but this format is much more usable to us. Because r and t are still barycentric coordinates, they must still fall between 0 and 1, and their sum still cannot exceed 1. Notice that we said exceed this time, and not sum up to 1. This is a subtle difference than before, and it can be explained like this: previously, we had the sum of s, r, and t = 1. This still must hold true. However, we’ve made s implicit in our new reduced form, and therefore it is not directly included in our sum. If r and t sum up to 1, then s is 0. However, r and t can sum to less than 1, and s will be equal to the remainder (see the second and third steps of how we arrived to our reduced form). In summary:



Now let’s see how we can use this. Imagine we’re trying to write a function in our game that determines whether or not a point is contained within a triangle. This is actually quite the common problem to solve in collision detection. In two dimensions, it might be a direct collision detection query. In three dimensions, we normally will first determine if a point is in the plane of a triangle, and if so, then reduce it to a two dimensional problem exactly like the normal 2D case. In either situation, we need a robust way to determine if a point is contained within a triangle. We can use barycentric coordinates in the reduced form to solve this, by taking the following steps:

1. Pick a vertex of the triangle to be our local origin, which we’ll refer to as A

2. Compute u and v using the differences of the other two vertices and our origin as we did above (ex: u = B-A, v = C-A)

3. Compute the r and t barycentric values for our point P with respect to A

4. Check that r and t are both within 0 and 1, and that their sum is less than or equal to 1. If so, return true. Otherwise, the point is outside, return false.

It’s actually quite straightforward, with the exception that we haven’t yet discussed how you’d complete step 3, computing the barycentric coordinates. Let’s take a look at that now, and then with r and t computed, we’ll be able to complete the function. There are several approaches to finding the barycentric coordinates in this step. The simpler of the two uses some identities and properties of vectors we’ve covered up to this point, which is why I’ll choose to show that one now. However, after we take a good look at matrices and solving systems of linear equations with them, we’ll see that there’s a more efficient way to compute them. See Figure 2 to see the problem we’re trying to solve.


Let’s start by refactoring our equation slightly, and move the A over to the other side:




We’ve replaced P-A with the vector w. We need to now solve for r and t. If we look at some of the vector rules that we covered earlier, we’ll remember that any vector crossed with itself (or any collinear vector) will result in the 0 vector. So, to eliminate t, let’s take the cross product of both sides with v:






We can see that by having the cross product of v with itself go to the 0 vector, we were able to eliminate t from the equation and solve for r. We can repeat this same process for t to obtain:


At this point, we can determine whether or not r and t are going to be greater than 0 or not. This is the first requirement of our test. The equations for r and t above represent ratios between two cross products. The two cross products each represent a vector, so in other words we’re taking the ratio of two vectors. The only way r or t can be negative is if these vectors point in opposite directions. So, let’s use the dot product to determine if the numerator and denominator in each case point in the same direction:



The sign() of each side will be > 0 if the vectors are pointing the same direction, or < 0 if the vectors are pointing away from each other. At this point, if either is < 0, we can exit the function with a false result.

The next requirement we need to meet is that r and t must each be smaller than 1, and that their sum must also be less than 1. To do this, let’s take the norm of each of the equations above. Since we already know that r and t are positive, we know that the norm of r and t is just r and t.




Again, we can repeat this same process to solve for t, and we’ll get:


Also, since swapping the order of a cross product doesn’t change the magnitude of the resulting vector, only it’s direction, then we can also say that


Which gives us a common denominator in our r and t formulas, so we only have to compute the value once. And there we have it, we now have r and t computed, which means we can complete our function. A sample implementation, written in C# and using XNA might look like this:


/// Determine whether a point P is inside the triangle ABC. Note, this function

/// assumes that P is coplanar with the triangle.


///<returns>True if the point is inside, false if it is not.</returns>

public static bool PointInTriangle(ref Vector3 A, ref Vector3 B, ref Vector3 C, ref Vector3 P)


    // Prepare our barycentric variables

    Vector3 u = B – A;

    Vector3 v = C – A;

    Vector3 w = P – A;

    Vector3 vCrossW = Vector3.Cross(v, w);

    Vector3 vCrossU = Vector3.Cross(v, u);

    // Test sign of r

    if (Vector3.Dot(vCrossW, vCrossU) < 0)

        return false;


    Vector3 uCrossW = Vector3.Cross(u, w);

    Vector3 uCrossV = Vector3.Cross(u, v);


    // Test sign of t

    if (Vector3.Dot(uCrossW, uCrossV) < 0)

        return false;


    // At this point, we know that r and t and both > 0.

    // Therefore, as long as their sum is <= 1, each must be less <= 1

    float denom = uCrossV.Length();

    float r = vCrossW.Length() / denom;

    float t = uCrossW.Length() / denom;


    return (r + t <= 1);


And that concludes this post on barycentric coordinates, and one of their uses in game code. I hope that this post also served to solidify some of the previous information about affine combinations. Finally, it gave us a chance to use what we’ve covered so far to so something useful. I hope you enjoyed it, and I’ll be moving on to matrices next.

Posted in Math | Tagged , , , | Leave a comment

Math Primer Series: Vectors III: Affine Spaces, Linear and Affine Combinations

Migrated from

In this chapter of our primer, we’ll examine affine spaces, and see what affine and linear combinations are. Furthermore, we can use these concepts to define some other related concepts, such as affine and linear dependency.

Affine Space

An affine space can be thought of as any space containing points and vectors together, which follow the rules we established in the previous posts: any point plus a vector gives a point, and the difference of two points is a vector. The key is that affine spaces give us a way to correlate points to vectors, and vectors to points. Because an affine space is a single space that defines both points and vectors, we’re able to do things like add a vector to a point. However, we must follow the rules we discussed before. When defining the operations before, we actually assumed an affine space without explicitly calling it out.

What other properties do affine spaces have? Why do we care about them? Well, besides letting us define the basics of vector and point operations, they provide two other concepts which we’ll draw from. These are linear and affine combinations.

Linear Combinations

Using only vector addition and scalar multiplication, we can define a very powerful concept for vectors. Let’s examine Figure 1. On the left half, we see 2 vectors of varying lengths, but the same direction. On the right half, we see 2 vectors of varying lengths and different directions.


The important observation here is that for the pair of vectors on the left, u can be represented in terms of v and vice versa. For instance, if we know that u is twice as long as v, we can write u = 2v. This is what’s called a linear combination. If any vector can be represented in terms of adding scaled forms of other vectors, it is a linear combination of those vectors. The set of vectors are referred to as linearly dependent. This can be put into equation form:


Now, let’s examine the pair of vectors on the right. Here, we see that there’s no way we can express the vector u in terms of v. When a vector cannot be constructed as a linear combination of another vector (or set of vectors), they are called linearly independent.


Affine Combinations

What if we were to try and do the same thing with Points? We can certainly try and write the equation:


This is what we call an Affine combination, but how can that be? We know we can’t add points together. Well, it turns out if we impose the restriction that the sum of the coefficients equals 1, we can actually make this into a linear combination added to a point. Let’s follow through the math to see how this works:





Note that the differences of points we’re using on P1 – P(n-1) are really vectors, so this has become a linear combination (no restriction on coefficients) added to an origin point (P0). While this is useful to think about mathematically, I think looking at it visually helps to drive the concept home. See Figure 2.


Here, we have 3 points: A, B, and C. We wish to find a way to express the point D in terms of the other points (which would be the equivalent of expressing a vector in terms of other vectors). If we can define a vector from A to B (call it u), and another from A to C (call it v), then we can express the vector from A to D as a linear combination using the definition from above. In this case, it would become xu + yv. However, we are interested in finding the point D, not the vector from A to D. If we replace the vector with (D – A), and then add A to both sides of the equation we get our point D and we have an equation to express it in terms of the other points (the affine combination we were looking for). Let’s take a look at the equations:








We can generalize this equation to n points with the following form of the equation:


It is important to note that while the form above is convenient for thinking of the point in terms of an origin point and linear components, the proper form for an affine combination is the first one that we discussed:


For the curious, the coefficients in the affine combination equation are also called barycentric coordinates. If the vectors formed from the points are linearly independent, then the points P0 – Pn-1 are called affinely independent. We refer to an n-order set of affinely independent points as a simplex (See Figure 3).


Well, that’s it for this installment! We’ve discussed affine spaces, linear and affine combinations, and how we can use them to define linear and affine dependence. These concepts are a bit less obvious than the previous chapters, so please let me know if there was anything confusing in here I can help clear up or explain further.

Posted in Math | Tagged , , , | Leave a comment

Math Primer Series: Vectors II: Vector Operations

Migrated from

In the last installment of this math primer, we looked at the difference between points and vectors. Today, we’ll dive a little bit deeper into vectors, and the specific operations we can perform on them. More importantly, we’ll take a look at the geometric significance of the many operations, and how they can help us in building a game.

We’ll focus primarily on unary and binary operations of vectors. There are a few interesting ternary operations as well, but I’ll leave those for interested readers to follow up on independently. They’re not nearly as prevalent in game code. As always, see the references section of the initial Math Primer post for further reading.

Addition and Subtraction

The most basic binary operations we can do on vectors are addition and subtraction. Adding two vectors is as simple as it sounds; we create a new vector whose components are the sums of the corresponding components from the initial two vectors:


Subtraction is equally simple, we just subtract the components of the second vector from the first. However, there’s an assumption we’re making in order to subtract. That is, we’re assuming that there is such a thing as inverting a vector to a vector with equal but negated components, since subtraction is really just addition with the inverse (negation) of the second vector. Well, it turns out that our assumption is actually correct. It is possible to invert a vector, which multiplies each component by –1. This inverse vector has the same magnitude as the original vector, but opposite direction.


Now, why would we ever want to add or subtract vectors? Well, the answer lies in the geometric significance of these operations. Adding two vectors gives a single vector that has the same direction and magnitude as following each of the addend vectors back to back. Let’s look at a diagram:


In this picture, the two black vectors represent the vectors being added together, and the dashed blue vector is where you end up after following the two vectors in order. This is exactly equal to the sum vector obtained by adding the two vectors component-wise as we did above. Again, subtraction is similar, just inverting one of the vectors (reversing it’s direction, but otherwise leaving it oriented the same way and the same length). See Figure 2.



Before we dive into multiplications between two vectors, it’s worth taking a quick look at a simple scaling operation. We can multiply (or scale) a vector by any scalar value by simply multiplying each component of the vector by that value:


The geometric significance of this is that it scales the length of the vector by s, leaving the direction as it is.


Dot Product

The first type of multiplication between vectors we’ll look at is called the scalar product, also called the dot product or inner product. As the name scalar product implies, the result of this type of multiplication is actually a scalar value, and not a vector. To compute the scalar product of two vectors, we sum the component-wise products of each vector:


Notice the dot-like symbol for the scalar product. This is why it’s most commonly called the dot product. The dot product is perhaps the most commonly encountered vector operation in game and graphics code. There are several reasons for this, not least of which is the geometric significance of the dot product. The dot product of two vectors gives the scaled length of the projection of one vector in the direction of another. What does that mean exactly? Said another way, it tells you how much of a vector is in the same direction as another vector:


On the left half of this figure, we see two vectors. We wish to determine how much of the first vector lies in the direction of the second vector. This is exactly what the dot product gives us. On the right half of the figure we see the two vectors again, this time with their tails placed together. We can visibly see that the portion of the vector in the direction of the other is equal to the blue arrow. The dashed red line shows that this forms a right triangle. The dot product of the two original vectors is equal to the length of the blue vector (the projection) scaled to the length of the second vector (the one the blue vector is overlapping). If this vector is of unit length, then the dot product is exactly the length of the blue vector, or projection of the first vector onto the second.

Another major reason the dot product is popular in game and graphics code is that the dot product of two vectors is equal to the cosine of the angle between them, scaled by the lengths of both vectors:



On a final note, before moving on to other forms of vector multiplication, it’s worth noting that scalar products are commutative:



Cross Product

The vector product, also called outer product or cross product, is the next and final form of multiplication we’ll look at in depth. There are other types of vector multiplication, such as the Hadamard product and tensor product, but they either don’t appear often in game programming, or are the topic of later discussions (we’ll revisit tensor product when talking about matrices). As the name vector product implies, this product results in a vector and not a scalar:


The equation, as well as how we arrive there, will become more clear when we look at matrix operations in the later chapters of the primer. Now, why is the cross product important? Before we answer that, let’s take a very quick detour and discuss coordinate systems and coordinate handedness.

A coordinate system is a representation that we impose onto a space so that we can describe it. What coordinate system we use to describe something has absolutely no bearing on the object itself, only on the description we use. Let’s look at a comparison of two arbitrary coordinate systems and the same object represented in both:image

Here we have the same cube in the same space, with two arbitrarily selected coordinate systems drawn. We can see that the cube is the same regardless of how we’ve selected our coordinate space, however describing that cube will differ. Using the first coordinate system, I might say “the cube is out a few units along the x-axis, and a few units forward along the z-axis”. Using the second coordinate system, I might say “the cube is out a few units along the y-axis, and a few units back along the x-axis”. Clearly, these descriptions are different, though we’re talking about the same cube.

What we’ve shown here is that it really doesn’t matter what coordinate system you use. There is no single correct, or universal, coordinate system. Different games and graphics packages use different systems. What’s important is that everything which uses the coordinate system needs to agree on a single system and remain consistent. Otherwise, a seemingly correct description from the perspective of one part of the code might be complete nonsense to another part. For this blog, I’ll try not to assume any particular system and generically refer to the 3 axes as ‘right’, ‘up’, and ‘forward’ where I can. You can label these anything you like.

Now, we mentioned handedness of a coordinate system. What we mean by that is the orientation of the 3 axes that make up our 3D coordinate system. There are two possible orientations, shown in the two images above. As we can see, the axis pointing up and the axis pointing to the right are the same between the two coordinate systems, but the third axis points towards us in one case and away from us in the other. While you can make the first two axes point in any direction, what you’ll find when laying out coordinate systems is that they will always end up in one of two formations. These are called left-handed and right-handed systems. The easiest way to tell which is which is by taking your right hand and pointing your fingers all along the ‘right’ axis, then bend your fingers in the direction of the ‘up’ axis. If you’re thumb is pointing in the direction of the forward axis then you’ve got a right-handed system. Otherwise, it’s a left-handed one. In the figure above, the first one is right-handed, and the second one is left-handed.

That’s enough of a detour, let’s get back to the cross product! The most important geometric significance of the cross product (for games) is that the resulting vector will be pointing in a direction that is orthogonal to the two vectors we multiplied. However, you’ll notice that there are two possible results here. They are equal but pointing in opposite directions. See the two red vectors in the figure below:


Which of the two cross products you get depends entirely on the handedness of the coordinate system you’re using to represent your vectors. This works for any coordinate system, but as an example let’s take the first coordinate system in Figure 4. We could express our vectors in terms of (x, y, z). If we were to use the same coordinate system but make it left-handed (invert z in this case), then our vectors would be expressed in terms of (x, y, –z). From the cross product equation above, you can see that this would negate some of the terms, which when multiplied out can be shown to return the inverse of the cross product computed with (x, y, z). To look at it another way, you can use your hand again (use the hand that matches the handedness of the coordinate system) and point your fingers in the direction of the first vector being multiplied. Then curve your fingers in the direction of the second vector. The cross product will point in the direction of your thumb.

The other geometric property, though used less often, is that the length of the cross product is equal to the sine of the angle between the vectors scaled by the lengths of the two vectors. This is exactly the same as with dot product, except sine instead of cosine:


Finally, we should note that cross products are not commutative.

Magnitude and Normalization

The final two operations we’ll take a look at are unary operations on a single vector. The first is determining the magnitude of a vector, which is computed using the following equation:


The final one is called normalizing a vector, which is to scale the vector to a length of 1, while maintaining it’s direction. This is done by simply dividing the vector by it’s magnitude.


The ^ symbol over a vector is used to denote that the vector is normalized, or of unit length.

That’s all for this chapter of the math primer. Hopefully that helped clear up and explain some of the elementary operations we’ll be needing throughout our work on physics and game code in coming posts. Please feel free to let me know if I’ve made any mistakes or if I could explain something further or better.

Posted in Math | Tagged , , | Leave a comment