## Building a 2D physics engine, Part 2 – shapes and collisions

Last time we discussed the basics of linear motion. In this post, we’ll talk about the basics of assigning shapes and detecting collisions between these shapes. But before we do that, a quick aside:

It took me longer to get to this post than I had intended, and unfortunately I worry that will be the trend, given the various demands on my schedule. So, reluctantly, I'm going to keep the posts briefer and more overview-oriented so that I can produce them more quickly and still get the main points across. If you'd like a more thorough explanation of some specific portion, please let me know in the comments and I'll spend time fleshing out just those parts. This way I can keep content coming at a reasonable pace, while still addressing technical depth on demand where it's actually needed.

With that out of the way, let's get back to shapes. We want to assign each object a shape, and these shapes will assume the world transform of the parent object they belong to. For instance, if I define a circle around an object as its shape and then move the object, I expect the circle to follow that movement as well, so that it consistently encloses the object I assigned it to. Instead of duplicating the transform info into the shape, we just pass it along with the shape when needed. This keeps the shape type as simple as possible, representing only the shape in the local space of the object.

We can have different kinds of shapes, so each shape gets a Type member that tells us which kind of shape it is. We'll start off with 2 types of shapes: circles and boxes. A circle is an appropriate shape for something that's roughly the same size horizontally and vertically, and that doesn't change shape much under rotation; a ball is an obvious example. Boxes can cover rectangular shapes that are longer in one direction than the other, or that need to keep a rigid shape as they rotate. There are many more shapes we could define to better represent objects, but we'll start with these two simple ones for now. Later, when we move to a different kind of collision detection scheme (GJK), it will become trivial to add new shapes.

A circle consists of a center point and a radius. If we assume that the circle is always located at the position of the parent object, then the radius is all we need to store. There are certainly interesting scenarios where the position of the object may not be its center of mass, in which case you'd want to specify the circle's center as a relative offset from the parent object, but we'll simplify in our engine and say the two must be the same.

There are several ways we can specify a box shape. We can store a min and max point, or we can store a center and two extents. We'll opt for the latter in our code, and use the same simplification as the circle: the center is always the position of the parent object.
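As a rough sketch of what these shape types might look like (the names here are illustrative and may differ from the actual code in the repository):

```cpp
// Hypothetical sketch of the shape types described above.
// Each shape carries a Type tag so the collision code can tell
// which concrete shape it is looking at.
enum class ShapeType { Circle, Box };

struct Shape
{
    explicit Shape(ShapeType type) : type(type) {}
    virtual ~Shape() = default;
    ShapeType type;
};

// A circle centered on its parent body's position, so only the
// radius needs to be stored.
struct CircleShape : Shape
{
    explicit CircleShape(float radius) : Shape(ShapeType::Circle), radius(radius) {}
    float radius;
};

// A box centered on its parent body's position, stored as
// half-extents along the local x and y axes.
struct BoxShape : Shape
{
    BoxShape(float halfX, float halfY) : Shape(ShapeType::Box), halfX(halfX), halfY(halfY) {}
    float halfX, halfY;
};
```

Since the shapes carry no position or orientation of their own, the parent body's transform is supplied alongside the shape wherever it's needed, as described above.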

Okay, so now we have some shapes, and we've assigned them to the objects. Now what? As the objects move around, we'll want to detect collisions between the various shapes and determine how to resolve those collisions. For this, we need to define something called contact info. For any given contact between two objects, at a minimum we'll need: a representative contact point in world space (or in the relative space of the two objects, or both); a contact normal vector representing the normal on the surface of one object where the other collided into it; and a penetration depth telling us how far the two shapes overlap along that normal direction. There are many other parameters we could compute for more advanced resolution of the contacts, but for now we'll stick with these.
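A minimal sketch of that contact data might look like this (field names are illustrative, with a tiny Vector2 stand-in so the snippet is self-contained):

```cpp
// Minimal 2D vector stand-in for this sketch.
struct Vector2
{
    float x = 0.0f, y = 0.0f;
};

// Hypothetical sketch of the contact info described above: a
// representative world-space contact point, the surface normal at
// the contact, and a signed separation distance along that normal
// (negative when the shapes are penetrating).
struct ContactInfo
{
    Vector2 worldPosition; // representative contact point in world space
    Vector2 normal;        // normal on the surface of one body at the contact
    float distance = 0.0f; // separation; negative means penetration depth
};
```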

To detect a collision and determine the contact info, we define a function called Collide, which takes 2 RigidBody objects and a ContactInfo object for output. The function returns a Boolean to let us know whether there was a collision, and if there was, the ContactInfo parameter we passed in is filled with the result. The reason the function takes the RigidBody type, and not the shapes directly, is that the RigidBody has a pointer to the shape but also contains the position and orientation data for the object, which tells us where in the world (and at what orientation) the shape is. This function acts as a dispatcher for the actual collision detection, since for now we provide a different function for each combination of shape types. So, we currently have a Circle/Circle function, a Box/Box function, and a Circle/Box function.
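Here's an illustrative sketch of such a dispatcher, with the per-pair detection functions stubbed out and the types pared down to the bare minimum (the actual code in the repository differs in its details):

```cpp
enum class ShapeType { Circle, Box };

// Stand-ins for the real types, just enough to show the dispatch.
struct Shape { ShapeType type; };
struct RigidBody { Shape* shape; };
struct ContactInfo {};

// Per-pair detection functions (stubbed out here).
bool CollideCircleCircle(RigidBody*, RigidBody*, ContactInfo&) { return false; }
bool CollideCircleBox(RigidBody*, RigidBody*, ContactInfo&) { return false; }
bool CollideBoxBox(RigidBody*, RigidBody*, ContactInfo&) { return false; }

// Dispatch on the two shape types. The Box/Circle case just swaps
// the arguments and reuses the Circle/Box function (a real
// implementation would also need to flip the resulting normal so
// that it stays consistent with the argument order).
bool Collide(RigidBody* body1, RigidBody* body2, ContactInfo& contact)
{
    ShapeType t1 = body1->shape->type;
    ShapeType t2 = body2->shape->type;

    if (t1 == ShapeType::Circle && t2 == ShapeType::Circle)
        return CollideCircleCircle(body1, body2, contact);
    if (t1 == ShapeType::Circle && t2 == ShapeType::Box)
        return CollideCircleBox(body1, body2, contact);
    if (t1 == ShapeType::Box && t2 == ShapeType::Circle)
        return CollideCircleBox(body2, body1, contact); // swap so circle is first
    return CollideBoxBox(body1, body2, contact);
}
```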

For each pair of potentially interacting objects, we keep a data structure called the RigidBodyPair. The purpose of this structure is to hold data about the interaction of 2 bodies, including the collision data if it exists. Ideally, we'd only produce one of these pair objects when 2 bodies are in close proximity to each other (since otherwise, there's no way they could interact), but for simplicity the current implementation just creates one for every possible pair of objects, an O(N²) setup.
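To illustrate that naive setup, here's a hypothetical helper (not from the repository) that generates one pair per unordered combination of bodies; with N bodies it produces N(N-1)/2 pairs:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Illustrative sketch of the O(N^2) pair generation described above.
// A "body" here is just an index; the real RigidBodyPair also holds
// contact data for the two bodies it references.
std::vector<std::pair<std::size_t, std::size_t>> MakeAllPairs(std::size_t bodyCount)
{
    std::vector<std::pair<std::size_t, std::size_t>> pairs;
    for (std::size_t i = 0; i < bodyCount; ++i)
    {
        for (std::size_t j = i + 1; j < bodyCount; ++j)
        {
            pairs.emplace_back(i, j); // each unordered pair exactly once
        }
    }
    return pairs;
}
```

A broad phase that only pairs nearby bodies would replace this loop; we'll get to that in the 3D engine.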

Let's look at each of the collision functions in the code now, and go over the premise behind each one and how it works.

We'll start with the simplest, the Circle/Circle test. If you imagine drawing a line from the center of one circle to the center of the other, that is the shortest possible path between the two, and it is the line along which any contact between the two circles must occur. Since the distance from a circle's center to its surface along any line (including this one) is equal to its radius, it follows that if the two circles were just touching along this line, the distance between their centers would equal the sum of their radii. If the two circles are intersecting, the sum of their radii is greater than the distance between their centers; if they're not touching, the sum of the radii is less than that distance. Given that, let's take a look at the function:

```cpp
bool CollideCircleCircle(RigidBody* body1, RigidBody* body2, ContactInfo& contact)
{
    const CircleShape* shape1 = (const CircleShape*)body1->GetShape();
    const CircleShape* shape2 = (const CircleShape*)body2->GetShape();

    // Sum of the radii; the circles touch when their centers are exactly this far apart.
    float r = shape1->Radius() + shape2->Radius();
    float r2 = r * r;

    Vector2 toBody1 = body1->Position() - body2->Position();
    float d2 = toBody1.LengthSq();

    // Compare squared distances to avoid an unnecessary square root.
    if (d2 < r2)
    {
        contact.distance = sqrtf(d2) - r;
        contact.normal = toBody1.Normalized();
        contact.worldPosition = body1->Position() - contact.normal * shape1->Radius();
        return true;
    }

    return false;
}
```

Next, let's consider the Box/Circle case. The idea here is to simplify by doing the check in the local space of the body containing the box, which makes a lot of the math easier. To do that, we take the box's world transform (which maps from its local space to world space) and apply its inverse to the circle's world-space position to bring the circle into the box's local space. We exploit a few things to simplify this: 1) a circle has no meaningful orientation, so we only need to transform its position; 2) since we know our transform is a rotation followed by a translation, we can just apply the inverse of each in the reverse order (instead of computing a full inverse matrix); and 3) since the box's rotation matrix is a pure rotation, its inverse is equal to its transpose, so we again avoid a full inverse. Putting it all together: we first subtract the box's position from the circle's (the inverse of the translation), then multiply by the transpose of the box's rotation (its inverse). That puts us in the local space of the box, and we can proceed with the collision check.

So, how do we actually check for the collision? There are two cases to consider: the center of the circle is outside the box, or it's inside the box. If it's outside, we find the closest point on the box's surface to the center of the circle. If the distance between that point and the center of the circle is less than or equal to the circle's radius, we have a collision, that point is our contact point, and the side of the box the point lies on provides the normal. If the center of the circle is inside the box (in which case we know it's a collision for sure), we find the side of the box it's closest to and push out that way, using that side's normal as the contact normal.

```cpp
bool CollideCircleBox(RigidBody* body1, RigidBody* body2, ContactInfo& contact)
{
    const CircleShape* shape1 = (const CircleShape*)body1->GetShape();
    const BoxShape* shape2 = (const BoxShape*)body2->GetShape();

    // Transform the circle to local space of the box (box then becomes an aabb)
    Matrix2 rotB = Matrix2(body2->Rotation());
    Matrix2 invRotB = rotB.Transposed();

    Vector2 toCircle = body1->Position() - body2->Position();
    Vector2 localToCircle = invRotB * toCircle;

    Vector2 halfWidths = 0.5f * shape2->Size();

    float r = shape1->Radius();
    float r2 = r * r;

    // If the center of the circle is outside of the aabb
    if (localToCircle.x < -halfWidths.x || localToCircle.x > halfWidths.x ||
        localToCircle.y < -halfWidths.y || localToCircle.y > halfWidths.y)
    {
        // Find closest point on box to the center of the circle.
        Vector2 closestPt = Vector2(
            localToCircle.x >= 0 ? min(halfWidths.x, localToCircle.x) : max(-halfWidths.x, localToCircle.x),
            localToCircle.y >= 0 ? min(halfWidths.y, localToCircle.y) : max(-halfWidths.y, localToCircle.y));

        Vector2 toClosest = closestPt - localToCircle;
        float d2 = toClosest.LengthSq();
        if (d2 > r2)
        {
            return false;
        }

        contact.distance = sqrtf(d2) - r;
        contact.normal = -(rotB * toClosest).Normalized();
        contact.worldPosition = body1->Position() - contact.normal * shape1->Radius();
        return true;
    }
    else
    {
        // Otherwise, find the side we're closest to & use that
        float dists[] =
        {
            localToCircle.x - (-halfWidths.x),
            halfWidths.x - localToCircle.x,
            localToCircle.y - (-halfWidths.y),
            halfWidths.y - localToCircle.y
        };

        int iMin = 0;
        for (int i = 1; i < _countof(dists); ++i)
        {
            if (dists[i] < dists[iMin])
            {
                iMin = i;
            }
        }

        switch (iMin)
        {
        case 0: contact.normal = Vector2(-1, 0); break;
        case 1: contact.normal = Vector2(1, 0); break;
        case 2: contact.normal = Vector2(0, -1); break;
        case 3: contact.normal = Vector2(0, 1); break;
        default: assert(false); break;
        }

        // Center is inside the box: penetration is the distance to the
        // nearest face plus the circle's radius.
        contact.distance = -(dists[iMin] + r);
        contact.normal = (rotB * contact.normal).Normalized();
        contact.worldPosition = body1->Position() - contact.normal * shape1->Radius();
        return true;
    }
}
```

Finally, we'll consider the most complicated of the 3 collision functions we're discussing: the Box/Box test. We use a technique called the separating axis test (SAT) for this one. A full explanation of the technique is beyond the scope of this post, but there's plenty of good info on the web about it, and if there's interest I can put together a detailed discussion in a future blog post. For the purpose of this discussion, I'll summarize it as follows:

The projections of two objects on any axis can be considered, and if there is any separation between the projections along that axis, then the two objects cannot be intersecting.
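To make that statement concrete, here's a small standalone sketch (not the engine's code) that projects point sets onto an axis and checks the resulting intervals for separation:

```cpp
#include <vector>

struct Vec2 { float x, y; };

static float Dot(const Vec2& a, const Vec2& b) { return a.x * b.x + a.y * b.y; }

// The projection of a convex point set onto an axis is an interval.
struct Interval { float min, max; };

// Assumes points is non-empty.
Interval Project(const std::vector<Vec2>& points, const Vec2& axis)
{
    Interval iv{Dot(points[0], axis), Dot(points[0], axis)};
    for (const Vec2& p : points)
    {
        float d = Dot(p, axis);
        if (d < iv.min) iv.min = d;
        if (d > iv.max) iv.max = d;
    }
    return iv;
}

// The two sets are separated along this axis exactly when their
// projected intervals don't overlap; if so, they cannot intersect.
bool SeparatedOnAxis(const std::vector<Vec2>& a, const std::vector<Vec2>& b, const Vec2& axis)
{
    Interval ia = Project(a, axis);
    Interval ib = Project(b, axis);
    return ia.max < ib.min || ib.max < ia.min;
}
```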

The key to using this technique is to consider all of the relevant axes (and no more), and to early-out as soon as a separation is found. If no separation is found after considering all of the needed axes, the axis with the least penetration provides the collision info. So, which axes do we test? For 2D this is easy: you test the axes defined by the face normals of each box. We exploit the fact that boxes are parallelograms, so the normals on opposite sides (left and right, for example) are just negations of each other. That gives us 2 directions (in world space) per box shape. For the projections, we need the min and max points of each object along the axis, and we consider the intervals those min/max pairs form. For this, we'll briefly introduce a concept called a support mapping. I'll go into a lot more detail about this later when we discuss more advanced collision detection methods, but for now I'll define it simply as a function (defined for a particular shape) that, given a direction vector, returns the most extreme point on the shape along that direction (the support point).

The algorithm then goes like this: for each of the axes being considered, find the min and max points of each object along that axis. If the intervals don't overlap, exit the function; there is no collision. While doing this, keep track of the axis with the least overlap. If you get through all of the axes without finding a separation, use the axis with the least overlap to form the contact data for the collision. Here's the function:

```cpp
static Vector2 SupportMapping(const BoxShape* box, const Vector2& localDir)
{
    Vector2 half = 0.5f * box->Size();
    return Vector2(
        localDir.x >= 0 ? half.x : -half.x,
        localDir.y >= 0 ? half.y : -half.y);
}
```

```cpp
bool CollideBoxBox(RigidBody* body1, RigidBody* body2, ContactInfo& contact)
{
    const BoxShape* shape1 = (const BoxShape*)body1->GetShape();
    const BoxShape* shape2 = (const BoxShape*)body2->GetShape();

    Matrix2 rot1(body1->Rotation());
    Matrix2 rot2(body2->Rotation());
    Matrix2 invRot1(rot1.Transposed());
    Matrix2 invRot2(rot2.Transposed());
    Vector2 toBody2 = body2->Position() - body1->Position();

    float minPen = FLT_MAX;
    Vector2 normal, pointOn1;

    // Test body1's two face axes.
    Vector2 dirs1[] =
    {
        rot1.col1,
        rot1.col2,
    };

    float bounds1[] =
    {
        shape1->Size().x * 0.5f,
        shape1->Size().y * 0.5f,
    };

    for (int i = 0; i < _countof(dirs1); ++i)
    {
        Vector2 v = dirs1[i].Normalized();
        Vector2 pt = toBody2 + rot2 * SupportMapping(shape2, invRot2 * -v);
        float d = Dot(pt, v);
        float pen = bounds1[i] - d;
        if (pen < 0)
        {
            return false;
        }
        if (pen < minPen)
        {
            minPen = pen;
            normal = -v;
            pointOn1 = pt + normal * pen;
        }

        // Other way
        pt = toBody2 + rot2 * SupportMapping(shape2, invRot2 * v);
        d = Dot(pt, -v);
        pen = bounds1[i] - d;
        if (pen < 0)
        {
            return false;
        }
        if (pen < minPen)
        {
            minPen = pen;
            normal = v;
            pointOn1 = pt + normal * pen;
        }
    }

    // Test body2's two face axes.
    Vector2 dirs2[] =
    {
        rot2.col1,
        rot2.col2,
    };

    float bounds2[] =
    {
        shape2->Size().x * 0.5f,
        shape2->Size().y * 0.5f,
    };

    for (int i = 0; i < _countof(dirs2); ++i)
    {
        Vector2 v = dirs2[i].Normalized();
        Vector2 pt = -toBody2 + rot1 * SupportMapping(shape1, invRot1 * -v);
        float d = Dot(pt, v);
        float pen = bounds2[i] - d;
        if (pen < 0)
        {
            return false;
        }
        if (pen < minPen)
        {
            minPen = pen;
            normal = v;
            pointOn1 = pt + toBody2;
        }

        // Other way
        pt = -toBody2 + rot1 * SupportMapping(shape1, invRot1 * v);
        d = Dot(pt, -v);
        pen = bounds2[i] - d;
        if (pen < 0)
        {
            return false;
        }
        if (pen < minPen)
        {
            minPen = pen;
            normal = -v;
            pointOn1 = pt + toBody2;
        }
    }

    contact.normal = normal;
    contact.distance = -minPen;
    contact.worldPosition = body1->Position() + pointOn1;
    return true;
}
```

That wraps up the basics of shapes & collision detection for now. This was a VERY brief overview of the system, so please refer to the code in the repository for more information and to play around with it. We’ll move onto discussing rotations and resolving the actual collisions next!

## Building a 2D physics engine, Part 1 – Basic motion

Let’s kick off this series with some basics, and walk through the design & approach used. I’m assuming that everyone is familiar with the math concepts I’ve covered in the past, so I won’t dwell too heavily on the details of the math or solving the equations, except where it’s new. I will, however, introduce most concepts in their mathematical form first so we understand the relationships between various quantities, and then discuss how they appear in code (which is often much simpler than the math version). But first, let’s start with an overview of the physics engine we’re building.

We’ll be modeling rigid bodies, which are the individual objects that interact in the world. Some of them move about, while others are stationary (think the walls or floors of a level). Each has some shape, mass, and other physical properties, and they cannot pass through each other. They are called rigid since their individual shape & composition doesn’t change over time (versus soft bodies, such as cloth, fluids, gel, etc… which are deformable). Together, these bodies make up a simulation system. The simulation system is just the grouping of these objects along with some additional properties such as gravity and time. The simulation is the component that advances time and updates the bodies.

As each object moves about, it may encounter other objects. These encounters may result in a collision, which will need to be resolved in a realistic manner. This may mean the objects bounce off of each other, affecting their velocities. It might mean an object slides down a slope, affected by friction. It could be many things, and we’ll model a basic system that can be used to simulate several interesting effects. We will have code to detect such collisions, and return some details in the form of contact data. The contact data will then be used to resolve the collision in some way.

For this simple tutorial, we’ll be focused on basic concepts, and so I’m going to keep the code and the algorithms very naïve, avoiding obfuscating optimizations. When we move on to the 3D physics engine, since the basics will have already been covered, we’ll explore and tackle more advanced topics like broad phase pair detection, more exotic collision detection methods & contact point generation, continuous collision detection, time of impact, joints, and so on.

#### Linear Newtonian Dynamics

The first thing we need to understand, before any of the cool stuff we just talked about can happen, is how to model and move our objects. We will use Newton’s basic laws of motion, which may sound familiar from high school physics courses, to model our simulation.

First, let’s discuss the individual objects themselves. If we assume our object is just a particle with no shape or orientation, we can model it simply as a position with a mass. This position, since we’re doing a 2D physics engine, is just a 2D point. Let’s write it as p. It’s written in bold since it’s a vector quantity with 2 components, x and y. The mass is a simple scalar, let’s call it m. In code, this might look like:

```cpp
Vector2 position;
float mass;
```


We’ll cover shapes in the next post, and rotation & angular acceleration in a later post. Therefore, for the remainder of this discussion we’ll assume we’re talking about a simple particle with a mass.

We know that moving objects have velocity. What is velocity? Well, it describes the change in position over time. The rate of change of one variable as a result of a change of another variable is the core principle of Calculus, and is called a derivative. Stated using Calculus terminology, we’d say that velocity is the first derivative of position with respect to time. This means that it is a measure of how much the position changes as a result of a change in time. This is often written as:

$\boldsymbol{v}=\boldsymbol{p'}$

The prime mark (′) on the p denotes the first derivative with respect to time. So this says that the velocity, v, is the first derivative of the position p with respect to time. Now, if we have a velocity and a change in time, how do we determine how much the position changes? We need to reverse the derivative in order to solve for it. The reverse of a derivative is called an anti-derivative, or integral, and finding and using the integral to solve for the change in position in this way is called integration.

The integral for the velocity above would be written as:

$\boldsymbol{p}=\int \boldsymbol{v}\,dt$

Which means the integral of the velocity, for a change in time of dt. dt is sometimes written as ∆t to mean “change in time” or “delta time”. Solving the above equation yields:

$\boldsymbol{p}=\boldsymbol{v}\Delta t + \boldsymbol{c}$

That is, the new position is equal to the velocity multiplied by the change in time, added to some constant c. The constant, in this case, is the starting position before the change in time occurred. So:

$\boldsymbol{p}=\boldsymbol{v}\Delta t + {\boldsymbol{p}}_{0}$

Armed with this, we can now write code to update our position, given the velocity and a change in time:

```cpp
float dt; // Change in time this frame, in seconds
Vector2 velocity;

position = velocity * dt + position;
```


Or, more simply:

```cpp
position += velocity * dt;
```


Great, so we know how to update our position for a given change in time, if we have a velocity. How do we update our velocity? Well, again we can state this as the first derivative of velocity with respect to time. We call this quantity acceleration.

$\boldsymbol{a}=\boldsymbol{v'}=\boldsymbol{p''}$

That is, acceleration is equal to the first derivative of velocity (with respect to time), or the second derivative of position. We integrate the same way as above to update the velocity using acceleration:

$\boldsymbol{v}=\boldsymbol{a}\Delta t + {\boldsymbol{v}}_{0}$

And in code form:

```cpp
Vector2 acceleration;

velocity += acceleration * dt;
```


So, given an acceleration, we can update our velocity. And given that updated velocity, we can update our position. The last piece remaining then, is to find a way to generate accelerations. Of course, we could assign them directly, but it’s hard to know exactly what acceleration should be applied to something. One exception to this is gravity, since we know that gravity is an equal acceleration felt by all objects, so we can just apply that one directly. In the real world, accelerations are the result of another concept called forces, and by modeling forces for the rest of our interactions, we get more realistic accelerations as a result.

The 3 laws of motion from Newton state:

1. An object remains at rest, or continues to move at constant velocity, until acted on by an external force.
2. The amount of acceleration affecting an object is proportional to the force applied divided by the object’s mass.
3. When an object exerts a force on another object, the other object exerts an equal force in the opposite direction back.

We can use the second law above to come up with this equation:

$\boldsymbol{F}=m\boldsymbol{a}$

Rearranged to:

$\boldsymbol{a}=\frac{\boldsymbol{F}}{m}$

This means that given a force on an object, we can determine exactly what acceleration is generated, based on the object's mass. We end up encountering division by mass often enough that it's usually best to just store the inverse of the mass directly, so we can multiply later instead. We now have all the basic equations we need in order to do simple simulations of our objects.
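As a sketch of how this might look in code, here's a hypothetical particle type (not the engine's RigidBody) that stores the inverse mass and chains the pieces derived above: a = F·(1/m), then v += a·Δt, then p += v·Δt:

```cpp
// Minimal 2D vector for this sketch.
struct Vec2
{
    float x = 0.0f, y = 0.0f;
};

// A hypothetical particle storing inverse mass, so a = F/m becomes a
// multiplication. An inverse mass of zero conveniently represents an
// immovable (infinitely heavy) object that forces cannot accelerate.
struct Particle
{
    Vec2 position;
    Vec2 velocity;
    Vec2 force;
    float invMass = 0.0f;

    explicit Particle(float mass) { invMass = (mass > 0.0f) ? 1.0f / mass : 0.0f; }

    // Integrate one step: a = F * invMass, then v += a*dt, p += v*dt.
    void Integrate(float dt)
    {
        velocity.x += force.x * invMass * dt;
        velocity.y += force.y * invMass * dt;
        position.x += velocity.x * dt;
        position.y += velocity.y * dt;
        force = Vec2{}; // clear accumulated forces for the next frame
    }
};
```

For example, a 2 kg particle with a 4 N force applied along x for a 1-second step picks up a velocity of 2 m/s and moves 2 m.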

#### The code

As a reminder, all of the source code for this tutorial is maintained in this github repository. The code is not yet complete at the time of this writing, but it will always be at least up to or ahead of the current blog post being published. Meaning, it has everything in this blog post (and more) in it, but it may not be the final code with all of the features until we actually get through those posts. Therefore, if you run it and it doesn’t look complete or contains stability issues, this is the reason. Please feel free to report any bugs you find in the issue tracker on Github, or via comments to the posts. If it’s not something I’m already aware of and fixing, I’ll be sure to get it fixed soon.

The sample is broken up into the following components:

1. Main.cpp – Boilerplate Windows application level stuff, like implementing the main entry point and starting up a sample PhysicsWorld instance, throwing some objects into it, and updating it each frame.
2. DebugRenderer*.* – A simple debug rendering facility to render shapes with. This is used to render some simple shapes representing the objects in the simulation, and some debug features like contact points, etc…
3. Vector2.h, Matrix2.h – Minimal math code we need to implement this sample.
4. PhysicsWorld.h/cpp – Implements a single simulation. This allows configuring and inserting objects into it, setting global attributes like gravity and iteration counts, and updating the simulation each frame with an update call.
5. RigidBody.h/cpp – The rigid body objects we’re simulating. This is the object that contains the properties we discussed in this blog post, for instance.
6. RigidBodyPair.h/cpp – Represents a pair of 2 interacting bodies. As we’ll see in later posts, this will track the contact (if any) between two objects.
7. Shape.h – Contains the implementations of the various shapes we can simulate in this sample.
8. Collision.cpp – Implements the simple collision detection routines we use for this sample.

In this post, we encountered the following pieces:

From RigidBody.h:

```cpp
// A rigid body is a dynamic object which can be affected by forces and constraints
class RigidBody
{
public:
    // The rigid body takes over the shape's lifetime.
    // When the rigid body is destroyed, it will delete the shape
    RigidBody(Shape* shape, float mass);
    ~RigidBody();

    const Shape* GetShape() const { return _shape; }

    const Vector2& Position() const { return _position; }
    Vector2& Position() { return _position; }

    const Vector2& LinearVelocity() const { return _linearVelocity; }
    Vector2& LinearVelocity() { return _linearVelocity; }

    const Vector2& Force() const { return _force; }
    Vector2& Force() { return _force; }

    float Mass() const { return _mass; }
    float InvMass() const { return _invMass; }

    float Rotation() const { return _rotation; }
    float& Rotation() { return _rotation; }

    float AngularVelocity() const { return _angularVelocity; }
    float& AngularVelocity() { return _angularVelocity; }

    float Torque() const { return _torque; }
    float& Torque() { return _torque; }

    float I() const { return _I; }
    float InvI() const { return _invI; }

private:
    Shape* _shape;

    // Linear
    Vector2 _position;
    Vector2 _linearVelocity;
    Vector2 _force;
    float _mass, _invMass;

    // Angular
    float _rotation;
    float _angularVelocity;
    float _torque;
    float _I, _invI;
};
```


From PhysicsWorld.cpp:

```cpp
void PhysicsWorld::Update(float dt)
{
    float invDt = dt > 0.0f ? 1.0f / dt : 0.0f;

    // Determine overlapping bodies and update contact points
    UpdatePairs();

    // Integrate forces to obtain updated velocities
    for (auto& body : _bodies)
    {
        if (body->InvMass() == 0.0f)
            continue;

        body->LinearVelocity() += dt * (_gravity + body->InvMass() * body->Force());
        body->AngularVelocity() += dt * body->InvI() * body->Torque();

        // Dampen the velocities to simulate friction (we'll add friction simulation later)
        static const float DampeningTerm = 0.001f;
        body->LinearVelocity() += -body->LinearVelocity().Normalized() * DampeningTerm;
        body->AngularVelocity() += (body->AngularVelocity() > 0) ? -DampeningTerm : DampeningTerm;

        // Clamp to 0 if the value becomes too low
        static const float ClampThreshold = 0.01f;
        if (body->LinearVelocity().LengthSq() < ClampThreshold)
        {
            body->LinearVelocity() = Vector2(0, 0);
        }
        if (fabsf(body->AngularVelocity()) < ClampThreshold)
        {
            body->AngularVelocity() = 0.0f;
        }
    }

    // Do all one-time init for the pairs
    for (auto& pair : _pairs)
    {
        pair.second.PreSolve(invDt);
    }

    // Sequential Impulse (SI) loop. See Erin Catto's GDC slides for SI info
    for (int i = 0; i < _maxIterations; ++i)
    {
        for (auto& pair : _pairs)
        {
            pair.second.Solve();
        }
    }

    // Integrate new velocities to obtain final state vector (position, rotation).
    // Also clear out any forces in preparation for the next frame
    for (auto& body : _bodies)
    {
        if (body->InvMass() == 0.0f)
            continue;

        body->Position() += dt * body->LinearVelocity();
        body->Rotation() += dt * body->AngularVelocity();

        body->Force() = Vector2(0, 0);
        body->Torque() = 0.0f;
    }
}
```


For now, we can ignore the portions that implement angular motion, collision detection, and collision response; we'll cover those in later posts. Other than some artificial placeholder drag (we'll implement friction properly later), the code implements exactly what we discussed in this post.

That’s it for this post. I want to keep each post scoped to make them easy to follow. If you have questions or want me to drill into something deeper, please just let me know in the comments. In the next post, we’ll start to discuss shapes and basic collision detection!

## Building a 2D physics engine, Part 0 – Introduction

When I first started this blog several years ago, one of my goals was to put together a detailed walkthrough, or tutorial, for building a simple physics engine. Unfortunately, it's now several years later and that tutorial never showed up, and for that I apologize. Now that I've started to settle in at my new job, I'm finally putting this article series together, and I hope you still find it useful.

Before we dive in, there are a few things that need mentioning:

There is no single correct way to build a real-time physics engine. Each year, new papers and presentations surface with different variations and techniques for modeling and solving constraints, handling collision detection, modeling soft body interactions, and more. This is an active area of research and development. Not every application requires the same features, accuracy, or level of performance. This too affects the types of physics engines out there. Some are much simpler, easier to understand, and while they might be unable to handle the performance demands of a modern AAA game with heavy physics, are often more than sufficient for simpler games and other uses. Other engines are more exotic, require much more technical understanding to follow, but can stand up to the demands of even the most complex games of the current generation.

The physics we'll explore in this blog will start simple. The aim is to provide a foundation to build more complex understanding on top of. It will implement very straightforward algorithms in the most naïve way possible, avoiding overly fancy tricks and cleverness. With these basics, I hope to help readers achieve fun physics simulations usable in a wide variety of games, and to give the more ambitious a starting point for developing something scalable and robust enough to handle next-generation content. I will do my best to break each concept down into simple, understandable pieces, and will keep each article small and focused, giving you a chance to ask questions and start a discussion in the comments. Discussion is really the best way to learn, in my experience, and it will help other readers as well.

Lastly, if you’re a bit rusty on basic linear algebra and game math, you may want to read through my refresher series on my blog. I’m going to assume readers are familiar with the topics I covered.

## The sample code

As I write the articles, I will be building a reference implementation, called SamplePhysics2D.

The code in the repository may be further ahead than the articles, as I will be developing each portion first, then borrowing code snippets from it to write the posts. Ideally, I expect readers to either type out or copy the code snippets from the articles themselves to build out the samples, rather than just downloading the code. I feel this will lead to a better understanding of the material, but ultimately it’s up to each reader and how they learn best.

My personal development environment is Visual Studio 2013 on Windows, and that is what the project in the repository is based on. However, other than the placeholder graphics code for debugging purposes, the code should be fairly portable. On the other hand, I make no explicit guarantees about portability. If you have problems, ask in the comments and I will try my best to assist.

## Design considerations

The engine being developed throughout this tutorial is going to be very basic, meaning it won’t handle many robustness and scalability aspects that production AAA games would. I will, however, comment on those areas and point out where a more robust solution would be needed. In later posts, I’ll help fill in some of those details, but I want to keep the core of the tutorial easy to follow.

Since the focus is physics and collision detection, a very trivial renderer will be provided just to get basic visualization on the screen. This renderer is not meant to be demonstrative of anything I would recommend using, and it is not meant to be sample code anyone borrows. It’s been put together as quickly as possible to cover the basic needs that we have to visualize the physics logic, and is nothing fancier than that.

For the code itself, I will be using C++ for the sample code and project. I prefer the STL and some C++11 features, and use them throughout the code base. I realize not every reader will like this decision, but the code should be easy to port if you’d like to make a different version.

Also, my focus is on readability and simplicity. I will comment heavily, and I will avoid overly clever tricks if they obfuscate the code. I will prefer simplicity over performance, because the point of this code is to convey information to readers, not to run as fast as possible. Later, after the basic tutorial, I may dive deeper into particular tech topics. At that time, I will explain acceleration techniques and algorithms.

Lastly, I’m not much of a style stickler. I’ve tried to use as neutral a coding style as possible, but I’m sure someone will still disagree with it. Sorry if it’s not your cup of tea, but try and follow along and ignore style if you can. If there’s anything truly atrocious you would like to see me change throughout the course of the series, feel free to mention it in the comments.

For reference, here’re the articles I have planned. I will turn each item into a link as the article gets written and posted:

1. Basics of Newtonian dynamics (linear).
2. Shapes, and basic collision detection.
3. Additional shapes and more complex contact data.
4. Torque and rotation.
5. Jitter. Persistent manifolds for stability.
6. Tangent forces and friction.

## Github repository for the blog

I’ve created a GitHub repository to host code accompanying this blog, located at https://github.com/rezanour/blog. I will be (finally!) starting a step-by-step walkthrough of building a basic physics engine, and the code samples will come directly from a sample engine I’ll be building in the repository. That way, the entire codebase will be available for anyone to access. We’ll start with a simpler 2D example engine first, and then move to 3D after we’ve covered all of the basics.

Look for the upcoming physics series soon!

## Common gotchas when adding VR support to existing rendering engines

Many virtual reality games, at least in the short term, will be based on current rendering systems on the market today. On the one hand, this is a good thing in that it allows people to create compelling content quickly. On the other hand, these rendering systems weren’t built with VR in mind, and there are several challenges in getting the best performance and experience out of them. Through talking with people trying to integrate VR into their own rendering stacks, and through experimenting with different techniques myself, I’ve started gathering a list of gotchas that will hopefully help others as we move into this new realm.

# Rendering stereo efficiently

The first issue is how to generate stereoscopic rendering efficiently. The easiest “bolt on” solution is to just invoke your rendering system twice, once per eye & viewport. However, there are a bunch of problems with this. Rendering a scene normally involves many different GPU pipeline states, switching between them as you switch between different materials, rendering techniques, etc… These state switches are expensive, and you end up having to do all of them again for the second eye. For example, this means a scene that has 100 different pipeline states now ends up switching between 200.

It is much, much cheaper to switch a viewport, update a constant (the view matrix), and redraw the same vertices using the same pipeline state that’s already loaded. This means that if you pair up your rendering of objects and draw both eyes back to back, per object, you get a very significant savings in performance. You only switch between pipeline states as much as necessary, the same as with a single eye.

Let’s look at an example to really make it clear what we mean:

Imagine your scene is made up of a few objects: a large piece of geometry representing a room rendered with pipeline state A, and a few skinned characters all rendered with pipeline state B, and a fire in the fireplace rendered with pipeline state C.

Traditionally, this would be rendered something like this, which is 3 state changes:

Set A, draw room. Set B, draw characters. Set C, draw fire.

The naïve, but easy, approach to rendering this scene in stereo is to do this:

Setup camera for eye 1, Set A, draw room. Set B, draw characters. Set C, draw fire.

Setup camera for eye 2, Set A, draw room. Set B, draw characters. Set C, draw fire.

That’s now 6 state changes, and if the vertices for the entire scene don’t all fit in VRAM, then we’ve possibly incurred a bit of paging activity as well as we rotate through all the data binding & unbinding it.

Here’s what a more optimal approach may look like:

Set A, setup eye 1, draw room, setup eye 2, draw room.

Set B, setup eye 1, draw characters, setup eye 2, draw characters.

Set C, setup eye 1, draw fire, setup eye 2, draw fire.

Here, we still only go through 3 state changes and we continue to leverage the vertices as they are already bound. This makes a tremendous difference in more complicated scenes. The only drawback is that you now have to pass enough information throughout your rendering pipeline to know how to setup both eyes & viewports, as opposed to just a single camera.
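To make the loop structure concrete, here’s a minimal sketch of the state-grouped stereo draw described above. All the names here (`DrawItem`, `setPipelineState`, `setEyeViewport`, `drawMesh`) are hypothetical stand-ins, not from any particular rendering API; only the shape of the loop is the point. The state-change counter just makes the savings measurable.

```cpp
#include <string>
#include <vector>

// A draw item pairs a pipeline state with the mesh it renders.
struct DrawItem { int pipelineState; std::string mesh; };

static int gStateChanges = 0;

void setPipelineState(int state) { ++gStateChanges; } // expensive: count it
void setEyeViewport(int eye) { /* set viewport + per-eye view matrix */ }
void drawMesh(const std::string& mesh) { /* issue the draw call */ }

// Draw the scene in stereo while switching pipeline state only when the
// state actually changes; both eyes are drawn back to back per object.
void drawSceneStereo(const std::vector<DrawItem>& items)
{
    int current = -1;
    for (const DrawItem& item : items)
    {
        if (item.pipelineState != current)
        {
            setPipelineState(item.pipelineState);
            current = item.pipelineState;
        }
        for (int eye = 0; eye < 2; ++eye) // cheap per-eye setup
        {
            setEyeViewport(eye);
            drawMesh(item.mesh);
        }
    }
}
```

With the example scene above (room with state A, characters with state B, fire with state C), this loop performs 3 pipeline state changes rather than 6.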

# Visibility culling

It may be tempting to just treat each eye as its own individual camera, but this poses several problems. The first is that you usually determine which objects to render based on the camera. While it would certainly work to run the visibility determination logic and generate a list of objects to render per eye, it’s also incredibly inefficient. The vast majority of the two lists would be the same, with only a few items unique to one eye or the other. It is usually much better to build a “fattened” view frustum which encompasses both eyes, and use that to do a single visibility culling pass. The generated list should then be used for both eyes. There will occasionally be an object that was only visible out of the corner of one eye and generates no pixels for the other, but these are corner cases, and the approach is conservative: it never culls anything that should have been visible. This generates the proper image, and saves a tremendous amount of processing time on the CPU.
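One simple way to realize a conservative shared cull, sketched below under my own assumptions (the article doesn’t prescribe a specific construction): test bounding spheres against the frustum planes of a single camera placed at the midpoint between the eyes, but expand each sphere’s radius by half the eye separation. Any object visible to either eye is guaranteed to pass, since each eye is at most half the separation away from the midpoint.

```cpp
#include <vector>

struct Vec3 { float x, y, z; };

// A frustum plane; points inside the frustum satisfy dot(normal, p) + d >= 0.
struct Plane { Vec3 normal; float d; };

float Dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Conservative stereo visibility: cull against the mid-eye frustum with the
// sphere fattened by half the eye separation.
bool SphereVisibleStereo(const std::vector<Plane>& centerFrustum,
                         const Vec3& center, float radius, float eyeSeparation)
{
    float fattened = radius + eyeSeparation * 0.5f;
    for (const Plane& p : centerFrustum)
    {
        if (Dot(p.normal, center) + p.d < -fattened)
            return false; // fully outside this plane for both eyes
    }
    return true;
}
```

A few off-screen objects near the frustum edges will survive the test and draw zero pixels, which is exactly the conservative corner case described above.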

# Level of Detail (LOD)

Similar to visibility determination, determining LOD levels on a per-camera basis is the norm. However, this causes problems if you pick the LOD on a per-eye basis. An object out on the far left or far right of the view might be on the boundary of two detail levels based on your current position, such that one eye would pick level X and the other eye level X + 1. If you have good blending between LODs, this may not be very obvious, but it’s still undesirable. The solution, again, is to just use a single fattened camera view with a single average position of the two eyes to do the LOD determination, and then use the result for both eyes.
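A sketch of what that shared decision might look like, with distance thresholds and function names of my own invention (a real engine would plug in whatever LOD metric it already uses, just fed from the averaged eye position):

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Pick a LOD level from the distance between the object and the averaged
// "middle eye" position, so both eyes always agree on the level chosen.
// lodDistances holds increasing distance thresholds; level 0 is the most
// detailed, and anything beyond the last threshold gets the coarsest level.
int SelectLod(const Vec3& objectPos, const Vec3& leftEye, const Vec3& rightEye,
              const float* lodDistances, int lodCount)
{
    Vec3 mid { (leftEye.x + rightEye.x) * 0.5f,
               (leftEye.y + rightEye.y) * 0.5f,
               (leftEye.z + rightEye.z) * 0.5f };

    float dx = objectPos.x - mid.x;
    float dy = objectPos.y - mid.y;
    float dz = objectPos.z - mid.z;
    float dist = std::sqrt(dx*dx + dy*dy + dz*dz);

    for (int i = 0; i < lodCount; ++i)
        if (dist < lodDistances[i])
            return i;
    return lodCount;
}
```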

# Normal maps

Through my experimentation, it appears something about rendering an image in stereo makes the effect of normal maps diminish significantly. Normal maps occupying a small amount of view space (such as small items in the distance) look fine, but large walls and surfaces up close start looking quite flat and cheesy, with normal maps looking like they’re just cheaply painted onto the surface (which is, in fact, exactly what they are). I’m not sure why, but I suspect it has to do with the discrepancy between the true depth of the surface your eyes perceive and the fake perturbations the normal map is trying to impose. I believe actual displacement techniques, such as displacement mapping and tessellation, will become more popular as we build richer VR experiences. I also hope this means we can push the GPU vendors to finally fix their tessellation engines, as most have performance issues and inefficiencies.

# View dependent lighting

It seems logical that view dependent computations, such as those for specular highlights or reflections, would benefit from using a per-eye view vector. However, in practice this actually looks odd and I’m not sure why. In my experience, using separate view vectors per eye for rendering specular highlights makes the final lighting or reflected image look blurry and incorrect. Using a single view vector from the average or “middle” eye generally seems to look much better, even if it feels physically incorrect. I need to do some more research and experimentation here to see what’s going on, but for now my recommendation is to use a single vector to get a better looking output. But certainly try both and see which works best for your particular setup.
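In code, the recommendation amounts to deriving one view vector from the averaged eye position and feeding it to both eyes’ view-dependent shading terms. A minimal sketch, with hypothetical names (where exactly this lives — CPU constants or shader code — depends on your engine):

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 Normalize(const Vec3& v)
{
    float len = std::sqrt(v.x*v.x + v.y*v.y + v.z*v.z);
    return { v.x / len, v.y / len, v.z / len };
}

// One view vector from the averaged "middle" eye to the surface point,
// shared by both eyes' specular/reflection computations.
Vec3 SharedViewVector(const Vec3& surfacePos, const Vec3& leftEye, const Vec3& rightEye)
{
    Vec3 mid { (leftEye.x + rightEye.x) * 0.5f,
               (leftEye.y + rightEye.y) * 0.5f,
               (leftEye.z + rightEye.z) * 0.5f };
    return Normalize({ mid.x - surfacePos.x, mid.y - surfacePos.y, mid.z - surfacePos.z });
}
```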

That’s it for now. I wanted to keep this post fairly short and just summarize a few things that may help people building content for VR. I’ll continue to document additional issues as I run into them, or hear about others running into them. Good luck getting your content onto the VR platform, and please let me know if you have additional questions or comments.

## New beginnings

After nearly a decade, last week was my last week at Microsoft. I’ve decided to join Oculus VR and help them make consumer virtual reality successful.

The reason I learned graphics and game programming in the first place was to build deeply immersive virtual worlds. The natural next step is VR, and I believe we’re much closer to having compelling consumer virtual reality than many think.

Naturally, my blog posts (yes, I will start writing more again!) will start containing more VR subjects in addition to other topics. Hopefully everyone’s as excited about the area as I am.

Here’re some planned posts I’d like to write soon:
1. Common gotchas when adding VR support to an existing graphics engine not written with VR in mind.
2. Deep dive into the Windows Flip & Present models. Breakdown of the different DXGI swapchain flip modes, and what actually happens when you call Present, including some more advanced topics and debugging tips. I’ll also talk about the tradeoff between latency & throughput that picking different models causes.

## Math Primer Series: Rotation Representations and Quaternions

Migrated from http://blogs.msdn.com/b/rezanour

We saw the power and usefulness of transforms in the previous primer posts, and we’ll be using them quite a bit in our discussions of physics and graphics. One of the first tasks we’ll need to do is determine how to represent our transformations in code. We’re going to be creating, manipulating, and animating these transformations over time, so they must be easy and efficient to work with. Particularly, we’d like to be able to manipulate the individual components (translation, rotation, and scale) independently. One natural, but naïve, choice is to just use the transformation matrix directly, since that’s the form we’ve seen so far. However, using a matrix directly makes it nontrivial to manipulate and animate. It’s also difficult to retrieve and change independent components of the transform. It’s obvious that we want something better, so let’s take a look at some options.

The next natural choice is to represent translation, rotation, and scale separately. This makes it trivial to manipulate the values independently. However, since most graphics packages require the final transformation as a matrix, we’ll need to combine the elements later. This isn’t very difficult to do, and we’ve already seen how we can do that in previous posts. Translation and scale are trivially represented as 3 element vectors, which maps well to how we use them, including combining them with matrices. But what about rotation? How can we represent that?

#### Rotation Matrix

One way to represent the rotation is to just leave it as a pure orthonormal rotation matrix. It is easy to combine this form with position and scale, since the rotation is already in matrix form. We can add more rotation to this matrix by concatenating other rotation matrices using matrix multiplication. This is somewhat expensive, but conceptually simple.

However, as more and more rotations are applied, our matrix could start to stray from being orthonormal (due to rounding errors, etc…). If it strays far enough, it can no longer be considered a rotation, and applying it to an object may have unexpected side effects like shearing or deformation. Therefore, we need to regularly check and ensure that our matrix maintains the orthonormal property, which is another somewhat expensive task to do regularly.
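One common way to restore the orthonormal property is a Gram-Schmidt style rebuild of the three basis vectors. The sketch below treats the 3×3 rotation as three axis vectors; the axis layout and names are my own choices, not from the sample engine:

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

float Dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

Vec3 Cross(const Vec3& a, const Vec3& b)
{
    return { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
}

Vec3 Normalize(const Vec3& v)
{
    float inv = 1.0f / std::sqrt(Dot(v, v));
    return { v.x * inv, v.y * inv, v.z * inv };
}

// Re-orthonormalize a drifting rotation matrix, viewed as its three basis
// vectors: keep x's direction, rebuild z perpendicular to x and y, then
// rebuild y perpendicular to both (unit length by construction).
void Orthonormalize(Vec3& xAxis, Vec3& yAxis, Vec3& zAxis)
{
    xAxis = Normalize(xAxis);
    zAxis = Normalize(Cross(xAxis, yAxis));
    yAxis = Cross(zAxis, xAxis);
}
```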

Finally, we still have the problem of requiring 9 floats (16 if you only support 4×4 matrices) to store the data. Despite all this, rotation matrices are still a common form of storing rotations.

#### Euler Angles

Another common way to represent rotation is by using 3 angles (called the Euler angles) which represent a rotation around the x, y, and z axes. This representation only requires 3 floating point numbers to store, and allows us to easily adjust rotations around each axis independently. When the axes of rotation are the local coordinate axes of an object, the rotations are sometimes called yaw (rotation about the y axis), pitch (rotation about the x axis), and roll (rotation about the z axis).

While this scheme sounds simple to implement, it has some serious drawbacks. First of all, in which order do you apply the rotations? Do you first rotate about x, then about y, then about the z axis? Or perhaps a different order? You’ll notice that the result is different for each combination. There is also a condition called gimbal lock which can occur when two axes become collinear as you rotate. For instance, if we first rotate about the y axis, and this brings our z axis in line with where our x axis would have been, then the z rotation would look more like an x rotation, and would likely not give us the result we’re looking for.
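The order dependence is easy to demonstrate. The small sketch below (helper names are my own) applies the same two 90° rotations to the same vector in both orders and gets different results:

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Rotate v about the x axis by angle a (radians, right-handed).
Vec3 RotateX(const Vec3& v, float a)
{
    return { v.x,
             v.y * std::cos(a) - v.z * std::sin(a),
             v.y * std::sin(a) + v.z * std::cos(a) };
}

// Rotate v about the z axis by angle a (radians, right-handed).
Vec3 RotateZ(const Vec3& v, float a)
{
    return { v.x * std::cos(a) - v.y * std::sin(a),
             v.x * std::sin(a) + v.y * std::cos(a),
             v.z };
}
```

Rotating (0, 1, 0) by 90° about x and then 90° about z yields (0, 0, 1), while applying the same rotations in the opposite order yields (−1, 0, 0) — two completely different orientations from the same pair of angles.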

Even with these issues, Euler angles are a common way of representing rotations. They are certainly intuitive, which is likely why they are one of the most common representations in 3D modeling tools, game and level editors, CAD programs, and many other software applications where reading and typing in rotation values is necessary. But for our internal representation in code, we can probably do better.

#### Axis-Angle

The axis-angle representation is normally stored as a single 3 element vector. The direction of the vector represents the axis of rotation, and the magnitude of the vector is equal to the angle of rotation. This keeps the storage requirements minimal, requiring only 3 floats. Unfortunately, adding two rotations in this form is not a simple task. You can’t just add the two vectors together, as rotations aren’t really vectors. This difficulty of combining them makes axis-angle less ideal for use in games, as one would normally convert to another form, combine, and then convert back.
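Packing and unpacking this representation is simple; a minimal sketch (names are illustrative):

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Pack a unit axis and an angle (radians) into one vector whose direction
// is the axis and whose magnitude is the angle.
Vec3 PackAxisAngle(const Vec3& unitAxis, float angle)
{
    return { unitAxis.x * angle, unitAxis.y * angle, unitAxis.z * angle };
}

// Recover axis and angle; a near-zero vector means a zero rotation, in
// which case any axis works (we pick x arbitrarily).
void UnpackAxisAngle(const Vec3& packed, Vec3& axis, float& angle)
{
    angle = std::sqrt(packed.x*packed.x + packed.y*packed.y + packed.z*packed.z);
    if (angle > 1e-6f)
        axis = { packed.x / angle, packed.y / angle, packed.z / angle };
    else
        axis = { 1.0f, 0.0f, 0.0f };
}
```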

#### Quaternion

The final representation for rotations that we’ll consider is the quaternion. Quaternions are a set of numbers (usually said to be in the space H) and are an extension of the complex numbers, and they have a lot of uses in mathematics. For our purposes, however, we are only concerned with unit quaternions. A unit quaternion can be used to represent rotations. For a complete discussion of how unit quaternions relate to rotation, and how you can visualize them using a hypersphere, refer to this wiki page.

Before we get into the mathematics and operations of quaternions, let’s examine why they are a better choice than the options we’ve previously discussed. Firstly, they require only 4 floats to store, which makes them much more compact than a rotation matrix, and pretty close in footprint to the other representations. Secondly, combining quaternions is far simpler than for axis angle, and they represent a continuous space so they have no risk of gimbal lock like the Euler angles. While it’s true that we have to normalize quaternions often to prevent rounding error, the normalization process is simple and much more efficient than orthonormalizing a rotation matrix. Lastly, converting to matrix representation is simple and requires no complex operations, so building our composite transform is trivial as well.

How exactly do we represent a quaternion? What do they look like? The quaternion space, H, is a 4 dimensional vector space, so we can represent quaternions using a 4 element vector. Quaternions are an extension of complex numbers, and have 1 real component and 3 imaginary components. They are written in the form w + xi + yj + zk. To store them as a 4 element vector, we just put the coefficients of i, j, and k into x, y, and z, and use w to represent the real portion.

The usual convention for writing quaternions is to write the 4 element vector with the real number first:

q = (w, x, y, z)

We can also write it as the real component and the vector component:

q = (w, v), where v = (x, y, z)

Unit quaternions can be thought of as a modified axis-angle (axis r, angle theta), where:

w = cos(θ/2), (x, y, z) = r sin(θ/2)

The identity quaternion is:

q = (1, 0, 0, 0)

Quaternion addition is just like any 4 element vector:

q1 + q2 = (w1 + w2, x1 + x2, y1 + y2, z1 + z2)

Scalar multiplication also works as it does for vectors:

s·q = (s·w, s·x, s·y, s·z)

Normalizing a quaternion should also look familiar:

q̂ = q / |q|, where |q| = √(w² + x² + y² + z²)
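The storage and the vector-like operations translate directly into code. Here’s a minimal sketch of such a type; the member layout and function names are my own, not necessarily what the sample engine will use:

```cpp
#include <cmath>

// A quaternion stored real-component first, matching the (w, x, y, z)
// convention used in this post.
struct Quaternion { float w, x, y, z; };

Quaternion Identity() { return { 1.0f, 0.0f, 0.0f, 0.0f }; }

// Addition and scalar multiplication behave like any 4 element vector.
Quaternion Add(const Quaternion& a, const Quaternion& b)
{
    return { a.w + b.w, a.x + b.x, a.y + b.y, a.z + b.z };
}

Quaternion Scale(const Quaternion& q, float s)
{
    return { q.w * s, q.x * s, q.y * s, q.z * s };
}

float Length(const Quaternion& q)
{
    return std::sqrt(q.w*q.w + q.x*q.x + q.y*q.y + q.z*q.z);
}

// Normalization keeps rounding error from pulling q off the unit sphere.
Quaternion Normalize(const Quaternion& q)
{
    return Scale(q, 1.0f / Length(q));
}
```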

Quaternions have a conjugate, which is written as q* and negates the vector part. The reciprocal or multiplicative inverse of the quaternion can be defined in terms of the conjugate, and is written as q⁻¹:

q* = (w, −x, −y, −z), q⁻¹ = q* / |q|²

For unit quaternions, |q| = 1, so the inverse is simply the conjugate.

Multiplying, or concatenating, two quaternions takes the form:

q1 q2 = (w1 w2 − v1·v2, w1 v2 + w2 v1 + v1×v2)

or, in component form:

w = w1 w2 − x1 x2 − y1 y2 − z1 z2
x = w1 x2 + x1 w2 + y1 z2 − z1 y2
y = w1 y2 − x1 z2 + y1 w2 + z1 x2
z = w1 z2 + x1 y2 − y1 x2 + z1 w2

It is important to note that quaternion multiplication is not commutative, and is written and performed like matrix multiplication, from right to left. Finally, rotating a vector by a quaternion can be achieved by using:

vr = q vq q⁻¹

where vr is the resulting rotated vector, and vq is the initial vector represented as a quaternion (with w = 0).
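Here’s a sketch of multiplication and vector rotation in code, using the component form of the product and assuming q is a unit quaternion so its inverse is just the conjugate (names and layout are my own):

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };
struct Quaternion { float w, x, y, z; }; // real component first

// Component-form quaternion product (not commutative).
Quaternion Multiply(const Quaternion& a, const Quaternion& b)
{
    return {
        a.w*b.w - a.x*b.x - a.y*b.y - a.z*b.z,
        a.w*b.x + a.x*b.w + a.y*b.z - a.z*b.y,
        a.w*b.y - a.x*b.z + a.y*b.w + a.z*b.x,
        a.w*b.z + a.x*b.y - a.y*b.x + a.z*b.w
    };
}

Quaternion Conjugate(const Quaternion& q) { return { q.w, -q.x, -q.y, -q.z }; }

// vr = q * vq * q^-1, with the vector promoted to a quaternion (w = 0).
// Assumes q is unit length, so the conjugate serves as the inverse.
Vec3 Rotate(const Quaternion& q, const Vec3& v)
{
    Quaternion vq { 0.0f, v.x, v.y, v.z };
    Quaternion r = Multiply(Multiply(q, vq), Conjugate(q));
    return { r.x, r.y, r.z };
}
```

For example, the unit quaternion for a 90° rotation about z rotates (1, 0, 0) to (0, 1, 0).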

Converting to matrix form is also very important for us, since we’ll need to do this to build our transform. Additionally, creating quaternions from a given axis and angle will prove to be much more convenient than trying to determine rotation quaternions directly. For instance, if I want to rotate an object about the y axis by theta degrees, it would be convenient to specify my rotation in those terms.

Converting from an axis v and angle theta to a quaternion, which can then get applied to the orientation, is straightforward:

q = (cos(θ/2), v sin(θ/2)), with v a unit vector
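In code, the axis-angle construction is a few lines (a sketch with illustrative names; the axis is assumed to already be unit length):

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };
struct Quaternion { float w, x, y, z; }; // real component first

// Build the rotation quaternion for a rotation of theta radians about the
// given unit axis: w = cos(theta/2), vector part = axis * sin(theta/2).
Quaternion FromAxisAngle(const Vec3& unitAxis, float theta)
{
    float half = theta * 0.5f;
    float s = std::sin(half);
    return { std::cos(half), unitAxis.x * s, unitAxis.y * s, unitAxis.z * s };
}
```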

Creating a matrix, which represents the quaternion rotation, takes the form of:

| 1 − 2(y² + z²)    2(xy − wz)        2(xz + wy)     |
| 2(xy + wz)        1 − 2(x² + z²)    2(yz − wx)     |
| 2(xz − wy)        2(yz + wx)        1 − 2(x² + y²) |

With all of these operations, we can now handle all of our scenarios, and we’ve met our criteria for choosing a representation for rotations. To summarize:

• Requires only 4 floating point numbers to store
• Manipulation is simplified by using axis-angle rotations as input
• Vectors are trivially rotated using the formula above
• Easily converted into matrix form for building transform
• Normalizing to maintain unit quaternion qualities is trivially performed

Given all of these properties and benefits, it’s easy to see why quaternions have become a very popular choice for games to use. As we start to explore the physics engine more in the coming posts, we’ll be using quaternions as our representation of rotation as well, and will be referring back to many of these concepts and formulas for that discussion.