Building a 2D physics engine, Part 2 – shapes and collisions

Last time we discussed the basics of linear motion. In this post, we’ll talk about the basics of assigning shapes and detecting collisions between these shapes. But before we do that, a quick aside:

It took me longer to get to this post than I had intended, and unfortunately I worry this will be the trend due to the various demands on my schedule. Therefore, reluctantly, I’m going to keep the posts briefer and in overview form so that I can produce them more quickly and still get the main points across. If you want a more thorough explanation of some specific portion, please let me know in the comments and I can spend time fleshing out just those parts. This way, I can still get content out there faster, but also address technical depth on demand where it’s actually needed.

With that out of the way, let’s get back to shapes. We want to assign each object a shape, and these shapes will assume the world transform of the parent object they are a part of. For instance, if I define a circle around an object as its shape and then move the object, I expect the circle to follow that movement as well, so that it’s consistently enclosing the object I assigned it to. Instead of duplicating the transform info in the shape, we just pass it along with the shape when needed. This keeps the shape type as simple as possible, representing only the shape in the local space of the object.

We can have different kinds of shapes, so each shape gets a Type member letting us know what type of shape it is. We’ll start off with two types of shapes: circles and boxes. A circle is an appropriate shape for something that’s roughly the same size horizontally and vertically, and that also doesn’t change shape much under rotation. A ball is an obvious example. Boxes can be used to cover rectangular shapes which are longer in one direction than the other, or that need to keep a rigid shape as they rotate. There are many more shapes we could define to give better representations of objects, but we’ll start with these two simple ones for now. Later, when we move to a different kind of collision detection scheme (GJK), it will become trivial to add new shapes.

A circle consists of a center point and radius. If we assume that it is always located at the position of the parent object, then the radius is all we need. There are certainly interesting scenarios where the position of the object may not be its center of mass, and then you would want to specify the center point of the circle as a relative offset from the parent object, but we’ll simplify in our engine and say the two must be the same.

There are several ways we can specify a box shape. We can store a min and max point, or we can store a center and two extents. We’ll opt for the latter in our code, and use the same simplification as the circle in that the center is always the position of the parent object.
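To make that concrete, here is a minimal sketch of what the shape types might look like, inferred from how they’re used in the collision functions later in this post (a Type member, Radius() on circles, Size() on boxes). The actual declarations in Shape.h in the repository may differ.

// Hypothetical sketch of the shape types, based on their usage below
enum class ShapeType { Circle, Box };

class Shape
{
public:
    virtual ~Shape() {}
    virtual ShapeType Type() const = 0;
};

class CircleShape : public Shape
{
public:
    explicit CircleShape(float radius) : _radius(radius) {}
    ShapeType Type() const override { return ShapeType::Circle; }
    float Radius() const { return _radius; }

private:
    float _radius; // circle is centered on the parent body's position
};

class BoxShape : public Shape
{
public:
    explicit BoxShape(const Vector2& size) : _size(size) {}
    ShapeType Type() const override { return ShapeType::Box; }
    const Vector2& Size() const { return _size; }

private:
    Vector2 _size; // full width and height; half-extents are 0.5f * Size()
};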

Okay, so now we have some shapes, and we’ve assigned them to the objects. Now what? As the objects move around, we’ll want to detect collisions between the various shapes and determine how we should resolve those collisions. For this, we need to define something called contact info. For any given contact between two objects, at a minimum, we’ll need a representative contact point in world space (or in the relative space of the two objects, or both), a contact normal vector representing the normal on the surface of one object where the other collided into it, and finally a penetration depth of how far the two shapes are overlapping along that normal direction. There are many other parameters we could find for more advanced resolution of the contacts, but for now we’ll stick with these.
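Based on how the collision functions below fill it in, the contact structure might look something like this minimal sketch (the version in the repository may carry additional data):

struct ContactInfo
{
    Vector2 worldPosition; // representative contact point, in world space
    Vector2 normal;        // contact normal on the surface of one of the bodies
    float   distance;      // separation along the normal; negative when penetrating
};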

To detect a collision and determine the contact info, we define a function called Collide, which takes in two RigidBody objects and a ContactInfo object for output. The function returns a Boolean letting us know whether there was a collision, and if there was, the ContactInfo parameter we passed in is filled with the result. The reason the function takes the RigidBody type and not the shapes directly is that the RigidBody has a pointer to the shape, but also contains the position and orientation data for the object, which we need in order to know where in the world (and at what orientation) the shape is. This function acts as a dispatcher for the actual collision detection, since we currently provide a different function to detect collisions between each combination of shape types. So we have a Circle/Circle function, a Box/Box function, and a Circle/Box function, as sketched below.
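Here’s a hedged sketch of what that dispatcher might look like; the real Collide in Collision.cpp may order its checks differently or handle the swapped Box/Circle case another way.

bool Collide(RigidBody* body1, RigidBody* body2, ContactInfo& contact)
{
    ShapeType t1 = body1->GetShape()->Type();
    ShapeType t2 = body2->GetShape()->Type();

    if (t1 == ShapeType::Circle && t2 == ShapeType::Circle)
        return CollideCircleCircle(body1, body2, contact);

    if (t1 == ShapeType::Circle && t2 == ShapeType::Box)
        return CollideCircleBox(body1, body2, contact);

    if (t1 == ShapeType::Box && t2 == ShapeType::Circle)
    {
        // Reuse the Circle/Box routine with the bodies swapped; the resulting
        // normal then points the other way and may need to be flipped to keep
        // a consistent convention
        return CollideCircleBox(body2, body1, contact);
    }

    return CollideBoxBox(body1, body2, contact);
}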

For each pair of potentially interacting objects, we keep a data structure called the RigidBodyPair. The purpose of this structure is to hold data about the interaction of two bodies, including the collision data if it exists. Ideally, we’d only produce one of these pair objects when two bodies are in close proximity to each other (since otherwise, there’s no way they could interact), but for simplicity the current implementation just creates one for every possible pair of objects, an O(N²) setup, as shown below.
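As a rough sketch (assuming a simple container of bodies and a RigidBodyPair constructor taking the two bodies; the repository may store the pairs differently), the naive pair generation amounts to:

// Naive O(N^2) pair generation: one RigidBodyPair per unordered pair of bodies
for (size_t i = 0; i < bodies.size(); ++i)
{
    for (size_t j = i + 1; j < bodies.size(); ++j)
    {
        pairs.push_back(RigidBodyPair(bodies[i], bodies[j]));
    }
}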

Let’s look at each of the collision functions in the code now, and go over the premise of each and how it works.

We’ll start with the simplest, the Circle/Circle test. If you imagine drawing a line from the center of one circle to the center of the other, that is the shortest possible path between the two, and it is the line along which any contact between the two circles must occur. Since the distance from a circle’s center to its surface along any line (including this one) is equal to its radius, if the two circles were just touching along this line, the distance between their centers would equal the sum of their radii. If the two circles are intersecting, the sum of their radii is greater than the distance between their centers, and if they are not touching, the sum of the radii is less than that distance. Given that info, let’s take a look at the function:

bool CollideCircleCircle(RigidBody* body1, RigidBody* body2, ContactInfo& contact)
{
    const CircleShape* shape1 = (const CircleShape*)body1->GetShape();
    const CircleShape* shape2 = (const CircleShape*)body2->GetShape();

    // Sum of the radii, squared, so we can compare against the squared
    // center-to-center distance and avoid a sqrt in the non-colliding case
    float r = shape1->Radius() + shape2->Radius();
    float r2 = r * r;

    Vector2 toBody1 = body1->Position() - body2->Position();
    float d2 = toBody1.LengthSq();

    if (d2 < r2)
    {
        // Negative distance indicates how far the circles overlap along the normal
        contact.distance = sqrtf(d2) - r;
        // Normal points from body2 toward body1
        contact.normal = toBody1.Normalized();
        // Contact point on circle 1's surface, along the normal toward body2
        contact.worldPosition = body1->Position() - contact.normal * shape1->Radius();
        return true;
    }

    return false;
}

 

Next, let’s consider the Box/Circle case. The idea here is to simplify things by doing the check in the local space of the body containing the box. This makes a lot of the math simpler. To do that, we take the box’s world transform (which describes going from its local space to world) and apply its inverse to the circle’s world space position to transform it into the local space of the box. We exploit a few things to simplify this: 1) a circle has no meaningful orientation, so we only need to transform its position, 2) since we know our transform is a rotation followed by a translation, we can just apply the inverses of each of these in reverse order (instead of computing a full inverse matrix), and 3) since the rotation matrix for the box is a pure rotation, its inverse is equal to its transpose, so we can again avoid a full inverse. Putting it all together, we first subtract the box’s position from the circle’s (the inverse of the translation), then we multiply by the transpose of the box’s rotation (its inverse). That leaves us in the local space of the box, and we can proceed with the collision check.

So, how do we actually check for the collision? There are two cases to consider: the center of the circle is outside the box, or the center of the circle is inside the box. If it’s outside, we find the closest point on the box’s surface to the center of the circle. If the distance between that point and the center of the circle is less than or equal to the circle’s radius, it’s a collision, and that point is our contact point (with the side of the box the point lies on providing the normal). If the center of the circle is inside the box (which is obviously a collision), then we find which side of the box it’s closest to and push out that way, using that side’s normal as the contact normal.

bool CollideCircleBox(RigidBody* body1, RigidBody* body2, ContactInfo& contact)
{
    const CircleShape* shape1 = (const CircleShape*)body1->GetShape();
    const BoxShape* shape2 = (const BoxShape*)body2->GetShape();

    // Transform the circle to local space of the box (box then becomes aabb)
    Matrix2 rotB = Matrix2(body2->Rotation());
    Matrix2 invRotB = rotB.Transposed();

    Vector2 toCircle = body1->Position() - body2->Position();
    Vector2 localToCircle = invRotB * toCircle;

    Vector2 halfWidths = 0.5f * shape2->Size();

    // If the center of the circle is outside of the AABB
    if (localToCircle.x < -halfWidths.x || localToCircle.x > halfWidths.x ||
        localToCircle.y < -halfWidths.y || localToCircle.y > halfWidths.y)
    {
        // Find closest point on box to the center of the circle.
        Vector2 closestPt = Vector2(
            localToCircle.x >= 0 ? min(halfWidths.x, localToCircle.x) : max(-halfWidths.x, localToCircle.x),
            localToCircle.y >= 0 ? min(halfWidths.y, localToCircle.y) : max(-halfWidths.y, localToCircle.y));

        Vector2 toClosest = closestPt - localToCircle;
        float d2 = toClosest.LengthSq();
        float r2 = shape1->Radius() * shape1->Radius();
        if (d2 > r2)
        {
            return false;
        }

        contact.distance = sqrtf(d2) - shape1->Radius();
        contact.normal = -(rotB * toClosest).Normalized();
        contact.worldPosition = body1->Position() - contact.normal * shape1->Radius();
        return true;
    }
    else
    {
        // Otherwise, find side we're closest to & use that
        float dists[] =
        {
            localToCircle.x - (-halfWidths.x),
            halfWidths.x - localToCircle.x,
            localToCircle.y - (-halfWidths.y),
            halfWidths.y - localToCircle.y
        };

        int iMin = 0;
        for (int i = 1; i < _countof(dists); ++i)
        {
            if (dists[i] < dists[iMin])
            {
                iMin = i;
            }
        }

        contact.distance = -(dists[iMin] + shape1->Radius());
        switch (iMin)
        {
        case 0: contact.normal = Vector2(-1, 0); break;
        case 1: contact.normal = Vector2(1, 0); break;
        case 2: contact.normal = Vector2(0, -1); break;
        case 3: contact.normal = Vector2(0, 1); break;
        default: assert(false); break;
        }

        contact.normal = (rotB * contact.normal).Normalized();
        contact.worldPosition = body1->Position() - contact.normal * shape1->Radius();
        return true;
    }
}

 

Finally, we’ll consider the most complicated of the 3 collision functions we’re discussing, the box/box test. We use a technique called the separating axis test (SAT) for this one. A full explanation of the technique is beyond the scope of this specific post, but there is lots of good info on the web about this. If there is a desire expressed, I can put together a detailed discussion of it in a future blog post. I’ll summarize it as the following for the purpose of this discussion:

The projections of two objects on any axis can be considered, and if there is any separation between the projections along that axis, then the two objects cannot be intersecting.

The key to using this technique is to consider all of the relevant axes (and no more), and to early-out as soon as a separation is found. If no separation is found after considering all the needed axes, the axis with the least penetration provides the collision info. So, which axes do we test? Well, for 2D this is easy: you test the axes defined by the face normals of each box. We exploit the fact that boxes are parallelograms, so the normal on the left side is just the negation of the normal on the right side, for example. That gives us two directions (in world space) per box shape. For finding the projections, we need the min and max points along each axis for each object, and we consider the intervals those min/max pairs form. For this, we’ll briefly introduce a concept called a support mapping. I’ll go into a lot more detail about this later when we discuss more advanced collision detection methods, but for now I’ll define it simply as a function (defined for a particular shape) that, given a direction vector, returns the most extreme point (support point) on the shape along that direction.

The algorithm then goes like this: For each of the axes being considered, find the min and max points of each object along that axis. If the intervals don’t overlap, exit the function, there is no collision. While doing this, keep track of the axis with the least overlap. If you get through all axes without a separation condition, use the axis with least overlap to form the contact data for the collision. Here’s the function:

static Vector2 SupportMapping(const BoxShape* box, const Vector2& localDir)
{
    Vector2 half = 0.5f * box->Size();
    return Vector2(
        localDir.x >= 0 ? half.x : -half.x,
        localDir.y >= 0 ? half.y : -half.y
        );
}

bool CollideBoxBox(RigidBody* body1, RigidBody* body2, ContactInfo& contact)
{
    const BoxShape* shape1 = (const BoxShape*)body1->GetShape();
    const BoxShape* shape2 = (const BoxShape*)body2->GetShape();

    Matrix2 rot1(body1->Rotation());
    Matrix2 rot2(body2->Rotation());
    Matrix2 invRot1(rot1.Transposed());
    Matrix2 invRot2(rot2.Transposed());
    Vector2 toBody2 = body2->Position() - body1->Position();

    float minPen = FLT_MAX;
    Vector2 normal, pointOn1;

    // World-space directions of box 1's local axes
    Vector2 dirs1[] =
    {
        rot1.col1,
        rot1.col2,
    };

    // Half-extents of box 1 along each of those axes
    float bounds1[] =
    {
        shape1->Size().x * 0.5f,
        shape1->Size().y * 0.5f,
    };

    for (int i = 0; i < _countof(dirs1); ++i)
    {
        Vector2 v = dirs1[i].Normalized();
        Vector2 pt = toBody2 + rot2 * SupportMapping(shape2, invRot2 * -v);
        float d = Dot(pt, v);
        float pen = bounds1[i] - d;
        if (pen < 0)
        {
            return false;
        }
        if (pen < minPen)
        {
            minPen = pen;
            normal = -v;
            pointOn1 = pt + normal * pen;
        }

        // Other way
        pt = toBody2 + rot2 * SupportMapping(shape2, invRot2 * v);
        d = Dot(pt, -v);
        pen = bounds1[i] - d;
        if (pen < 0)
        {
            return false;
        }
        if (pen < minPen)
        {
            minPen = pen;
            normal = v;
            pointOn1 = pt + normal * pen;
        }
    }

    // World-space directions of box 2's local axes
    Vector2 dirs2[] =
    {
        rot2.col1,
        rot2.col2,
    };

    // Half-extents of box 2 along each of those axes
    float bounds2[] =
    {
        shape2->Size().x * 0.5f,
        shape2->Size().y * 0.5f,
    };

    for (int i = 0; i < _countof(dirs2); ++i)
    {
        Vector2 v = dirs2[i].Normalized();
        Vector2 pt = -toBody2 + rot1 * SupportMapping(shape1, invRot1 * -v);
        float d = Dot(pt, v);
        float pen = bounds2[i] - d;
        if (pen < 0)
        {
            return false;
        }
        if (pen < minPen)
        {
            minPen = pen;
            normal = v;
            pointOn1 = pt + toBody2;
        }

        // Other way
        pt = -toBody2 + rot1 * SupportMapping(shape1, invRot1 * v);
        d = Dot(pt, -v);
        pen = bounds2[i] - d;
        if (pen < 0)
        {
            return false;
        }
        if (pen < minPen)
        {
            minPen = pen;
            normal = -v;
            pointOn1 = pt + toBody2;
        }
    }

    contact.normal = normal;
    contact.distance = -minPen;
    contact.worldPosition = body1->Position() + pointOn1;
    return true;
}

That wraps up the basics of shapes & collision detection for now. This was a VERY brief overview of the system, so please refer to the code in the repository for more information and to play around with it. We’ll move on to discussing rotations and resolving the actual collisions next!


Building a 2D physics engine, Part 1 – Basic motion

Let’s kick off this series with some basics, and walk through the design & approach used. I’m assuming that everyone is familiar with the math concepts I’ve covered in the past, so I won’t dwell too heavily on the details of the math or solving the equations, except where it’s new. I will, however, introduce most concepts in their mathematical form first so we understand the relationships between various quantities, and then discuss how they appear in code (which is often much simpler than the math version). But first, let’s start with an overview of the physics engine we’re building.

We’ll be modeling rigid bodies, which are the individual objects that interact in the world. Some of them move about, while others are stationary (think the walls or floors of a level). Each has some shape, mass, and other physical properties, and they cannot pass through each other. They are called rigid since their individual shape & composition doesn’t change over time (versus soft bodies, such as cloth, fluids, gel, etc… which are deformable). Together, these bodies make up a simulation system. The simulation system is just the grouping of these objects along with some additional properties such as gravity and time. The simulation is the component that advances time and updates the bodies.

As each object moves about, it may encounter other objects. These encounters may result in a collision, which will need to be resolved in a realistic manner. This may mean the objects bounce off of each other, affecting their velocities. It might mean an object slides down a slope, affected by friction. It could be many things, and we’ll model a basic system that can be used to simulate several interesting effects. We will have code to detect such collisions, and return some details in the form of contact data. The contact data will then be used to resolve the collision in some way.

For this simple tutorial, we’ll be focused on basic concepts, and so I’m going to keep the code and the algorithms very naïve, avoiding obfuscating optimizations. When we move on to the 3D physics engine, since the basics will have already been covered, we’ll explore and tackle more advanced topics like broad phase pair detection, more exotic collision detection methods & contact point generation, continuous collision detection, time of impact, joints, and so on.

Linear Newtonian Dynamics

The first thing we need to understand, before any of the cool stuff we just talked about can happen, is how to model and move our objects. We will use Newton’s basic laws of motion, which may sound familiar from high school physics courses, to model our simulation.

First, let’s discuss the individual objects themselves. If we assume our object is just a particle with no shape or orientation, we can model it simply as a position with a mass. This position, since we’re doing a 2D physics engine, is just a 2D point. Let’s write it as p. It’s written in bold since it’s a vector quantity with 2 components, x and y. The mass is a simple scalar, let’s call it m. In code, this might look like:

Vector2 position;
float mass;

We’ll cover shapes in the next post, and rotation & angular acceleration in a later post. Therefore, for the remainder of this discussion we’ll assume we’re talking about a simple particle with a mass.

We know that moving objects have velocity. What is velocity? Well, it describes the change in position over time. The rate of change of one variable as a result of a change of another variable is the core principle of Calculus, and is called a derivative. Stated using Calculus terminology, we’d say that velocity is the first derivative of position with respect to time. This means that it is a measure of how much the position changes as a result of a change in time. This is often written as:

\boldsymbol{v}=\boldsymbol{p'}

The prime mark (‘) on the p represents first derivative with respect to time. This says the velocity, v, is the first derivative of position p with respect to time. So, if we have a velocity and a change in time, how do we determine how much it changes the position? We need to take the reverse of the derivative in order to solve it. The reverse of a derivative is called an anti-derivative, or integral. Finding and using the integral to solve for change in position in this way is called integration.

The integral for the velocity above would be written as:

\int \boldsymbol{v}\,dt

Which means the integral of the velocity, for a change in time of dt. dt is sometimes written as ∆t to mean “change in time” or “delta time”. Solving the above equation yields:

\boldsymbol{p}=\boldsymbol{v}\Delta t + \boldsymbol{c}

That is, the new position is equal to the velocity multiplied by the change in time, added to some constant c. The constant, in this case, is the starting position before the change in time occurred. So:

\boldsymbol{p}=\boldsymbol{v}\Delta t + {\boldsymbol{p}}_{0}

Armed with this, we can now write code to update our position, given the velocity and a change in time:

float dt; // Change in time this frame, in seconds
Vector2 velocity;

position = velocity * dt + position;

Or, more simply:

position += velocity * dt;

Great, so we know how to update our position for a given change in time, if we have a velocity. How do we update our velocity? Well, again we can describe the change in velocity over time as the first derivative of velocity with respect to time. We call this quantity acceleration.

\boldsymbol{a}=\boldsymbol{v'}=\boldsymbol{p''}

That is, acceleration is equal to the first derivative of velocity (with respect to time), or the second derivative of position. We integrate the same way as above to update the velocity using acceleration:

\boldsymbol{v}=\boldsymbol{a}\Delta t + {\boldsymbol{v}}_{0}

And in code form:

Vector2 acceleration;

velocity += acceleration * dt;

So, given an acceleration, we can update our velocity. And given that updated velocity, we can update our position. The last piece remaining then, is to find a way to generate accelerations. Of course, we could assign them directly, but it’s hard to know exactly what acceleration should be applied to something. One exception to this is gravity, since we know that gravity is an equal acceleration felt by all objects, so we can just apply that one directly. In the real world, accelerations are the result of another concept called forces, and by modeling forces for the rest of our interactions, we get more realistic accelerations as a result.

The 3 laws of motion from Newton state:

  1. An object remains at rest, or continues to move at constant velocity, until acted on by an external force.
  2. The amount of acceleration affecting an object is proportional to the force applied divided by the object’s mass.
  3. When an object exerts a force on another object, the other object exerts an equivalent force in the opposite direction back.

We can use the second law above to come up with this equation:

\boldsymbol{F}=m\boldsymbol{a}

Rearranged to:

\boldsymbol{a}=\frac{\boldsymbol{F}}{m}

This means that given a force on an object, we can determine exactly what acceleration is generated, based on the object’s mass. We end up encountering division by mass often enough that it’s usually best to just store the inverse of the mass directly so we can multiply later. We now have all the basic equations we need in order to do simple simulations of our objects.
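Putting the pieces together, a single naive integration step for one particle might look like the following sketch. The names match the snippets above; invMass is the stored inverse mass, and the gravity value here is just an example.

// One simple Euler integration step for a particle, given accumulated force
Vector2 force;                  // sum of forces applied this frame
Vector2 gravity(0.0f, -9.8f);   // gravity applied directly as an acceleration
float invMass;                  // 1 / m (0 would mean an immovable object)

Vector2 acceleration = gravity + force * invMass;
velocity += acceleration * dt;
position += velocity * dt;
force = Vector2(0.0f, 0.0f);    // clear accumulated force for the next frame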

The code

As a reminder, all of the source code for this tutorial is maintained in this GitHub repository. The code is not yet complete at the time of this writing, but it will always be at least up to or ahead of the current blog post being published. Meaning, it has everything in this blog post (and more) in it, but it may not be the final code with all of the features until we actually get through those posts. Therefore, if you run it and it doesn’t look complete or contains stability issues, this is the reason. Please feel free to report any bugs you find in the issue tracker on GitHub, or via comments to the posts. If it’s not something I’m already aware of and fixing, I’ll be sure to get it fixed soon.

The sample is broken up into the following components:

    1. Main.cpp – Boilerplate Windows application level stuff, like implementing the main entry point and starting up a sample PhysicsWorld instance, throwing some objects into it, and updating it each frame.
    2. DebugRenderer*.* – A simple debug rendering facility to render shapes with. This is used to render some simple shapes representing the objects in the simulation, and some debug features like contact points, etc…
    3. Vector2.h, Matrix2.h – Minimal math code we need to implement this sample.
    4. PhysicsWorld.h/cpp – Implements a single simulation. This allows configuring and inserting objects into it, setting global attributes like gravity and iteration counts, and updating the simulation each frame with an update call (see the usage sketch after this list).
    5. RigidBody.h/cpp – The rigid body objects we’re simulating. This is the object that contains the properties we discussed in this blog post, for instance.
    6. RigidBodyPair.h/cpp – Represents a pair of 2 interacting bodies. As we’ll see in later posts, this will track the contact (if any) between two objects.
    7. Shape.h – Contains the implementations of the various shapes we can simulate in this sample.
    8. Collision.cpp – Implements the simple collision detection routines we use for this sample.
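As promised above, here’s a hypothetical usage sketch based on those descriptions. The RigidBody constructor and the Update call appear later in this post, while SetGravity and AddBody are stand-ins for whatever the actual configuration and insertion calls are named in the repository.

PhysicsWorld world;
world.SetGravity(Vector2(0.0f, -9.8f));   // assumed setter name

// RigidBody(Shape*, float mass) takes ownership of the shape (see RigidBody.h below)
RigidBody* ball = new RigidBody(new CircleShape(0.5f), 1.0f);
ball->Position() = Vector2(0.0f, 5.0f);
world.AddBody(ball);                      // assumed insertion call

// Advance the simulation each frame with a fixed timestep
world.Update(1.0f / 60.0f);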

In this post, we encountered the following pieces:

From RigidBody.h:

// A rigid body is a dynamic object which can be affected by forces and constraints
class RigidBody
{
public:
    // The rigid body takes over the shape's lifetime.
    // When the rigid body is destroyed, it will delete the shape
    RigidBody(Shape* shape, float mass);
    ~RigidBody();

    const Shape* GetShape() const { return _shape; }

    const Vector2& Position() const { return _position; }
    Vector2& Position() { return _position; }

    const Vector2& LinearVelocity() const { return _linearVelocity; }
    Vector2& LinearVelocity() { return _linearVelocity; }

    const Vector2& Force() const { return _force; }
    Vector2& Force() { return _force; }

    const float Mass() const { return _mass; }
    const float InvMass() const { return _invMass; }

    const float Rotation() const { return _rotation; }
    float& Rotation() { return _rotation; }

    const float AngularVelocity() const { return _angularVelocity; }
    float& AngularVelocity() { return _angularVelocity; }

    const float Torque() const { return _torque; }
    float& Torque() { return _torque; }

    const float I() const { return _I; }
    const float InvI() const { return _invI; }

private:
    Shape* _shape;

    // Linear
    Vector2 _position;
    Vector2 _linearVelocity;
    Vector2 _force;
    float _mass, _invMass;

    // Angular
    float _rotation;
    float _angularVelocity;
    float _torque;
    float _I, _invI;
};

From PhysicsWorld.cpp:

void PhysicsWorld::Update(float dt)
{
    float invDt = dt > 0.0f ? 1.0f / dt : 0.0f;

    // Determine overlapping bodies and update contact points
    UpdatePairs();

    // Integrate forces to obtain updated velocities
    for (auto& body : _bodies)
    {
        if (body->InvMass() == 0.0f)
            continue;

        body->LinearVelocity() += dt * (_gravity + body->InvMass() * body->Force());
        body->AngularVelocity() += dt * body->InvI() * body->Torque();

        // Dampen the velocities to simulate friction (we'll add friction simulation later)
        static const float DampeningTerm = 0.001f;
        body->LinearVelocity() += -body->LinearVelocity().Normalized() * DampeningTerm;
        body->AngularVelocity() += (body->AngularVelocity() > 0) ? -DampeningTerm : DampeningTerm;

        // Clamp to 0 if the value becomes too low
        static const float ClampThreshold = 0.01f;
        if (body->LinearVelocity().LengthSq() < ClampThreshold)
        {
            body->LinearVelocity() = Vector2(0, 0);
        }
        if (fabsf(body->AngularVelocity()) < ClampThreshold)
        {
            body->AngularVelocity() = 0.0f;
        }
    }

    // Do all one time init for the pairs
    for (auto& pair : _pairs)
    {
        pair.second.PreSolve(invDt);
    }

    // Sequential Impulse (SI) loop. See Erin Catto's GDC slides for SI info
    for (int i = 0; i < _maxIterations; ++i)
    {
        for (auto& pair : _pairs)
        {
            pair.second.Solve();
        }
    }

    // Integrate new velocities to obtain final state vector (position, rotation).
    // Also clear out any forces in preparation for the next frame
    for (auto& body : _bodies)
    {
        if (body->InvMass() == 0.0f)
            continue;

        body->Position() += dt * body->LinearVelocity();
        body->Rotation() += dt * body->AngularVelocity();

        body->Force() = Vector2(0, 0);
        body->Torque() = 0.0f;
    }
}

For now, we can ignore the portions that implement angular motion, collision detection, and collision response; we’ll cover those in upcoming posts. Other than some artificial placeholder drag (we’ll implement friction later), the code implements exactly what we discussed in this post.

That’s it for this post. I want to keep each post scoped to make them easy to follow. If you have questions or want me to drill into something deeper, please just let me know in the comments. In the next post, we’ll start to discuss shapes and basic collision detection!


Building a 2D physics engine, Part 0 – Introduction

When I first started this blog several years ago, one of the goals was to put together a detailed walkthrough or tutorial for building a simple physics engine. Unfortunately, it’s now several years later and that tutorial still hasn’t shown up, and for that I apologize. Now that I’ve started getting settled at my new job, I’m finally putting this article series together and hope that you still find it useful.

Before we dive in, there are a few things that need mentioning:

There is no single correct way to build a real-time physics engine. Each year, new papers and presentations surface with different variations and techniques for modeling and solving constraints, handling collision detection, modeling soft body interactions, and more. This is an active area of research and development. Not every application requires the same features, accuracy, or level of performance. This too affects the types of physics engines out there. Some are much simpler, easier to understand, and while they might be unable to handle the performance demands of a modern AAA game with heavy physics, are often more than sufficient for simpler games and other uses. Other engines are more exotic, require much more technical understanding to follow, but can stand up to the demands of even the most complex games of the current generation.

The physics we’ll explore in this blog will start simple. The aim is to provide a foundation to build more complex understanding on top of. It will implement very straightforward algorithms in the most naïve way possible, avoiding overly fancy tricks and cleverness. With these basics, I hope to help readers achieve fun physics simulations that can be used for a wide variety of games, and to give the ambitious a starting point to build on as they move toward developing something scalable and robust enough to handle next-generation content. I will do my best to break each concept down into simple, understandable pieces, and will keep each article small and focused, providing a chance for you to ask questions and start a discussion in the comments. Discussion is really the best way to learn, in my experience, and will help other readers as well.

Lastly, if you’re a bit rusty on basic linear algebra and game math, you may want to read through my refresher series on my blog. I’m going to assume readers are familiar with the topics I covered.

The sample code

As I write the articles, I will be building a reference implementation, called SamplePhysics2D.

The code in the repository may be further ahead than the articles, as I will be developing each portion first, then borrowing code snippets from it to write the posts. Ideally, I expect readers to either type out or copy the code snippets from the articles themselves to build out the samples, rather than just downloading the code. I feel this will lead to a better understanding of the material, but ultimately it’s up to each reader and how they learn best.

My personal development environment is Visual Studio 2013 on Windows, and that is what the project in the repository is based on. However, other than the placeholder graphics code for debugging purposes, the code should be fairly portable. On the other hand, I make no explicit guarantees about portability. If you have problems, ask in the comments and I will try my best to assist.

Design considerations

The engine being developed throughout this tutorial is going to be very basic, meaning it won’t handle many robustness and scalability aspects that production AAA games would. I will, however, comment on those areas and point out where a more robust solution would be needed. In later posts, I’ll help fill in some of those details, but I want to keep the core of the tutorial easy to follow.

Since the focus is physics and collision detection, a very trivial renderer will be provided just to get basic visualization on the screen. This renderer is not meant to be demonstrative of anything I would recommend using, and it is not meant to be sample code anyone borrows. It’s been put together as quickly as possible to cover the basic needs that we have to visualize the physics logic, and is nothing fancier than that.

For the code itself, I will be using C++ for the sample code and project. I prefer the STL and some C++11 features, and use them throughout the code base. I realize not every reader will like this decision, but the code should be easy to port if one would like to make a different version.

Also, my focus is on readability and simplicity. I will comment heavily, and I will avoid overly clever tricks if it obfuscates the code. I will prefer simplicity over performance, because the point of this code is to convey information to readers, not run as fast as possible. Later, after the basic tutorial, I may dive deeper into particular tech topics. At that time, I will explain acceleration techniques and algorithms.

Lastly, I’m not much of a style stickler. I’ve tried to use as neutral a coding style as possible, but I’m sure someone will still disagree with it. Sorry if it’s not your cup of tea, but try to follow along and ignore style if you can. If there’s anything truly atrocious you would like to see me change throughout the course of the series, feel free to mention it in the comments.

For reference, here’re the articles I have planned. I will enable each link to the article as they get written and posted:

  1. Basics of Newtonian dynamics (linear).
  2. Shapes, and basic collision detection.
  3. Additional shapes and more complex contact data.
  4. Torque and rotation.
  5. Jitter. Persistent manifolds for stability.
  6. Tangent forces and friction.

 


Github repository for the blog

I’ve created a GitHub repository to host code accompanying this blog, located at https://github.com/rezanour/blog. I will be (finally!) starting a step-by-step walkthrough of building a basic physics engine, and the code samples will be coming directly from a sample engine I’ll be building in the repository. That way, the entire codebase will be available for anyone to access. We’ll start with a simpler 2D example engine first, and then move to 3D after we’ve covered all of the basics.

Look for the upcoming physics series soon!


Common gotchas when adding VR support to existing rendering engines

Many virtual reality games, at least in the short term, will be based on current rendering systems on the market today. On the one hand, this is a good thing in that it allows people to create compelling content quickly. On the other hand, these rendering systems weren’t built with VR in mind, and there are several challenges in getting the best performance and experience out of them. Through talking with people trying to integrate VR into their own rendering stacks, and through experimenting with different techniques myself, I’ve started gathering a list of gotchas that will hopefully help others as we move into this new realm.

Rendering stereo efficiently

The first issue is how to generate stereoscopic rendering efficiently. The easiest “bolt on” solution is to just invoke your rendering system twice, once per eye & viewport. However, there are a bunch of problems with this. Rendering a scene normally involves many different GPU pipeline states, switching between them as you switch between different materials, rendering techniques, etc… These state switches are expensive, and you end up having to do all of them again for the second eye. For example, this means a scene that has 100 different pipeline states now ends up switching between 200.

It is much, much, cheaper to switch a viewport and update a constant (view matrix) and redraw the same vertices using the same pipeline state that’s already loaded. This means if you pair up your rendering of objects and do both eyes back to back, per object, you get a very significant savings in performance. You only switch between the pipeline states as much as necessary, same as with a single eye.

Let’s look at an example to really make it clear what we mean:

Imagine your scene is made up of a few objects: a large piece of geometry representing a room rendered with pipeline state A, and a few skinned characters all rendered with pipeline state B, and a fire in the fireplace rendered with pipeline state C.

Traditionally, this would be rendered something like this, which is 3 state changes:

Set A, draw room. Set B, draw characters, Set C, draw fire.

The naïve, but easy, approach to rendering this scene in stereo is to do this:

Setup camera for eye 1, Set A, draw room. Set B, draw characters. Set C, draw fire.

Setup camera for eye 2, Set A, draw room. Set B, draw characters. Set C, draw fire.

 

That’s now 6 state changes, and if the vertices for the entire scene don’t all fit in VRAM, then we’ve possibly incurred some paging activity as well, as we rotate through all the data, binding and unbinding it.

Here’s what a more optimal approach may look like:

Set A, setup eye 1, draw room, setup eye 2, draw room.

Set B, setup eye 1 draw characters, setup eye 2, draw characters.

Set C, setup eye 1 draw fire, setup eye 2, draw fire.

 

Here, we still only go through 3 state changes and we continue to leverage the vertices as they are already bound. This makes a tremendous difference in more complicated scenes. The only drawback is that you now have to pass enough information throughout your rendering pipeline to know how to setup both eyes & viewports, as opposed to just a single camera.
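In code, the per-object, per-eye submission might be structured roughly like this sketch. The types and functions here (RenderBatch, SetPipelineState, SetViewportAndView, and so on) are stand-ins for whatever your renderer actually exposes.

// For each pipeline state, bind it once and draw every object for both eyes
for (const RenderBatch& batch : batchesByPipelineState)
{
    SetPipelineState(batch.state);          // expensive: done once per state

    for (const Renderable& object : batch.objects)
    {
        BindGeometry(object);               // vertices stay bound for both eyes

        for (int eye = 0; eye < 2; ++eye)
        {
            SetViewportAndView(eyeViewport[eye], eyeViewMatrix[eye]); // cheap per-eye switch
            Draw(object);
        }
    }
}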

Visibility culling

It may be tempting to just treat each eye as its own individual camera, but this poses several problems. The first is that usually you determine which objects to render based on the camera. While it would certainly work to run the visibility determination logic and generate a list of objects to render per eye, it’s also incredibly inefficient. The vast majority of the two lists would be the same, with only a few items unique to one eye or the other. It is usually much better to build a “fattened” view frustum which encompasses both eyes, and use that to do a single visibility culling pass. The list generated should then be used for both eyes. There will sometimes be a handful of objects that were only visible out of the corner of one eye and won’t generate any pixels for the other, but these are corner cases, and the approach stays conservative: nothing visible to either eye gets culled. This generates the proper image, and saves a tremendous amount of processing time on the CPU.

Level of Detail (LOD)

Similar to visibility determination, determining LOD levels on a per-camera basis is the norm. However, this causes problems if you pick the LOD on a per-eye basis. An object out on the far left or far right of the view might be on the boundary of two detail levels based on your current position, such that one eye would pick level X and the other eye level X + 1. If you have good blending between LODs, this may not be very obvious, but it’s still undesirable. The solution, again, is to just use a single fattened camera view with a single average position of the two eyes to do the LOD determination, and then use the result for both eyes.
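For example, with hypothetical eye-position and LOD-selection helpers, the idea is simply:

// Pick the LOD once from the midpoint of the two eye positions, and reuse it for both eyes
Vector3 lodViewPos = (leftEyePosition + rightEyePosition) * 0.5f;
int lodLevel = SelectLodLevel(object, lodViewPos);

DrawAtLod(object, lodLevel, leftEye);
DrawAtLod(object, lodLevel, rightEye);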

Normal maps

Through my experimentation, it appears something about rendering an image in stereo diminishes the effect of normal maps significantly. Normal maps occupying a small amount of view space (such as small items in the distance) look fine, but large walls and surfaces up close start looking quite flat and cheesy, with the normal maps looking like they’re just cheaply painted onto the surface (which is, in fact, exactly what they are). I’m not sure why, but I suspect it has to do with the discrepancy between the true depth of the surface your eyes perceive and the fake perturbations the normal map is trying to impose. I believe actual displacement techniques, such as displacement maps and tessellation, will start becoming more popular as we build richer VR experiences. I also hope this means we can push on the GPU vendors to finally fix their tessellation engines, as most have performance issues and inefficiencies.

View dependent lighting

It seems logical that view-dependent computations, such as those for specular highlights or reflections, would benefit from using a per-eye view vector. However, in practice this actually looks odd, and I’m not sure why. In my experience, using separate view vectors per eye for rendering specular highlights makes the final lighting or reflected image look blurry and incorrect. Using a single view vector from the average or “middle” eye generally looks much better, even if it feels physically incorrect. I need to do some more research and experimentation here to see what’s going on, but for now my recommendation is to use a single vector to get better looking output. But certainly try both and see which works best for your particular setup.

That’s it for now. I wanted to keep this post fairly short and just summarize a few things that may help people building content for VR. I’ll continue to document additional issues as I run into them, or hear about others running into them, in the future. Good luck getting your content onto the VR platform, and please let me know if you have additional questions or comments.


New beginnings

After nearly a decade, last week was my last week at Microsoft. I’ve decided to join Oculus VR and help them make consumer virtual reality successful.

The reason I learned graphics and game programming in the first place was to build deeply immersive virtual worlds. The natural next step is VR, and I believe we’re much closer to having compelling consumer virtual reality than many think.

Naturally, my blog posts (yes, I will start writing more again!) will start containing more VR subjects in addition to other topics. Hopefully everyone’s as excited about the area as I am.

Here’re some planned posts I’d like to write soon:
1. Common gotchas when adding VR support to an existing graphics engine not written with VR in mind.
2. Deep dive into the Windows Flip & Present models. Breakdown of the different DXGI swapchain flip modes, and what actually happens when you call Present, including some more advanced topics and debugging tips. I’ll also talk about the tradeoff between latency & throughput that picking different models causes.


Math Primer Series: Rotation Representations and Quaternions

Migrated from http://blogs.msdn.com/b/rezanour

We saw the power and usefulness of transforms in the previous primer posts, and we’ll be using them quite a bit in our discussions of physics and graphics. One of the first tasks we’ll need to do is determine how to represent our transformations in code. We’re going to be creating, manipulating, and animating these transformations over time, so they must be easy and efficient to work with. Particularly, we’d like to be able to manipulate the individual components (translation, rotation, and scale) independently. One natural, but naïve, choice is to just use the transformation matrix directly, since that’s the form we’ve seen so far. However, using a matrix directly makes it nontrivial to manipulate and animate. It’s also difficult to retrieve and change independent components of the transform. It’s obvious that we want something better, so let’s take a look at some options.

The next natural choice is to represent translation, rotation, and scale separately. This makes it trivial to manipulate the values independently. However, since most graphics packages require the final transformation as a matrix, we’ll need to combine the elements later. This isn’t very difficult to do, and we’ve already seen how we can do that in previous posts. Translation and scale are trivially represented as 3 element vectors, which maps well to how we use them, including combining them with matrices. But what about rotation? How can we represent that?

Rotation Matrix

One way to represent the rotation is to just leave it as a pure orthonormal rotation matrix. It is easy to combine this form with position and scale, since the rotation is already in matrix form. We can add more rotation to this matrix by concatenating other rotation matrices using matrix multiplication. This is somewhat expensive, but conceptually simple.

However, as more and more rotations are applied, our matrix could start to stray from being orthonormal (due to rounding errors, etc…). If we stray far enough away from being orthonormal, we can no longer be considered a rotation, and applying this matrix to an object may have unexpected side effects like shearing or deformation. Therefore, we need to regularly check and ensure that our matrix maintains the orthonormal property, which is another somewhat expensive task to do regularly.

Finally, we still have the problem of requiring 9 floats (16 if you only support 4×4 matrices) to store the data. Despite all this, rotation matrices are still a common form of storing rotations.

 

Euler Angles

Another common way to represent rotation is by using 3 angles (called the Euler angles) which represent a rotation around the x, y, and z axes. This representation only requires 3 floating point numbers to store, and allows us to easily adjust rotations around each axis independently. When the axes of rotation are the local coordinate axes of an object, the rotations are sometimes called yaw (rotation about the y axis), pitch (rotation about the x axis), and roll (rotation about the z axis).

While this scheme sounds simple to implement, it has some serious drawbacks. First of all, in which order do you apply the rotations? Do you first rotate about x, then about y, then about the z axis? Or perhaps a different order? You’ll notice that the result is different for each combination. There is also a condition called gimbal lock which can occur when two axes are made collinear as you rotate. For instance, if we first rotate about the y axis, and this brings our z axis in line with where our x axis would have been, then the z rotation would look more like an x rotation, and would likely not give us the result we’re looking for.

Even with these issues, Euler angles are a common way of representing rotations. They are certainly intuitive, which is likely why they are one of the most common representations in 3D modeling tools, game and level editors, CAD programs, and many other software applications where reading and typing in rotation values is necessary. But for our internal representation in code, we can probably do better.

 

Axis-Angle

For any set of rotations, the final orientation can be represented as some final axis of rotation and an angle. This is called axis-angle representation. For example, imagine that your head is at the origin, and that you are looking forward along some axis. If you rotate your head about the y axis by 90 degrees to the left, then rotate about the z axis by 90 degrees so that you are facing upward, you end up looking up and to the left, with your chin held high to your left. This final orientation could have also been achieved by using a single axis, which is at a 45 degree angle going in the up & right direction from where you started. Return your head to the starting position, and imagine that there is an axis going up and to the right. Now rotate your head about that axis and you should see that you end up in the same final location as the first pair of rotations.

The axis and angle are normally stored as a single 3 element vector. The direction of the vector represents the axis of rotation, and the magnitude of the vector is equal to the angle of rotation. This keeps the storage requirements minimal, requiring only 3 floats. Unfortunately, adding two rotations in this form is not a simple task. You can’t just add the two vectors together, as rotations aren’t really vectors. This difficulty of combining them makes these less ideal for use in games, as one would normally convert them to another form, combine them, and then convert back.

 

Quaternion

The final representation for rotations that we’ll consider is the quaternion. Quaternions are a set of numbers (usually said to be in the space H) and are an extension of the complex numbers, and they have a lot of uses in mathematics. For our purposes, however, we are only concerned with unit quaternions. A unit quaternion can be used to represent rotations. For a complete discussion of how unit quaternions relate to rotation, and how you can visualize them using a hypersphere, refer to this wiki page.

Before we get into the mathematics and operations of quaternions, let’s examine why they are a better choice than the options we’ve previously discussed. Firstly, they require only 4 floats to store, which makes them much more compact than a rotation matrix, and pretty close in footprint to the other representations. Secondly, combining quaternions is far simpler than for axis angle, and they represent a continuous space so they have no risk of gimbal lock like the Euler angles. While it’s true that we have to normalize quaternions often to prevent rounding error, the normalization process is simple and much more efficient than orthonormalizing a rotation matrix. Lastly, converting to matrix representation is simple and requires no complex operations, so building our composite transform is trivial as well.

How exactly do we represent a quaternion? What do they look like? The quaternion space, H, is a 4 dimensional vector space, so we can represent quaternions using a 4 element vector. Quaternions are an extension of complex numbers, and have 1 real component and 3 imaginary components. They are written in the form w + xi + yj + zk. To store them as a 4 element vector, we just put the coefficients of i, j, and k into x, y, and z, and use w to represent the real portion.

The usual convention for writing quaternions is to write the 4 element vector with the real number first:

q = (w, x, y, z)

We can also write it as the real component and the vector component:

q = (w, \boldsymbol{v}), \quad \boldsymbol{v} = (x, y, z)

Unit quaternions can be thought of as a modified axis-angle (axis r, angle theta), where:

w = \cos\frac{\theta}{2}

\boldsymbol{v} = \boldsymbol{r}\sin\frac{\theta}{2}

The identity quaternion is:

q_{identity} = (1, 0, 0, 0)

Quaternion addition is just like any 4 element vector:

q_1 + q_2 = (w_1 + w_2,\; x_1 + x_2,\; y_1 + y_2,\; z_1 + z_2)

Scalar multiplication also works as it does for vectors:

sq = (sw,\; sx,\; sy,\; sz)

Normalizing a quaternion should also look familiar:

\hat{q} = \frac{q}{\|q\|}, \quad \|q\| = \sqrt{w^2 + x^2 + y^2 + z^2}

Quaternions have a conjugate, which is written as q*. The reciprocal or multiplicative inverse of the quaternion can be defined in terms of the conjugate, and is written as q-1:

q^* = (w, -\boldsymbol{v})

q^{-1} = \frac{q^*}{\|q\|^2}

Multiplying, or concatenating, two quaternions takes the form:

q_1 q_2 = (w_1 w_2 - \boldsymbol{v}_1 \cdot \boldsymbol{v}_2,\; w_1\boldsymbol{v}_2 + w_2\boldsymbol{v}_1 + \boldsymbol{v}_1 \times \boldsymbol{v}_2)

or:

q_1 q_2 = (w_1 w_2 - x_1 x_2 - y_1 y_2 - z_1 z_2,\; w_1 x_2 + x_1 w_2 + y_1 z_2 - z_1 y_2,\; w_1 y_2 - x_1 z_2 + y_1 w_2 + z_1 x_2,\; w_1 z_2 + x_1 y_2 - y_1 x_2 + z_1 w_2)

It is important to note that quaternion multiplication is not commutative, and written and performed like matrix multiplication, from right to left. Finally, rotating a vector by a quaternion can be achieved by using:

v_r = q\, v_q\, q^{-1}

where v_r is the resulting rotated vector, and v_q is the initial vector represented as a quaternion (with w = 0).

 

Converting to matrix form is also very important for us, since we’ll need to do this to build our transform. Additionally, creating quaternions from a given axis and angle will prove to be much more convenient than trying to determine rotation quaternions directly. For instance, if I want to rotate an object about the y axis by theta degrees, it would be convenient to specify my rotation in those terms.

Converting from an axis v and angle theta to a quaternion, which can then get applied to the orientation, is straightforward:

q = \left(\cos\frac{\theta}{2},\; \boldsymbol{v}\sin\frac{\theta}{2}\right)

Creating a matrix, which represents the quaternion rotation, takes the form of:

R(q) = \begin{bmatrix} 1-2(y^2+z^2) & 2(xy-wz) & 2(xz+wy) \\ 2(xy+wz) & 1-2(x^2+z^2) & 2(yz-wx) \\ 2(xz-wy) & 2(yz+wx) & 1-2(x^2+y^2) \end{bmatrix}
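To tie the operations together in code, here’s a small self-contained sketch (not the math library from the physics series) implementing the pieces we’ll actually need: building a quaternion from an axis and angle, concatenating rotations, renormalizing, and expanding to a 3×3 matrix.

#include <cmath>

struct Quat
{
    float w, x, y, z;
};

// Build a rotation quaternion from a unit axis (ax, ay, az) and an angle in radians
Quat FromAxisAngle(float ax, float ay, float az, float theta)
{
    float s = std::sin(theta * 0.5f);
    return { std::cos(theta * 0.5f), ax * s, ay * s, az * s };
}

// Hamilton product: the combined rotation applies q2 first, then q1
Quat Mul(const Quat& q1, const Quat& q2)
{
    return {
        q1.w * q2.w - q1.x * q2.x - q1.y * q2.y - q1.z * q2.z,
        q1.w * q2.x + q1.x * q2.w + q1.y * q2.z - q1.z * q2.y,
        q1.w * q2.y - q1.x * q2.z + q1.y * q2.w + q1.z * q2.x,
        q1.w * q2.z + q1.x * q2.y - q1.y * q2.x + q1.z * q2.w
    };
}

// Renormalize to counteract accumulated rounding error
Quat Normalized(const Quat& q)
{
    float len = std::sqrt(q.w * q.w + q.x * q.x + q.y * q.y + q.z * q.z);
    return { q.w / len, q.x / len, q.y / len, q.z / len };
}

// Expand a unit quaternion into a 3x3 rotation matrix, m[row][col],
// matching the matrix given above
void ToMatrix3(const Quat& q, float m[3][3])
{
    float xx = q.x * q.x, yy = q.y * q.y, zz = q.z * q.z;
    float xy = q.x * q.y, xz = q.x * q.z, yz = q.y * q.z;
    float wx = q.w * q.x, wy = q.w * q.y, wz = q.w * q.z;

    m[0][0] = 1 - 2 * (yy + zz); m[0][1] = 2 * (xy - wz);     m[0][2] = 2 * (xz + wy);
    m[1][0] = 2 * (xy + wz);     m[1][1] = 1 - 2 * (xx + zz); m[1][2] = 2 * (yz - wx);
    m[2][0] = 2 * (xz - wy);     m[2][1] = 2 * (yz + wx);     m[2][2] = 1 - 2 * (xx + yy);
}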

With all of these operations, we can now handle all of our scenarios, and we’ve met our criteria for choosing a representation for rotations. To summarize:

  • Requires only 4 floating point numbers to store
  • Manipulation is simplified by using axis-angle rotations as input
  • Vectors are trivially rotated using the formula above
  • Easily converted into matrix form for building transform
  • Normalizing to maintain unit quaternion qualities is trivially performed

 

Given all of these properties and benefits, it’s easy to see why quaternions have become a very popular choice for games to use. As we start to explore the physics engine more in the coming posts, we’ll be using quaternions as our representation of rotation as well, and will be referring back to many of these concepts and formulas for that discussion.


Math Primer Series: Matrices III: Affine Transformations and Matrices

Migrated from http://blogs.msdn.com/b/rezanour

We’ve seen matrices used as representations of a basis, and we’ve seen them used as linear transformations, but what else can we do with them? So far, we’ve only been able to use matrices to transform vectors from one vector space to another. What about points and vectors in an affine space? There is another class of transformation, called an affine transformation, which does exactly this.

Affine Transformations

An affine transform is a transformation which maps points and vectors in an affine space to another affine space. This allows us to transform positions now, in addition to just vectors. An affine transformation can be thought of as a combination of a basis matrix and a translation to an origin, which is analogous to how affine spaces compare to vector spaces, since affine spaces add an origin. When we look at an object, we can define a right, up, and forward vector for it, relative to some frame of reference. But furthermore, we can describe its position in space. Not the position of a particular point on the object, but the position of the whole object. What we’re actually doing is saying that all the points on the object can be thought of relative to some origin on the object, and that origin has a location in space (relative to some frame of reference of course). This means that even though we move the object’s origin around, we’re not changing the orientation (the right, up, and forward vectors). The reverse is also true: we can rotate or scale our object all we want, but it won’t move the object’s origin. In this way, we see that the origin of the object and its basis are independent, and the formal affine definition reflects this separation. Affine transformations are formally defined as multiplying a point by a matrix component (to define its orientation) and then adding a vector component (to define its origin’s position):

\[ T(\mathbf{p}) = M\mathbf{p} + \mathbf{t} \]

Here, M is the matrix portion of the transformation, and is sometimes referred to as the linear portion. This can consist of rotation, scaling, or any other linear transformation. The t in the equation is called the translation and is the vector defining where the object’s origin is compared to some frame of reference. Let’s take a look at an example to get a better grasp of this:

[Figure 1: A square object with local basis vectors x and y and local origin P, shown within a frame of reference defined by origin O and the right, up, and forward vectors; the dashed vector t is the translation from O to P.]

In Figure 1, we see a frame of reference shown (origin O, with right, up, and forward defining our frame). We also see an object (the square). If we were sitting locally inside that object, then x and y would be our right and up vectors, and P our local origin. But since we're examining the object from a different frame of reference, it appears rotated and located somewhere else. This placement and orientation can be described as a single affine transformation, as shown in the equation above, with M being the linear portion and t being the translation portion. Using our example above, and assuming only rotation (to simplify), our matrix M would likely look like:

\[ M = \begin{bmatrix} \mathbf{x} & \mathbf{y} & \mathbf{z} \end{bmatrix} = \begin{bmatrix} x_x & y_x & z_x \\ x_y & y_y & z_y \\ x_z & y_z & z_z \end{bmatrix} \]

This assumes that x, y, and z (which we can't see in the figure) are defined relative to our coordinate space and are unit length. Look back to this post if you're unsure how we got that. We would then add our translation vector (the dashed line t in the figure).

While this works well enough, the extra addition has to be remembered and carried along separately, and there's no convenient way to store the whole transformation as one object. Fortunately, there's an alternate way to express and use affine transformations which is much more convenient and consistent with what we've already seen:

\[ A = \begin{bmatrix} M & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix} = \begin{bmatrix} m_{11} & m_{12} & m_{13} & t_x \\ m_{21} & m_{22} & m_{23} & t_y \\ m_{31} & m_{32} & m_{33} & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \]

Note that this is now a 4×4 matrix, as opposed to the usual 3×3 we've seen so far. It contains the linear part of the affine transformation in the upper left 3×3 block, and the translation vector as the last column. The bottom row contains 0s, with the exception of the last entry, which is a 1. Here's an example of the expanded matrix from the previous example:

\[ \begin{bmatrix} x_x & y_x & z_x & t_x \\ x_y & y_y & z_y & t_y \\ x_z & y_z & z_z & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \]

Before we go on, I wanted to call out that I've been using column matrices so far in this post. The same concepts apply if you're using row matrices; just remember that the layout will be transposed from what I'm showing here, with the translation vector as the bottom row of the matrix, the 0s as the last column, and the upper left block transposed.

Now, we claimed earlier that affine transformations could be used to transform both points and vectors from one affine space to another. Points and vectors are quite different. One is a location and one is a direction. It doesn't make any sense to translate a direction, so how can we possibly use this matrix to transform vectors? The answer lies in the fact that we have to augment a point or vector to be compatible with this matrix. The matrix is 4×4, while points and vectors have only 3 components. Without some augmentation, the points and vectors are not compatible with this new matrix. How we augment them depends on whether they are points or vectors, and dictates whether the translation component of the affine matrix is applied or not.

If you recall from this post, points are sometimes written as 4 component vectors with a 1 in the last component (sometimes called w). Vectors can likewise be written with 4 components, with the last component equal to 0. The reasons this worked for arithmetic were shown in the post, but we now see there is another critical reason for that distinction. When you multiply a 4 element vector with the affine matrix above, the math works out to:

\[ \begin{bmatrix} m_{11} & m_{12} & m_{13} & t_x \\ m_{21} & m_{22} & m_{23} & t_y \\ m_{31} & m_{32} & m_{33} & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} p_x \\ p_y \\ p_z \\ w \end{bmatrix} = \begin{bmatrix} m_{11} p_x + m_{12} p_y + m_{13} p_z + t_x w \\ m_{21} p_x + m_{22} p_y + m_{23} p_z + t_y w \\ m_{31} p_x + m_{32} p_y + m_{33} p_z + t_z w \\ w \end{bmatrix} = \begin{bmatrix} M\mathbf{p} + w\,\mathbf{t} \\ w \end{bmatrix} \]

Note the $M\mathbf{p}$ term in the last expression. This is exactly what you'd get using a normal 3×3 transform M multiplied by p. This means that we get the exact same result as applying the linear transformation M and then adding t times p's w component. If the w component is 0, as in the vector case, it cancels out the translation portion. If the w component is 1, then we translate by exactly t, which is what we expect for points. Notice also that the last element of the new vector is exactly equal to the w component of the starting vector. This means that vectors stay vectors under affine transformation, and points stay points. This is pretty important: it means that an affine transformation will never turn points into vectors or vice versa. The augmented points and vectors which contain the w component are said to be in homogeneous space, and another term often used for the affine matrix (since it requires homogeneous vectors) is a homogeneous matrix.
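
Here's a small sketch of how this tends to look in code: one routine that multiplies by the 4×4 matrix with an explicit w, and two thin wrappers for points (w = 1) and vectors (w = 0). The types, names, and m[row][col] layout are assumptions for illustration only:

```cpp
// Sketch: applying a 4x4 affine matrix to homogeneous points (w = 1) and vectors (w = 0).
// Column-vector convention with the translation in the last column; names are illustrative.
struct Vector3 { float x, y, z; };
struct Matrix4 { float m[4][4]; };   // m[row][col]

Vector3 Transform(const Matrix4& a, const Vector3& v, float w)
{
    // the bottom row is (0, 0, 0, 1), so the output's w equals the input's w
    return {
        a.m[0][0]*v.x + a.m[0][1]*v.y + a.m[0][2]*v.z + a.m[0][3]*w,
        a.m[1][0]*v.x + a.m[1][1]*v.y + a.m[1][2]*v.z + a.m[1][3]*w,
        a.m[2][0]*v.x + a.m[2][1]*v.y + a.m[2][2]*v.z + a.m[2][3]*w
    };
}

Vector3 TransformPoint(const Matrix4& a, const Vector3& p)  { return Transform(a, p, 1.0f); } // translated
Vector3 TransformVector(const Matrix4& a, const Vector3& v) { return Transform(a, v, 0.0f); } // not translated
```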

In the next and final matrix-related math primer post, we'll examine some other operations which can be performed on matrices. Specifically, we'll take a look at the determinant, adjugate, and inverse. Afterwards, we'll quickly look at some of the problems with rotations, and introduce quaternions as an alternative way to express rotations which overcomes many of the traditional problems. I'm going to be continuing the physics posts as well.


Math Primer Series: Matrices II: Linear Transformations

Migrated from http://blogs.msdn.com/b/rezanour

In the last installment of the math primer series, we looked at the basics of matrices. Today, we’ll take a look at using matrices for linear transformations, which are one of the most common uses of matrices in games. But, before we dive into linear transformations, let’s take a quick look at what a transformation is, and why we represent them using a matrix in the first place.

A transformation is any operation that maps values from one space to another. In the context of standard algebra, we are usually introduced to transformations in the form of functions. Functions and transformations are really one and the same, though the term transformation is more common in linear algebra. In fact, sometimes you may even see transformations or functions referred to as mappings. Again, for our purposes these are all equivalent ways of conveying the same meaning. We take in a set of values, called the domain, and we produce another set of values, called the range.

When discussing transformations of affine and vector spaces, it is quite convenient to represent the transformations in the form of a matrix. This allows for succinct notation, as well as easily combining transformations through matrix concatenation (multiplication). For example, take the following transformation:

\[ T(\mathbf{x}) = M\mathbf{x} \]

Here, we say that the matrix M is multiplied with the vector x, and this product is represented by the transformation T. We also know, from our discussion about matrices and vectors, that if we are using column vectors and column-major matrices, then the result of this transformation will be another vector. If we're using row vectors and row-major matrices, then writing the vector on the right like this doesn't give us what we want. To use row vectors and row matrices and still get a vector result, we would write the transformation as:

\[ T(\mathbf{x}) = \mathbf{x}M \]

A linear transformation is a transformation that maps from vector space to vector space, maintaining the vector addition and scalar multiplication properties of the vectors. The first part of the definition, mapping from vector space to vector space, means that our transformation matrix must be exactly n×n in size, where n is the dimension of the vector space. The number of columns must be n in order to be compatible for multiplication with the vector. The number of rows must be n to give us an n-dimensional vector as a result. For two-dimensional space, that means a 2×2 matrix. For three-dimensional space, that means a 3×3 matrix.

The second part of the definition means that the addition of two vectors, and scalar multiplication of a vector, must be maintained after the transformation. Put in equation form:

\[ T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v}) \]

\[ T(k\mathbf{u}) = k\,T(\mathbf{u}) \]

The first equation says that the transformation of the sum of two vectors is equal to the sum of the transformation of the original vectors. This is what we mean when we say that the vector addition property of the vectors is maintained. The second equation says that the transformation of a scaled vector is equal to scaling the transformation of the original vector. This is what’s meant by maintaining the scalar multiplication property of vectors. Because of this, an easy way to determine if a transformation is linear is to combine the above equations and try and prove that they hold true for a particular transformation:

\[ T(a\mathbf{u} + b\mathbf{v}) = a\,T(\mathbf{u}) + b\,T(\mathbf{v}) \]

If you can prove that the above does not hold true for a given transformation, then that transformation is not linear.

Next, we’ll look at two of the most common linear transformations encountered in games: scaling and rotation. In addition to describing each transformation, I’ll show that they’re linear by using the equations above, and also use each one with some real numbers plugged in as an example. Hopefully, this will make them easier to understand.

Scale

The first linear transformation we’ll look at is scaling. Three dimensional scaling can be represented with the following matrix:

\[ S = \begin{bmatrix} s_i & 0 & 0 \\ 0 & s_j & 0 \\ 0 & 0 & s_k \end{bmatrix} \]

Where $s_i$ represents the amount of scale along the i vector, $s_j$ along the j vector, and $s_k$ along the k vector. Solving the linear transformation equation from before gives us:

\[ S\mathbf{u} = \begin{bmatrix} s_i & 0 & 0 \\ 0 & s_j & 0 \\ 0 & 0 & s_k \end{bmatrix} \begin{bmatrix} u_x \\ u_y \\ u_z \end{bmatrix} = \begin{bmatrix} s_i u_x \\ s_j u_y \\ s_k u_z \end{bmatrix} \]

Which is what we’d expect. Now, let’s show that this is linear by plugging in the two sides of the equation for showing linearity from above and seeing that the results are equal in both cases:

\[ S(a\mathbf{u} + b\mathbf{v}) = \begin{bmatrix} s_i & 0 & 0 \\ 0 & s_j & 0 \\ 0 & 0 & s_k \end{bmatrix} \begin{bmatrix} a u_x + b v_x \\ a u_y + b v_y \\ a u_z + b v_z \end{bmatrix} = \begin{bmatrix} s_i (a u_x + b v_x) \\ s_j (a u_y + b v_y) \\ s_k (a u_z + b v_z) \end{bmatrix} \]

\[ a\,S\mathbf{u} + b\,S\mathbf{v} = a \begin{bmatrix} s_i u_x \\ s_j u_y \\ s_k u_z \end{bmatrix} + b \begin{bmatrix} s_i v_x \\ s_j v_y \\ s_k v_z \end{bmatrix} = \begin{bmatrix} a s_i u_x + b s_i v_x \\ a s_j u_y + b s_j v_y \\ a s_k u_z + b s_k v_z \end{bmatrix} \]

We can see that in either case, we ended up with the same answer. This shows that the scaling transformation is indeed a linear one.

Now, let’s look at an example you may encounter in a game. For this example, let’s say that we are trying to scale up a triangle with vertices (-1, 0, 0), (0, 1, 0), and (1, 0, 0) by a factor of 3 in all directions. I’ve intentionally used simple numbers to make this easier to visualize and show in a blog post. The same concept applies regardless of the numbers. This gives us a matrix with 3 all along the diagonal:

\[ S = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{bmatrix} \]

To scale our triangle, we would need to multiply each vertex by the scale matrix:

\[ S \begin{bmatrix} -1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} -3 \\ 0 \\ 0 \end{bmatrix} \qquad S \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 3 \\ 0 \end{bmatrix} \qquad S \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \\ 0 \end{bmatrix} \]

This gives us the 3 scaled vertices we expect: (-3, 0, 0), (0, 3, 0), and (3, 0, 0).
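
As a quick illustration, here's a C++ sketch of that example; multiplying by a diagonal scale matrix reduces to scaling each component, and the names here are purely illustrative:

```cpp
// Sketch: scaling the example triangle by 3 along each axis. Names are illustrative.
struct Vector3 { float x, y, z; };

Vector3 Scale(const Vector3& v, float si, float sj, float sk)
{
    // equivalent to multiplying by the diagonal matrix diag(si, sj, sk)
    return { si * v.x, sj * v.y, sk * v.z };
}

int main()
{
    Vector3 tri[3] = { {-1, 0, 0}, {0, 1, 0}, {1, 0, 0} };
    for (Vector3& v : tri)
        v = Scale(v, 3.0f, 3.0f, 3.0f);   // (-3,0,0), (0,3,0), (3,0,0)
}
```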

Rotation

The other linear transformation we’ll look at is rotation. Rotation consists of an axis of rotation, and the angle of rotation. When looking directly into the axis of rotation (the axis is pointing towards you), the rotation angle is measured counterclockwise. We’ll start by examining the matrix representations of rotations about the axes of the standard basis (i, j, and k). Then, we’ll  show linearity and solve an example.

Let’s start by looking at the matrix for rotation about the i, j, and k vectors (axes). These are sometimes referred to as pitch, yaw, and roll respectively.

\[ R_i(\theta) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix} \]

\[ R_j(\theta) = \begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix} \]

\[ R_k(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \]
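
For reference, here's a sketch of building these three matrices in C++ (angles in radians, column-vector convention as above); the Matrix3 type and its m[row][col] layout are assumptions for illustration:

```cpp
// Sketch: the three axis rotation matrices for column vectors, angles in radians.
// Matrix3 and its row-major layout are illustrative assumptions.
#include <cmath>

struct Matrix3 { float m[3][3]; };

Matrix3 RotationI(float t)   // pitch, about i
{
    float c = std::cos(t), s = std::sin(t);
    return {{{ 1, 0,  0 },
             { 0, c, -s },
             { 0, s,  c }}};
}

Matrix3 RotationJ(float t)   // yaw, about j
{
    float c = std::cos(t), s = std::sin(t);
    return {{{  c, 0, s },
             {  0, 1, 0 },
             { -s, 0, c }}};
}

Matrix3 RotationK(float t)   // roll, about k
{
    float c = std::cos(t), s = std::sin(t);
    return {{{ c, -s, 0 },
             { s,  c, 0 },
             { 0,  0, 1 }}};
}
```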

We can then combine these through concatenation to form a general rotation matrix representing the combined rotations. However, we’ve seen that concatenation is order dependent, so we must be careful which order we combine them in. Combining pitch, then yaw, then roll may give a different result from combining as yaw, then pitch, then roll. There’s not really any standard for the order in which they’re combined, so be sure to stay consistent.

So, are rotations linear transformations? I’ve said that they are, but let’s show that this is the case. We’ll show this for one of the rotations above, but the same thing can be done for each of them:

\[ R_i(\theta)\,(a\mathbf{u} + b\mathbf{v}) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} a u_x + b v_x \\ a u_y + b v_y \\ a u_z + b v_z \end{bmatrix} = \begin{bmatrix} a u_x + b v_x \\ \cos\theta\,(a u_y + b v_y) - \sin\theta\,(a u_z + b v_z) \\ \sin\theta\,(a u_y + b v_y) + \cos\theta\,(a u_z + b v_z) \end{bmatrix} \]

\[ a\,R_i(\theta)\,\mathbf{u} + b\,R_i(\theta)\,\mathbf{v} = a \begin{bmatrix} u_x \\ \cos\theta\,u_y - \sin\theta\,u_z \\ \sin\theta\,u_y + \cos\theta\,u_z \end{bmatrix} + b \begin{bmatrix} v_x \\ \cos\theta\,v_y - \sin\theta\,v_z \\ \sin\theta\,v_y + \cos\theta\,v_z \end{bmatrix} = \begin{bmatrix} a u_x + b v_x \\ \cos\theta\,(a u_y + b v_y) - \sin\theta\,(a u_z + b v_z) \\ \sin\theta\,(a u_y + b v_y) + \cos\theta\,(a u_z + b v_z) \end{bmatrix} \]

As we can see, both ways lead to the same result, which confirms that rotations are in fact linear transformations.

Finally, let’s consider an example with real numbers. Let’s say we have a vector u = (3, 4, 5). We want to rotate this vector by 45° counterclockwise about the i vector, then 30° counterclockwise about the j vector:

\[ R_i(45^\circ)\,\mathbf{u} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos 45^\circ & -\sin 45^\circ \\ 0 & \sin 45^\circ & \cos 45^\circ \end{bmatrix} \begin{bmatrix} 3 \\ 4 \\ 5 \end{bmatrix} = \begin{bmatrix} 3 \\ 4(0.7071) - 5(0.7071) \\ 4(0.7071) + 5(0.7071) \end{bmatrix} = \begin{bmatrix} 3 \\ -0.7071 \\ 6.3640 \end{bmatrix} \]

\[ R_j(30^\circ) \begin{bmatrix} 3 \\ -0.7071 \\ 6.3640 \end{bmatrix} = \begin{bmatrix} \cos 30^\circ & 0 & \sin 30^\circ \\ 0 & 1 & 0 \\ -\sin 30^\circ & 0 & \cos 30^\circ \end{bmatrix} \begin{bmatrix} 3 \\ -0.7071 \\ 6.3640 \end{bmatrix} = \begin{bmatrix} 3(0.8660) + 6.3640(0.5) \\ -0.7071 \\ -3(0.5) + 6.3640(0.8660) \end{bmatrix} = \begin{bmatrix} 5.7800 \\ -0.7071 \\ 4.0112 \end{bmatrix} \]

And that gives us the new vector after the two rotations: approximately (5.78, -0.707, 4.011).
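
If you want to double check the arithmetic, here's a tiny self-contained C++ sketch that applies the same two rotations component by component (a standalone check with illustrative names, not engine code):

```cpp
// Sketch: numerically checking the two-rotation example above.
#include <cmath>
#include <cstdio>

int main()
{
    const float pi = 3.14159265f;
    float x = 3, y = 4, z = 5;

    // 45 degrees counterclockwise about i: x is unchanged
    float c = std::cos(45 * pi / 180), s = std::sin(45 * pi / 180);
    float y1 = c * y - s * z;
    float z1 = s * y + c * z;

    // 30 degrees counterclockwise about j: y1 is unchanged
    c = std::cos(30 * pi / 180); s = std::sin(30 * pi / 180);
    float x2 =  c * x + s * z1;
    float z2 = -s * x + c * z1;

    std::printf("(%.4f, %.4f, %.4f)\n", x2, y1, z2);   // roughly (5.78, -0.7071, 4.011)
}
```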

To wrap up, we’ve discussed transformations in general and looked at a special class of transformations known as linear transformations. We’ve looked at the two most common linear transformations in video games, scale and rotation. In the next installment of the math primer series, we’ll discuss affine transformations. As always, please let me know if there’s anything that I can help to explain better.


Intro to Physics in Games

Migrated from http://blogs.msdn.com/b/rezanour

I’ve decided to try something a bit different and mix in physics posts while I wrap up the math primer. Many of you might be more interested in the physics, or may already be familiar with the math topics, so this gives me a chance to have something for everyone.

Today, we’ll look at some of the core concepts around physics simulations in video games. We’ll go over some terminology and approaches so that we’re all on the same page when discussing more involved topics in later posts. First off, we’ll discuss what it means to have physics in a video game, and how that compares with physical simulation in other software fields. Then, we’ll discuss the basic steps in simulating physics in a game, and how it integrates with the rest of the game code.


Physics in Video Games

Video games have become incredibly complicated pieces of software over the past 20 years. I recall a time when PC games could fit on a single density 720KB 5.25” floppy disk. Several years later, when CDs were initially becoming available, you could find a single CD with over 1000 games on it! These games ran at a measly 320×240 resolution with 8 or 16 colors.

Today, games easily weigh in at several gigabytes, with many being tens of gigabytes. They have nearly photo-realistic graphics with incredibly high resolutions. Along with that extra visual realism comes an expectation of immersion from the user. The point of having incredibly realistic graphics is to make the player feel  like they’re actually in the game. However, for the illusion to hold, the virtual world around them must also behave in a believable manner. The best graphics in the world wouldn’t make a player feel like they’re actually immersed if the characters move unnaturally, pass through walls or floors, or things don’t fall and topple as if they had realistic mass and material properties. These are just some of the challenges of making the world feel believable.

Physical simulation software is nothing new, and has been used for decades for scientific research and study. The use of physics in games isn’t entirely new either, though it’s gotten a lot more attention in recent years. Even some of the earliest games, such as Pong, had very primitive physics simulations in the sense that the trajectory of the ball bouncing off the paddles was determined in a way that would seem intuitive and consistent with our understanding of physics. What is new, however, is the recent adoption of general purpose physics simulations in video games. The physics in early games was very much written for the specific scenarios of that game. For example, it could handle projecting the trajectory of a ball off of a paddle, but nothing else. This meant that for each new effect a game wanted to have, a new piece of physics code had to be written to handle it, and each new game needed new physics code written for it as well. As game worlds became more complicated, the need for more physical scenarios increased, and the cost of writing specialized functions for each scenario became prohibitive. During this time, the concept emerged of a general purpose piece of code that, given some overall parameters, could simulate arbitrary physical conditions and scenarios. These general purpose physics engines allow game developers to create games with many more interesting scenarios, and to do it much more quickly.

Before moving on to discussing the anatomy of these physics engines, there’s one more important point to consider. Physics simulations are just that: simulations. Physics, in general, is far too complex to replicate fully in code. However, if we take certain core aspects or principles of physics, and simulate them with some acceptable level of accuracy, then performing this in code becomes much more reasonable. The most common class of physics simulated in games is called Rigid Body Dynamics. Rigid body simulations use Newtonian principles of movement and mass to simulate rigid objects (spheres, boxes, a tower, a boulder, a wall, etc.). In games, the physical accuracy of the simulation only needs to be high enough to be believable, and is often reduced to improve performance. A smooth running, albeit slightly less accurate, game is much more immersive than a perfectly accurate game that runs poorly and constantly drops frames.

Basic Components of Physics Engines

There are many different physics engines out on the market today. Each one works somewhat differently than the next, and each uses different approaches to solving the general physics problem for games. However, in order to solve the problem for games, there are certain things that every engine must do. For the purposes of this post, we’ll be focusing on rigid body dynamics as it’s the most common type of physics in games. I’ll try and include posts later to discuss other types of simulations such as cloth, fluid, or gas. All of the components of the physics engine can be summarized in a single general operation called the simulation step.

Video games display many rendered frames per second. Much like traditional animation, many small in-between images are shown in quick succession to give the impression of movement. The de facto standard for most PC and console games is to target 60fps, which means that each frame represents about 0.01667s of time. For the physics simulation to match, only 0.01667s worth of physics should happen each frame. That means an object moving forward at a velocity of 1m/s should only move 0.01667 meters that frame, and then again the next frame, and so on, until after 60 frames have gone by, the object has covered 1 meter of distance.

Each of these small advances of the objects along their trajectories is called a physics simulation step. Almost every physics simulation equation involves time, and the time used to solve them is the time slice determined by the game. Most engines take the time in as a value each frame, since it can change from frame to frame. Some games use variable frame rates, so a constant value should never be assumed.
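
As a rough sketch of how a game might hand that time slice to the physics code each frame (everything here, including the PhysicsWorld type and its Step call, is a hypothetical stand-in rather than a real engine API):

```cpp
// Sketch: a game loop handing the physics simulation its time slice each frame.
// All names here are hypothetical; a real game would measure dt rather than hard-code it.
struct PhysicsWorld { void Step(float dt) { /* integrate, detect, resolve */ } };

bool GameIsRunning();   // provided elsewhere in this hypothetical game
void ReadInput();
void Render();

void GameLoop(PhysicsWorld& physics)
{
    const float dt = 1.0f / 60.0f;   // ~0.01667s at 60fps; don't assume it's constant
    while (GameIsRunning())
    {
        ReadInput();
        physics.Step(dt);            // only this frame's worth of physics happens
        Render();
    }
}
```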

So, what exactly happens during the simulation step?

Integration

A typical physics engine normally tracks all the objects that it’s simulating in a data structure somewhere. For simplicity let’s call it a list, though this isn’t always the case. For each object, the physics engine needs to know some important information, such as its mass, current velocity, current position and orientation, and outside forces acting on the object. Each step, the integration portion of the physics code solves for the new positions and velocities of every object. This involves using the equations of motion from physics to approximate the positions and velocities of an object at some future time (current time plus the time slice for the frame), using the current positions and velocities as starting points. In addition, outside forces such as gravity, springs, friction, or anything else relevant to the game are also considered to ensure the newly computed positions and velocities make sense.
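
Here's a minimal sketch of what integrating a single 2D body for one step could look like, using semi-implicit Euler as one common choice; the Body layout and names are assumptions for illustration, not a commitment to a particular integrator for this series:

```cpp
// Sketch: one integration step for a single body using semi-implicit Euler.
// Body layout and names are illustrative assumptions.
struct Vector2 { float x, y; };

struct Body
{
    float   invMass;      // 1 / mass (0 for immovable objects)
    Vector2 position;
    Vector2 velocity;
    Vector2 force;        // accumulated outside forces (gravity, springs, ...)
};

void Integrate(Body& b, float dt)
{
    // a = F / m; update velocity first, then position from the new velocity
    b.velocity.x += b.force.x * b.invMass * dt;
    b.velocity.y += b.force.y * b.invMass * dt;
    b.position.x += b.velocity.x * dt;
    b.position.y += b.velocity.y * dt;
    b.force = { 0.0f, 0.0f };   // clear the accumulator for the next frame
}
```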

Collision Detection

If there were only a single object moving in a vacuum, then we’d be done! However, most games involve more than one object, and these objects move in interesting environments. Inevitably, two objects will be moving towards each other or will end up running into each other. What happens? If we don’t do anything, these objects will just pass right through each other. The renderer certainly has no knowledge that these are separate, solid objects that shouldn’t overlap. It will happily draw them intersecting each other. In most games, however, you don’t want the objects to pass through each other. An asteroid colliding with a ship should crash into it, possibly destroying the ship. A character walking on top of a floor should stay above it; gravity shouldn’t pull the character down through the ground. In order to handle these scenarios, the game needs to know that two objects are overlapping. The process of identifying these scenarios is called collision detection, and is normally one of the other major tasks the physics code must do. Generally, the job of the collision detection code is to determine all such pairs of overlapping objects, possibly including some additional data such as how far they overlap and in what orientation, and to provide this data to the game for further processing.
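
As a sketch of the idea, here's a brute-force detection pass over circle shapes that reports each overlapping pair along with a contact normal and penetration depth; the types and the O(n²) loop are illustrative simplifications (a real engine would add a broad phase and more shape types):

```cpp
// Sketch: brute-force pairwise overlap testing for circles. Types and names are illustrative.
#include <cmath>
#include <vector>

struct Vector2 { float x, y; };
struct Circle  { Vector2 center; float radius; };   // center already in world space here

struct Contact
{
    int     a, b;          // indices of the overlapping pair
    Vector2 normal;        // from a's surface toward b
    float   penetration;   // how far they overlap along the normal
};

std::vector<Contact> DetectCollisions(const std::vector<Circle>& circles)
{
    std::vector<Contact> contacts;
    for (size_t i = 0; i < circles.size(); ++i)
    {
        for (size_t j = i + 1; j < circles.size(); ++j)
        {
            Vector2 d = { circles[j].center.x - circles[i].center.x,
                          circles[j].center.y - circles[i].center.y };
            float dist = std::sqrt(d.x * d.x + d.y * d.y);
            float rsum = circles[i].radius + circles[j].radius;
            if (dist < rsum && dist > 0.0f)
            {
                contacts.push_back({ (int)i, (int)j, { d.x / dist, d.y / dist }, rsum - dist });
            }
        }
    }
    return contacts;
}
```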

Collision Resolution

Once the physics engine has identified that a pair (or many pairs) of objects are overlapping, then what do we do? In many cases, this is something specific to the rules of the game. For instance, in a space shooter game, when an asteroid overlaps the player’s ship, the game may decide to stop drawing the player’s ship and instead draw an explosion animation. Following that, the game would probably start the level over and reduce the number of lives of the player. All of these reactions to the collision are driven entirely by the game itself, and not the physics engine. This is because they are very specific to the game in question. However, there are certainly cases where the game doesn’t care to be involved. For instance, in a first person shooter, if the player knocks over a chair in a room, the game doesn’t need to do anything specific for this case. There are no game rules invoked, and therefore the game just wants the motion of this chair and player to continue to be simulated in a realistic fashion. This likely means the chair falls over in response to being bumped by the player, and when it strikes the floor, it probably rolls or tumbles slightly. This class of reaction is another common, and arguably the most complex, job of the physics engine and is generally referred to as collision response or collision resolution.
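
To give a feel for what resolution can look like, here's a sketch that resolves a single contact with a basic impulse along the contact normal plus a small positional correction to push the bodies apart; this is one simple approach for illustration, not necessarily the scheme this series will end up using:

```cpp
// Sketch: resolving one contact with a normal impulse and a positional correction.
// This is a basic illustrative approach, not this series' final resolution scheme.
struct Vector2 { float x, y; };
struct Body    { float invMass; Vector2 velocity; Vector2 position; };

void Resolve(Body& a, Body& b, Vector2 n, float penetration, float restitution)
{
    float invMassSum = a.invMass + b.invMass;
    if (invMassSum == 0.0f)
        return;   // both bodies are immovable

    // relative velocity along the contact normal (n points from a toward b)
    Vector2 rv = { b.velocity.x - a.velocity.x, b.velocity.y - a.velocity.y };
    float vn = rv.x * n.x + rv.y * n.y;
    if (vn < 0.0f)   // only apply an impulse if the bodies are moving toward each other
    {
        float j = -(1.0f + restitution) * vn / invMassSum;
        a.velocity.x -= j * a.invMass * n.x;  a.velocity.y -= j * a.invMass * n.y;
        b.velocity.x += j * b.invMass * n.x;  b.velocity.y += j * b.invMass * n.y;
    }

    // nudge the bodies apart so they don't remain overlapped
    float correction = 0.8f * penetration / invMassSum;
    a.position.x -= correction * a.invMass * n.x;  a.position.y -= correction * a.invMass * n.y;
    b.position.x += correction * b.invMass * n.x;  b.position.y += correction * b.invMass * n.y;
}
```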

We’ve covered the basics of all physics engines. Every simulation suitable for games will have these general components, though it’s common to include many more features such as joints, cloth simulations, physics-based animation, and other interesting things. I’ll be going into a lot more detail on each component as we continue to discuss physics in games. I’m planning on presenting a sample architecture for a physics engine, and then drilling into each section piece by piece to discuss the design, why it was designed that way, what algorithms are used, what alternate algorithms exist, and what optimizations were made and why. I’ll be providing sample code snippets and probably entire posts dedicated to certain algorithms and optimizations. Please note that the architecture used will be something of my own invention, and is not that of any other physics engine out there. While there will certainly be similarities due to the fact that we’re solving the same problem, all of the code and design presented will have been created specifically for the purpose of education on this blog.
