Aaron Lebo

A WebGL renderer in 300 lines of Javascript

April 3 2018

Ali: There are guns at Aqaba.

Lawrence: They face the sea, Sherif Ali, and cannot be turned round. From the landward side, there are no guns at Aqaba.

Ali: With good reason. It cannot be approached from the landward side.

Lawrence: Certainly the Turks don’t dream of it. Aqaba is over there. It’s only a matter of going. – Lawrence of Arabia (1962)

Hey there.

It’s been a long time since I’ve written anything on this blog. I’ve got to live with my opinionated self, which gets old, so I’d just as soon not subject others to that. Too many people in this age of social media feel like their every action and half-baked hot take are worth broadcasting, like their thoughts are god’s gift to humanity. I’m under no such delusion, and you know what people say about assumptions and opinions, anyway…

I tell you this because I’ve been working hard over the last few years on several fronts, and have what I believe is some hard-won experience and knowledge that I’d like to share, as well as a decent amount of code that I think is pretty neat. Hopefully it will be useful to you. Sometimes when I get to writing I have difficulty stopping, so I apologize if this is overly long. Thanks for reading.

comanche

comanche (source) is a WebGL renderer (eventually, a proper game engine) written in Javascript that weighs in at just over 300 lines of code. It is named after those lords of the plains, the Comanche, who dominated an area centered on the Texas Panhandle from the arrival of Spanish horses until into the 1870s. As detailed by S.C. Gwynne in Empire of the Summer Moon, the Comanche were the best light cavalry in the world for a time and had an uncontested empire as far north as the Arkansas River. They were a skilled and ruthless warrior society; men and women alike fierce and hard. When Texas came into the Union in 1845, almost the entirety of territory it and the US claimed to the north and west of Austin was not actually under American control. Amazingly, less than 150 years later, not only is that empire gone but Comanche culture, like many native cultures the world over, is all but dead. This homage can’t undo history, but being from a town called Yellow on those plains, it’s my small nod to them.

The end goal for comanche is something along the lines of a Minecraft voxel engine. That’s underselling it, as I’m not looking to build a Minecraft clone, but some of the techniques like procedural generation and infinite worlds really intrigue me. A voxel engine is also about the simplest 3d engine you can build, so it’s a decent starting point. Unfortunately, to get from no knowledge of 3d programming (where I started several years ago) to something like Minecraft, there’s a sizeable gap and plenty of things to learn and implement. My first step was to render a triangle (the basis for 3d programming), then a cube, then many cubes. Rendering many cubes is neat, but then I wanted to render interesting landscapes. I found a shortcut in the 25-year-old game Comanche. Convenient, yeah? My little brother and I played this as well as lots of other state of the art games of the time - Subwar 2050, Wing Commander, Descent, Tie Fighter, Heretic, Dark Forces 2, Duke Nukem 3D. My dad would buy these games but we’d end up playing them. Presumably, if my mom knew about the strippers and pigs with shotguns in Duke 3D, there’s no way this would have happened.

Games of the day were doing interesting things to render 3d or pseudo-3d on very limited hardware (incredibly limited compared to today - it still amazes me that they managed to pull this off). Comanche used a unique raycasting system to combine two separate maps - one height, the other color - to render large outdoor scenes. This can best be represented as separate images, with each byte in a 1 MB image representing height or color in a 1024 by 1024 unit map (512 by 512 or any other size are of course possible). These pixels can also be thought of as voxels, so it is a rather natural extension to render these maps in 3d. After much work, I figured out how to do this, so that’s where we are: comanche 0.1 can render 29 maps reverse engineered by Sebastian Macke from the game Comanche, and can easily render any map composed of a set of height/color images. I also included my own map, pampa, after the aptly-named town. It’s meant to depict a snowy day. I suppose to be more realistic, there should be a tree or two.
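To make the two-map idea concrete, here is a minimal sketch of pairing a height map with a color map. It is not taken from comanche’s source: the `buildColumns` helper, the row-by-row byte layout, and the reading of the color byte as a palette index are all assumptions made for illustration.

```javascript
// Hypothetical helper: pair a height map with a color map of the same size.
// `heights` and `colors` are flat byte arrays, one byte per map cell,
// stored row by row (an assumed layout, not taken from comanche).
function buildColumns(heights, colors, size) {
  if (heights.length !== size * size || colors.length !== size * size) {
    throw new Error('both maps must contain size * size bytes');
  }
  const columns = [];
  for (let z = 0; z < size; z++) {
    for (let x = 0; x < size; x++) {
      const i = z * size + x; // index of this cell in both maps
      columns.push({ x, z, height: heights[i], color: colors[i] });
    }
  }
  return columns;
}

// A tiny 2 by 2 map standing in for a 1024 by 1024 one.
const heights = Uint8Array.from([0, 10, 20, 30]);
const colors = Uint8Array.from([1, 2, 3, 4]); // palette indices (assumed)
const columns = buildColumns(heights, colors, 2);
console.log(columns[3]); // { x: 1, z: 1, height: 30, color: 4 }
```

Each entry is one voxel column: a position on the map, how tall it is, and what color to paint it. A renderer would turn each column into a cube (or a stack of them) at that position.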

But I’m getting ahead of myself. My motivation for doing this came from running across Michael Fogleman’s Craft, a self-described Minecraft clone, written in C and modern OpenGL, which among other features includes multiplayer support, and perhaps most impressively, consists of just over 5,000 lines of code. Depending on what your experience is, that may sound like a lot, and at the time I’d never worked on a single project of that size myself, but in the grand scheme of projects, that’s rather minimal, especially for a low-level language like C. Other projects are easily in the hundreds of thousands of lines range, many in the millions or tens of millions. No matter how great the codebase, 100,000 lines of code is intimidating to say the least, especially for beginners. The thought that by understanding those 5,000 lines of code, I could understand how interactive multiplayer games worked was exciting.

It’s never that easy. Though Craft is very well-written code, the thing about OpenGL and graphics programming is there’s no way to say “draw a cube”, “draw a horse”, “draw the sky”, or “add lighting”. The GPU has no knowledge of such complex objects. It is your job as the programmer to feed in the vertices in space (x, y, z coordinates) and textures that together form those objects. The GPU does not even natively understand 3d. The screen, like a photograph, is 2d, it is only through math and what are essentially camera tricks that we perceive scenes as 3d. OpenGL (like DirectX and Vulkan) is the interface through which you as the programmer interact with the GPU. The trick is to understand how and why it works, which isn’t necessarily intuitive until after you’ve actually drawn a cube and had your aha moment. This gives the illusion of it being more complex than it really is, like riding a bike, raising a dog, or learning how to be an adult.

Fortunately for you and me, there are lots of resources which explain how it all works. I find this is true about many domains today: the sheer amount of free information is unbelievable. Because that work has already been done (thank you, thank you, thank you to the authors), I’ll link you to those resources and suggest you dive in if you’re interested in graphics programming. To pique that interest and give you an idea of the basics, I’ll provide a simplified explanation, as I understand it.

How graphics programming works

As stated earlier, the basis of polygons and then more complex 3d shapes is the triangle. A triangle is composed of 3 vertices, a square of 2 triangles, and a cube of 6 squares, or 12 triangles, or 36 vertices. These shapes are assembled on the CPU and then passed to the GPU through the use of buffers. Shapes and models can be programmatically assembled or imported from 3d modeling programs such as Blender via .obj files or other formats. Buffers are the primary mechanism of uploading and manipulating data on the GPU. There is also the concept of attributes, which is basically how you say “this buffer contains 108 floating point numbers, with each set of three representing a vertex”; in other words, the attributes/characteristics of the buffer. You can similarly upload color data (RGBA) via buffers. Furthermore, if you’ll consider how triangles form a square, you’ll notice that you really don’t need 6 separate vertices; 4 will do, because the two triangles share 2 of them. OpenGL has functionality that allows you to specify the indices of shared vertices to do this (confusingly called element arrays), which is but one example of a number of perhaps unexpected features which are seemingly minor but have specific uses. Another is the way that “winding” vertices clockwise or counter-clockwise to form a shape determines which side counts as its front, and so which side can be culled (skipped when it faces away from the camera). You can Google for more information, but it’s worth being aware that concepts like this exist and they exist for a reason.
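A minimal sketch of the shared-vertex idea: one square described by 4 unique corners plus 6 indices, instead of 6 full vertices. The data and the `expand` and `upload` helpers are illustrative assumptions, not comanche’s code; `upload` uses standard WebGL 1 calls and expects a `gl` context and an attribute `location` that you would obtain elsewhere.

```javascript
// Four unique corners of a unit square in the z = 0 plane (x, y, z each).
const positions = new Float32Array([
  0, 0, 0, // 0: bottom left
  1, 0, 0, // 1: bottom right
  1, 1, 0, // 2: top right
  0, 1, 0, // 3: top left
]);

// Two counter-clockwise triangles that share corners 0 and 2.
const indices = new Uint16Array([0, 1, 2, 0, 2, 3]);

// Expand indexed data back into one vertex per index, which is what the
// non-indexed approach would have uploaded.
function expand(positions, indices) {
  const out = new Float32Array(indices.length * 3);
  indices.forEach((index, n) => {
    out.set(positions.subarray(index * 3, index * 3 + 3), n * 3);
  });
  return out;
}

// Hypothetical upload helper. `gl` is a WebGLRenderingContext and
// `location` comes from gl.getAttribLocation on a linked program.
function upload(gl, location) {
  const vertexBuffer = gl.createBuffer();
  gl.bindBuffer(gl.ARRAY_BUFFER, vertexBuffer);
  gl.bufferData(gl.ARRAY_BUFFER, positions, gl.STATIC_DRAW);
  // The attribute: read 3 floats per vertex, tightly packed.
  gl.enableVertexAttribArray(location);
  gl.vertexAttribPointer(location, 3, gl.FLOAT, false, 0, 0);
  const elementBuffer = gl.createBuffer();
  gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, elementBuffer);
  gl.bufferData(gl.ELEMENT_ARRAY_BUFFER, indices, gl.STATIC_DRAW);
  // Draw later with:
  // gl.drawElements(gl.TRIANGLES, indices.length, gl.UNSIGNED_SHORT, 0);
}

const flat = expand(positions, indices);
console.log(positions.length, flat.length); // 12 18
```

The saving looks small for one square (12 floats plus 6 small indices versus 18 floats), but it grows once each vertex also carries colors, normals, or texture coordinates.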

The magic that makes modern 3d programming modern is the use of shader programs. Shaders are written in a small, restricted, C-like language called GLSL which runs on the GPU. Much like regular programs, they are self-contained pipelines which accept a number of inputs and return a number of outputs. Modern GPUs can run many of these shaders at once but crucially they do not share state and partially due to this are very efficient. Perhaps they are best thought of as functions which operate separately and in parallel on each vertex. Inputs consist of individual items taken from the buffers mentioned previously, as well as from so-called uniforms which are bound to the same value across every run of the shader. Textures are another usually global input, they allow you to import images used to, among other things, wrap objects. For example, this is how you might make the ground look like grass.

Shader programs are actually composed of two kinds of shaders: vertex and fragment. There are other more specialized kinds which are used less often and I’ve not yet needed. Vertex shaders most importantly return position information for each vertex and they can also send data to the fragment shader which runs next in the pipeline. The fragment shader is primarily concerned with the coloring of each fragment (roughly, each pixel the shape covers), using values interpolated between the vertices (this is also where textures come in). If this sounds confusing, in practice it is less so. You can do some really amazing stuff with shaders, but in many cases the positioning and coloring of vertices is very simple, feeding that information in and out as is. Where these really shine (sorry, had to) is in the use of applied effects such as modern lighting: if we’ve got one or more light sources, we can feed the location of those sources into the shader and then calculate the lighting/shadows on each individual vertex or fragment. This is also how games do effects like trees and grass which sway in the wind; it’s just a vertex shader which adjusts the position of each vertex based on a simple algorithm to give the illusion of movement.
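Here is a minimal sketch of what such a pair of shaders can look like in WebGL 1, along with the calls that compile and link them. The attribute, uniform, and varying names (`position`, `color`, `projection`, `view`, `vColor`) and the `compile` and `link` helpers are assumptions chosen for illustration, not necessarily what comanche uses.

```javascript
// Vertex shader: runs once per vertex. `position` and `color` come from
// buffers; `projection` and `view` are uniforms shared by every run.
const vertexSource = `
attribute vec3 position;
attribute vec4 color;
uniform mat4 projection;
uniform mat4 view;
varying vec4 vColor;
void main() {
  vColor = color; // handed on to the fragment shader
  gl_Position = projection * view * vec4(position, 1.0);
}`;

// Fragment shader: runs once per covered pixel. vColor arrives already
// interpolated between the vertices of the triangle.
const fragmentSource = `
precision mediump float;
varying vec4 vColor;
void main() {
  gl_FragColor = vColor;
}`;

// Compile one shader, surfacing the driver's error log on failure.
function compile(gl, type, source) {
  const shader = gl.createShader(type);
  gl.shaderSource(shader, source);
  gl.compileShader(shader);
  if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
    throw new Error(gl.getShaderInfoLog(shader));
  }
  return shader;
}

// Link both shaders into a program; `gl` is a WebGLRenderingContext.
function link(gl) {
  const program = gl.createProgram();
  gl.attachShader(program, compile(gl, gl.VERTEX_SHADER, vertexSource));
  gl.attachShader(program, compile(gl, gl.FRAGMENT_SHADER, fragmentSource));
  gl.linkProgram(program);
  if (!gl.getProgramParameter(program, gl.LINK_STATUS)) {
    throw new Error(gl.getProgramInfoLog(program));
  }
  return program;
}
```

In a browser you would get `gl` from `canvas.getContext('webgl')`, call `link(gl)` once, and then `gl.useProgram(program)` before drawing.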

Wiki says that shaders were first introduced in Pixar’s RenderMan software in 1988. It was not until 2000 that Nvidia GeForce cards supported them in hardware available to consumers. John Carmack and id released Quake 3 in 1999 which required an OpenGL capable card due to the heavy use throughout the engine. I seem to recall (but can’t find via Google, so take it with a grain of salt), that Carmack’s work at id during this time heavily influenced the development of the programmable shaders we take for granted today. Even if that’s not accurate, he basically invented the first-person shooter as well as the engines which are the foundation for multiple gaming empires today. Pretty damn impressive - maybe good things can come from Dallas. The impact of shaders on the industry is easily seen in the jump from the PlayStation and N64 to the Xbox and its contemporaries. Halo used shaders for some neat effects and a game such as Splinter Cell was built around dynamic point lighting. Games of the previous era had both crazy low resolution textures and pre-baked lighting. The extensive dynamic lighting and material shaders (making surfaces appear to be of a certain material) are what give modern games much of their wow.

The final key component of 3d programming is linear algebra. It at times can seem like magic, but it’s just math. Practically, linear algebra is the set of rules which dictate how matrices (arrays of arrays) of numbers interact (multiplication, division, addition, subtraction, etc) with each other and scalars (individual numbers). In the context of 3d programming this is useful because we’re often dealing with large groups of numbers. I’m not sure who figured this out, but conveniently certain formulas do all the hard legwork to project 3d data onto the 2d screen so that it still reads as 3d. One algorithm often seen in vertex shaders is of the form:

gl_Position = projection * view * vec4(position, 1.0);

Because linear algebra is “backwards”, this projection * view * model matrix is also known as the mvp matrix. Projection is a matrix which contains information such as the field of view and screen width/height ratio. The view is a matrix derived from the location and direction of the camera/player, and finally, the model is the actual data which represents the object. Most languages have linear algebra libraries which do almost all of this work for you, which means that with a few lines of code you can generate 3d scenes. Linear algebra also makes it easy to pull individual items into their own 3d spaces, manipulate them there, and only later combine them into “world space”. Without this ability, keeping track of massive scenes of objects would be difficult. When comanche renders a 1024 x 1024 map, it’s rendering over 1,000,000 cubes, and many games have scenes with far more objects. Basically, it’s good, and linear algebra is also central to machine learning, so it’s more than a little useful to know and not complicated.
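To show that there is no magic involved, here is a minimal sketch of the projection * view * model chain in plain Javascript. The helpers are hand-rolled for illustration (a library such as glMatrix would normally do this work), the camera is assumed to sit at z = 5 looking down the negative z axis, and the `project` function is a hypothetical stand-in for what the GPU does after the vertex shader runs.

```javascript
// 4x4 matrices stored column-major in flat arrays, the layout OpenGL
// and glMatrix use. Element (row, col) lives at index col * 4 + row.
function perspective(fovy, aspect, near, far) {
  const f = 1 / Math.tan(fovy / 2);
  const nf = 1 / (near - far);
  return [
    f / aspect, 0, 0, 0,
    0, f, 0, 0,
    0, 0, (far + near) * nf, -1,
    0, 0, 2 * far * near * nf, 0,
  ];
}

function translation(x, y, z) {
  return [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, x, y, z, 1];
}

// Returns a * b, meaning b is applied to a point first, then a.
function multiply(a, b) {
  const out = new Array(16).fill(0);
  for (let col = 0; col < 4; col++) {
    for (let row = 0; row < 4; row++) {
      for (let k = 0; k < 4; k++) {
        out[col * 4 + row] += a[k * 4 + row] * b[col * 4 + k];
      }
    }
  }
  return out;
}

// Multiply a matrix by a column vector [x, y, z, w].
function transform(m, [x, y, z, w]) {
  return [0, 1, 2, 3].map(
    (row) => m[row] * x + m[4 + row] * y + m[8 + row] * z + m[12 + row] * w
  );
}

const projection = perspective(Math.PI / 2, 1, 1, 100);
const view = translation(0, 0, -5); // camera at z = 5: shift the world by -5

function project(model, point) {
  const mvp = multiply(projection, multiply(view, model));
  const [x, y, , w] = transform(mvp, point);
  return [x / w, y / w]; // perspective divide gives 2d screen coordinates
}

const corner = [1, 1, 0, 1];
const nearCube = project(translation(0, 0, 0), corner);
const farCube = project(translation(0, 0, -5), corner);
console.log(nearCube, farCube); // the far corner lands closer to the center
```

The same corner of the same object ends up nearer the middle of the screen when its model matrix pushes it further from the camera, which is all perspective really is. The “backwards” part is visible in `project`: the matrix written last (model) is the one applied to the point first.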

Not to simplify things too much, but what I just described is the heart of graphics programming. State of the art games use more elaborate techniques, but at the end of the day you’re loading vertices and textures onto the GPU and manipulating them through shaders and linear algebra, this is as much the case for the humble cube as it is for the character model made of 1 million polygons. Complete games also include physics, AI, and sometimes networking. These can be as complex or simple as you like. Many of the systems which simulate real world phenomena are in fact simplified hacks. You may not have real world physics, but you can still simulate gravity or collision detection in a day. What’s more is there is some crossover in these different domains, and like graphics, there’s a ton of material out there describing how they work and are implemented. You can very easily have a working 3d game in 500 lines of code. Maybe not the Game of the Year, but something to build on and experiment with.
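As an example of how simple such a hack can be, here is a minimal sketch of gravity plus collision against the ground. The `step` function, the player fields, the gravity constant, and the `groundHeight` lookup are all hypothetical, chosen only to illustrate the idea.

```javascript
const GRAVITY = -9.8; // units per second squared (assumed scale)

// Advance a player by dt seconds. `groundHeight(x, z)` is a hypothetical
// lookup, for example into a height map.
function step(player, dt, groundHeight) {
  const vy = player.vy + GRAVITY * dt; // gravity changes velocity
  const y = player.y + vy * dt;        // velocity changes position
  const floor = groundHeight(player.x, player.z);
  if (y <= floor) {
    // Collision: clamp to the ground and stop falling.
    return { ...player, y: floor, vy: 0, onGround: true };
  }
  return { ...player, y, vy, onGround: false };
}

// Drop a player from 10 units up onto flat ground, 60 steps per second.
let player = { x: 0, y: 10, z: 0, vy: 0, onGround: false };
const flat = () => 0;
for (let frame = 0; frame < 600 && !player.onGround; frame++) {
  player = step(player, 1 / 60, flat);
}
console.log(player.y, player.onGround); // 0 true
```

This is nowhere near real world physics, but it is enough to keep a character on top of the terrain, and jumping is just setting `vy` to a positive number while `onGround` is true.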

I find this a little intoxicating. It’s a good feeling to be able to finally read through Craft’s main function and understand what’s going on. What’s just as fun is being able to play games, especially older ones, and to understand how it all works. You can play a game like Minecraft and realize just how little is going on and how anyone who takes the time to learn can build worlds like that, too. It’s funny because I remember being interested in the topic growing up but always running across people in game forums who acted like 3d programming was impossibly difficult and who discouraged beginners from trying, steering them down different avenues. I still see this today and all I can think is that it’s really not that hard, it’s not hard at all, you just gotta put in some work and learn. The effort is very rewarding, though, if for no other reason than to prove to yourself that you are capable. I often see similar discouragement and learned helplessness across the tech industry as a whole which frustrates me. Why tell people what they can’t do? You ever noticed how those who talk the most often don’t really know what they are talking about? It’s almost like their primary motivation is to be heard and figuring out the truth is only incidental. We can talk about this another time…

Learning strategies

One of the goals of comanche, aside from making a working game and game engine, is to provide a tool for learning. Anyone should be able to read the source and understand how everything works. Things should be as simple as possible but no simpler, which stands in contrast to many projects which are overly complicated because they were never really made to be understood. It’s very powerful to understand something new, and I’d like to encourage that, besides the fact that I’m quite literally obsessive about code and can’t really help myself. No line should be wasted.

While we’re on this topic of learning, my personal experience as a political scientist by training includes sitting through multiple stats classes and being fed linear algebra and often wondering what the hell was going on. If I don’t see why I should know something, I don’t find it interesting, and if I don’t find something interesting, I don’t learn. You may feel the same way. I had similar experiences in math classes throughout school; the only class I ever failed was freshman geometry in high school, which involved copying theorems and their proofs on index cards. How unbelievably boring! On the other hand, if you had told me then that by learning this stuff I could build worlds, well, I would’ve put a lot more effort into it. My belief is that the properly designed game engine can be an incredible learning tool both for kids and adults, but especially kids. Also, the crossover between linear algebra in games, machine learning, and stats is interesting. Besides the possibility for learning, there are untold data visualizations that would be illuminating and possible for more people if only this stuff were more approachable, and at the risk of sounding ambitious, game worlds can be a boon for social science research. Some study happens after the fact, but if it was inherent in the design, what could you discover? Some of this is a ways off, but they’re avenues I’d like to explore.

One more thing on the topic of learning. A strategy I find useful when approaching an unfamiliar topic is to 1) read articles and books to get a general overview of what’s possible 2) build a practically useful project based on that understanding and 3) repeat the first two steps using different material and languages/libraries. This is more generally applicable than tech, but by approaching the problem from different perspectives, I tend to eventually figure things out. comanche is but one of multiple attempts in other languages. Some are broken, wrong, or do a fraction of the work, but they exist. There’s a renderer in Go which includes font rendering using texture atlases, but the wrapping is wrong, there’s some lag on input that I can’t figure out, and I didn’t really understand vertex array objects at the time. There’s a “lisp-engine” which only renders a triangle, but you can change that color at runtime, which is pretty cool if you ask me (thanks lisp). cube is a C++ renderer which integrates ImGui for menus (and it actually works). There are two Rust projects, one uses the library Glium, another uses raw OpenGL. Finally, craft.cpp is Craft converted to compile using a C++ compiler, and nimcraft was an attempted port in Nim which simply doesn’t work, much like nimgl. Knock yourself out.

Why WebGL

I’ve found that all you really need when doing graphics programming are bindings to OpenGL, a matrix library, a library which can read images for textures (preferably PNGs), and a library which works with the OS to handle input and window creation. OpenGL does not handle the latter, but GLFW and SDL2 both exist and any language worth its salt will have bindings to those and OpenGL, which are all written in C. Some languages have additional libraries such as Rust with Glutin. Matrix libraries are common, some languages have multiple. It’s not even especially difficult to write your own matrix library (Craft does this), and this may be worth doing if you want to see how they operate, but others have done the work and the optimizations. The old standby is C++’s GLM; many libraries are modeled after it. I believe SDL2 will read images, but most languages have a library that will do this, too.

My instinct is to get as close to the metal as possible to make the smallest and most efficient engine. However, there are some real advantages to using WebGL, which is built into every modern browser and mirrors the C OpenGL API very closely. The browser handles windowing, input, and images so the only library you need is for matrix math; comanche uses glMatrix. It’s easy to get something on the screen and easier to distribute it to users via a link.

The other major advantage to the browser is Javascript. I’m being intentionally provocative here because I know the pastime of choice in the usual tech haunts is to act like Javascript is the worst thing ever made and that anyone who uses it can’t possibly be a real programmer, but my continued experience with it is that it’s pretty damn good. It’s plenty fast, high-level, concise, unopinionated (functional and object-oriented code can easily coexist in the same codebase), and there are plenty of “ok, that’s pretty great” features (like unpacking/pattern matching on objects). It’s great at prototyping. As a comparison, Craft spends 150+ lines implementing hashmaps, which are just there in JS (and most modern languages for that matter). It’s nice not having to do that. Is it the pinnacle of design? Hell no, but it’s good enough for me.
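For a taste of the two features mentioned, here is a small sketch of object unpacking and the built-in hash map. The `describe` and `histogram` functions and the shape of the map object are made up for illustration.

```javascript
// Destructuring ("unpacking") pulls named fields straight out of an
// object, including nested arrays and default values.
function describe({ name, size: [width, depth], maxHeight = 255 }) {
  return `${name}: ${width} x ${depth}, up to ${maxHeight} high`;
}

// Map is the built-in hash map; here it counts cells per height value.
function histogram(heights) {
  const counts = new Map();
  for (const h of heights) {
    counts.set(h, (counts.get(h) || 0) + 1);
  }
  return counts;
}

console.log(describe({ name: 'pampa', size: [1024, 1024] }));
// pampa: 1024 x 1024, up to 255 high
console.log(histogram([3, 3, 7]).get(3)); // 2
```

Neither is revolutionary, but both are things you would otherwise write and debug by hand in C.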

Temporary detour, but the state of discussion around languages and tech in general where much of it is uninformed breathless hype makes me want to pull my hair out. Not specific to Javascript, but I can’t tell you how many times I’ve seen popular opinions/statements bandied around (and have for years), and then upon trying them, those statements don’t reflect reality. Over time I’ve realized that so much of what’s hot in tech is based on a small crowd hyping up what they’ve got to crowds who don’t have the experience to say otherwise or are scared to look stupid. And of course, when you are evangelizing, it’s usually difficult to say “by the way, we suck here, and here, and here”. I wonder how much misunderstanding this style of discussion is driving.

Anyway, I’m saying Javascript isn’t a bad language and for the purposes of learning it’s even great. If you really can’t change your mind about it, TypeScript fixes many of its remaining issues and BuckleScript exists, too (and is excellent). Either one of them can use the same libraries. Finally, what these languages can all take advantage of is that there’s really not a better platform for custom UIs than the browser - overlaying an interface on top of your WebGL canvas is a few lines of CSS. It’s hard to beat that convenience.

Plans

This being said, I’m not convinced that this is the best way to make a game, though I figure it’s always useful to be able to target the browser and to have multiple codebases to work out your design/interfaces/specification. If you really had to, it should be straightforward to convert parts of the codebase (especially performance critical sections) to other languages which would compile down to WebAssembly and native libraries. Your renderer would then be a very thin shim which is the only custom part of each port, and because WebGL mirrors OpenGL, conversion should be trivial. Another idea I had was to port to a subset of TypeScript and then transpile that to something high-level but fast like lisp. Maybe another day.

Unfortunately, development is going to slow and then come in starts and stops. I really wanted to get this to a point I was happy with, but more importantly, I needed to show something to the artists I’m working with. One of them is trained to do this (from the same school) and is already great, another is a classically trained artist who thinks of himself as a surrealist and favors Salvador Dali. He started learning Blender and similar tools around the same time I got to work on this, and in the last couple years he’s progressed to making some genuinely great stuff, even though I have to remind him that before he can make his second game he’s got to figure out how to animate a model. We’re all learning on the go and in our free time, and though we’re ambitious, I truly believe we’re going to make something special, maybe even something successful, should we stick to it.

Over the short-term I’d like to add frustum culling, chunking, and some basic mechanics so it’s more of an actual game, then procedural generation. Not sure where we’ll go after that. Maybe a landscape generator. As far as the long-term goes, what kind of game we want to build, I’m not making any promises, but I can tell you what my friends and I are inspired by. Early 3d games from the 90s are very cool: great examples are Quake, Crash Bandicoot, Tomb Raider, Spyro, Metal Gear Solid. They had so little hardware to work with that they couldn’t beat you over the head with spectacle like many modern games do. They often had really great mechanics and graphically they have a charm, even today. I loved the living worlds of EverQuest, Asheron’s Call, Ultima Online. Those were special experiences games today still don’t understand how to capture. The raw functionality and easy modability of games like the original Half-Life and Starsiege: Tribes have been lost in the modern age (Tribes had 64-player servers, vehicles, IRC (!), and mods like Tribes Football which I’ll never forget). There was the gameplay loop of Halo, the seamless multiplayer of Halo 2, the asymmetric multiplayer of Splinter Cell: Pandora Tomorrow and Brothers in Arms, and the high skill ceiling and wide appeal of Super Smash Brothers and Bloodborne. Finally, there are the open gameworlds of today and the incredible combination of systems and mechanics in Breath of the Wild.

Ok, so that’s just listing a bunch of games I like. The good thing is there are plenty of examples of what works, this stuff doesn’t have to be discovered, some of it is practically ancient, you just gotta put the pieces together. By pulling back the graphical spectacle (Transformers isn’t a very good movie), smaller teams can compete, and given the advance of hardware, it’s hard to imagine what’s possible given a few years. Even now hardware is not the limitation. While VR may be immature, when it’s ready, the next Mario is going to be a billion dollar franchise. The goal for our team isn’t that, but rather to be in a position to have a chance of doing that. With hard work and time, that’s achievable.

More plans

It will take time. Currently, I’m working a couple jobs while trying to finish a dissertation. That’s humblebraggy, but it keeps me level and forces me to pace myself. Job 1 I’ve had for 12 years. Fresh off reading Programming Rails, I walked in and told my boss on day one that we could recreate their PHP app in a few days. Oh yeah no problem. Rails has scaffolding, right? Of course that was ambitious (try weeks if not months), but I’ve grown up there and learned not only how to build systems but my bosses have perhaps unintentionally taught me so very much about treating clients and employees right. Job 2 is newer, I’m on a small team that builds open source software that’s getting used by people at some of the best universities in the world (yes, in Javascript). I’m amazed and proud of what we’ve accomplished as a team. We too are learning as we go, but it’s nice working with people who are talented. The dissertation, well, I’ll be glad when it’s over. Gotta keep chipping away.

comanche is but one of two side projects. Honestly I’m more excited about the other project I haven’t told you about. It’s been 13 years in the making, but I think its time has come. It is one of those things that I wanted to and probably could have done, but was too immature, too afraid that it wouldn’t be good enough or that it was a good idea but someone would steal it, or worse, nobody would care. I’m old enough to where that doesn’t really matter anymore, I just want to try.

My first attempt at it was some time in 2012. I kept rewriting it, getting nowhere, but in the last year I’ve forced myself to stick with it, and I’m almost ready to show it off. I won’t spoil too much, but it’s a weird hybrid. It seems very obvious to me, but I am not really aware of anything quite like it, which makes me unsure if there’s something I’m missing. I know at the very least that it’s something I will find useful and maybe others will, too, which seems like a good basis for some kind of success, even if only personal.

These projects are experiments in extreme openness and transparency. I believe firmly in the ethos behind open source software. The world is not zero sum; by giving away your work and encouraging others to do the same you can create vibrant communities and wealth where there is none. By being realistic about your strengths, weaknesses, and goals, and getting input about them, you are not weaker but stronger. I’ve got a few ideas for monetization, but I ultimately believe that if you make genuinely useful products, you won’t be able to stop people from throwing money at you. They already do for so many projects which don’t deliver what they said they would. What if you went the opposite direction, and gave your customers more than they expected?

I believe you can give away your best ideas and work to your competitors and it won’t matter. You can’t fake inspiration, desire, or hard work. You can say you value users and transparency, but if you don’t actually value those, they’ll be the first thing you abandon. Most importantly, you can’t fake a genuine concern about the quality of your work. Users know what’s good, and I think they’ll reward you for it. They’ll also reward you, in that human way, for seeing them not as statistics but rather as individuals. This cannot be faked, either. My bet is that you can do all of this with a fraction of the resources of the big guys because values drive priorities and priorities drive results. If this sounds naive and idealistic, I think so, too, but I want to find out.

Finally, the end. This is the last post on this blog setup. The other project I told you about will be replacing it. My first step will be to convert my old content over, the second will be to write a post about it. If you’re interested, please subscribe to my newsletter. Just kidding, I don’t believe in newsletters. I do have an RSS feed if you are into that. My goal is to get something live and running by the beginning of May. Hope to see you then.