Pokitto is on the way, what games should I make?


Yeah, for just comparing you don’t have to compute square root because it’s a monotonic function. I need the exact value though, so I do need square root.

Maybe you could try a different metric too. The one you’re using takes two multiplications and one addition, while Chebyshev or taxicab only take one non-multiplication operation. Probably won’t be the bottleneck, but you could try it out.
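For reference, a minimal sketch of the three metrics mentioned above, on integer coordinates. The function names are made up for illustration and are not taken from the engine; the absolute values add a little work on top of the single addition or comparison.

```c
#include <stdint.h>

/* Hypothetical helper: absolute difference of two coordinates. */
static int32_t absDiff(int32_t a, int32_t b)
{
  return a > b ? a - b : b - a;
}

/* Squared Euclidean distance: two multiplications and one addition.
   Good enough for comparisons, because the square root is monotonic. */
int32_t distSquared(int32_t x0, int32_t y0, int32_t x1, int32_t y1)
{
  int32_t dx = x1 - x0;
  int32_t dy = y1 - y0;
  return dx * dx + dy * dy;
}

/* Taxicab (Manhattan) distance: |dx| + |dy|, no multiplication. */
int32_t distTaxicab(int32_t x0, int32_t y0, int32_t x1, int32_t y1)
{
  return absDiff(x0, x1) + absDiff(y0, y1);
}

/* Chebyshev distance: max(|dx|, |dy|), no multiplication. */
int32_t distChebyshev(int32_t x0, int32_t y0, int32_t x1, int32_t y1)
{
  int32_t dx = absDiff(x0, x1);
  int32_t dy = absDiff(y0, y1);
  return dx > dy ? dx : dy;
}
```

For the points (0,0) and (3,4) these give 25, 7 and 4 respectively, while the true Euclidean distance is 5.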


What about doing a fast inverse square root and then inverting it to get the square root?


I’ve heard about this – would be doable, but it’s still costly: two multiplications, addition, then fast 1/sqrt and then inverse.

Currently I have a somewhat fast algorithm for direct integer square root that I found somewhere on the Internet.
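The exact algorithm isn’t shown in the thread, so the following is only a sketch of one common approach, the bit-by-bit (digit-by-digit) integer square root, which needs no multiplication, division or floating point. The name isqrt32 is an assumption for illustration.

```c
#include <stdint.h>

/* Returns floor(sqrt(x)) using only shifts, additions and comparisons.
   Sketch of the classic bit-by-bit method, not necessarily the
   algorithm used in the engine. */
uint32_t isqrt32(uint32_t x)
{
  uint32_t result = 0;
  uint32_t bit = (uint32_t) 1 << 30; /* highest power of four in 32 bits */

  while (bit > x) /* skip powers of four that are larger than x */
    bit >>= 2;

  while (bit != 0)
  {
    if (x >= result + bit)
    {
      x -= result + bit;
      result = (result >> 1) + bit;
    }
    else
      result >>= 1;

    bit >>= 2;
  }

  return result;
}
```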

I’m thinking about hexagonal distance now – that should take two conditions, two multiplications and an addition I guess.


That’s the cheap part of the operation. Converting to/from float would be much slower than all of that.


I supposed it could somehow be applied to integers too (to compute N/sqrt(x) or something), but Wiki says it’s an FP algorithm, so maybe that’s how it is… then of course it’s out of the question.


Flash space permitting, a lookup table is the fastest way to get a square root. It’s generally considered bad to do this on other processors (x86) for reasons that don’t apply in this case (there is no cache to thrash, really cheap reads).
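A tiny sketch of the lookup table idea, assuming inputs have already been scaled into the range 0 to 63. The table size, the names and the clamping behaviour are all assumptions made for illustration; a real table would cover whatever range the game needs. Declaring the table const lets the toolchain keep it in flash rather than RAM.

```c
#include <stdint.h>

#define SQRT_TABLE_SIZE 64 /* assumed size, chosen only to keep the sketch short */

/* sqrtTable[i] == floor(sqrt(i)) for i in 0..63 */
static const uint8_t sqrtTable[SQRT_TABLE_SIZE] =
{
  0,
  1, 1, 1,
  2, 2, 2, 2, 2,
  3, 3, 3, 3, 3, 3, 3,
  4, 4, 4, 4, 4, 4, 4, 4, 4,
  5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
  6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
  7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7
};

/* One read from the table; out-of-range inputs are clamped to the
   last entry in this sketch instead of being computed properly. */
uint8_t sqrtLookup(uint32_t x)
{
  if (x >= SQRT_TABLE_SIZE)
    x = SQRT_TABLE_SIZE - 1;

  return sqrtTable[x];
}
```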


There is no cache? So storing bitmaps by columns is just by convention?


Compatibility/porting? :man_shrugging:


I’ve been trying to derive the distance approximation but found it here:


At least I got to exercise trigonometry for a while :nerd_face:



So here is what it looks like (normal vs approximation):


It adds an FPS or two and there are cases where you can’t tell (this one is where it shows a lot). So now there is the option to choose what’s best for your game.


Trying to do these extra things and porting to other platforms makes me find more bugs and improve stuff. I thought this was kind of completed until I tried to run it on Arduboy – I found a big mistake in the code (it still doesn’t run on Arduboy without problems, but at least better than before).

And now I realize I don’t have easy support for wolf-style doors that sit in the middle of squares and slide to the side when opened. I’ll be thinking about how to add them.



Offsetting the door into the middle of the square wouldn’t be so easy without reworking a few things, which would break other things, so I think something like this will have to be good enough for now :slight_smile:


Progress report:

improved door:


and ported to Arduboy and terminal:



which revealed a few bugs that are now fixed.

I’d like to take a look at floor texturing now. Probably with a similar approach to the one @Hanski used in PZero.

EDIT: Also an idea: the camera could be rotated 90 degrees around the viewing axis – i.e. “tilted” to the right for example, which would make the walls floors and the floors walls, which could be exploited for some interesting looking levels, e.g. something like a first person platformer. Then there could also be an additional camera – oriented in the normal way – for rendering walls. The views from both cameras would be combined into the final view.


Hmm can’t believe how fast you get things done… Mind blowing to me.


Once I have the common library code, it’s not that hard to run it on another platform :slight_smile:

I just managed to increase FPS (by about 7 in demo2!) with optimized texture coordinate computation. Let me elaborate so that people can use this trick too:

I need to compute vertical texture coordinate for each pixel of a wall column. In these cases it’s good to precompute the coordinate step and then just keep accumulating it like this:

int32_t coordStep = textureHeight / wallHeight; // both in pixels
texCoords.y = 0;

for (y = 0; y < wallHeight; ++y) // for each wall column pixel
  texCoords.y += coordStep;

But there is a big disadvantage: the bigger the wall drawn on screen, the less accurate this will be, because the integer division throws away the fractional part of the step. Imagine the wall height on screen being between half and the full height of the texture. You’ll get coordStep = 1 for all these sizes, which is highly inaccurate. If the wall is higher than the texture, which happens a lot, you’ll get a step of 0 and effectively no texture coordinates.

For this reason I went to computing texture coordinates something like this:

for (y = 0; y < wallHeight; ++y) // for each wall column pixel
  texCoords.y = (y * textureHeight) / wallHeight;

Which is nicely accurate, but more expensive, as you perform multiplication and division in each loop cycle.

However today I realized this more expensive approach is only necessary in cases where the first approach fails and that I can keep the first one to be used in cases where it’s accurate enough, which can be easily decided by comparing the wall and texture height.

So now I have both these loops and choose the one to use depending on the coordinate step: if it’s too small, the wall is close to the texture size and I execute the more accurate loop, otherwise I choose the faster loop. Like this:

int32_t coordStep = textureHeight / wallHeight; // both in pixels

if (coordStep < LIMIT)
{
  // more accurate: multiplication and division for every pixel
  for (y = 0; y < wallHeight; ++y) // for each wall column pixel
    texCoords.y = (y * textureHeight) / wallHeight;
}
else
{
  // faster: only one addition for every pixel
  texCoords.y = 0;
  for (y = 0; y < wallHeight; ++y) // for each wall column pixel
    texCoords.y += coordStep;
}

The if could be inside the loop, but for the sake of performance I chose to have two versions of this time critical loop and branch early.
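To make the trick easy to try out, here is a self-contained sketch of the same idea that writes the coordinates into an array instead of sampling a texture. The function name, the output array and the LIMIT value of 4 are assumptions for illustration; the right threshold depends on how much inaccuracy is acceptable.

```c
#include <stdint.h>

#define LIMIT 4 /* assumed threshold, not a value from the engine */

/* Fills out[0..wallHeight-1] with the vertical texture coordinate of
   each pixel of one wall column. wallHeight must be greater than 0 and
   out must have room for wallHeight entries. */
void computeTexCoords(int32_t textureHeight, int32_t wallHeight, int32_t *out)
{
  int32_t coordStep = textureHeight / wallHeight; /* both in pixels */
  int32_t y;

  if (coordStep < LIMIT)
  {
    /* small step: truncation error would be visible, so compute exactly */
    for (y = 0; y < wallHeight; ++y)
      out[y] = (y * textureHeight) / wallHeight;
  }
  else
  {
    /* large step: accumulating the truncated step is accurate enough */
    int32_t texY = 0;

    for (y = 0; y < wallHeight; ++y)
    {
      out[y] = texY;
      texY += coordStep;
    }
  }
}
```

With a 32 pixel texture, a 64 pixel wall takes the accurate branch (the step would truncate to 0), while a 4 pixel wall takes the fast branch with a step of 8.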


Feel free to share a working Pokitto bin of your demos if you want. I am always curious to see it working on the actual hardware.

I am wondering if something that could load maps from doom might even be possible with this…


I’ll upload bins when I get to PC.

The Doom maps have arbitrarily placed walls, so they couldn’t be converted exactly, but something close could be made. Resources from Freedoom could be used.


Hmm just throwing random thoughts here … In theory a random maze of walls could be generated I guess. That could turn the whole thing into a big Minotaur’s maze where you have to get out of the maze while battling randomly spawning monsters and trying to outrun the Minotaur itself…

A couple doors could block the path and need a special relic to open them…

Could have a nice replayability there and could calculate score on the time you took to succeed and how many monsters you were able to slay.


Nice idea, that’s exactly what I have in mind when thinking about what people could use the engine for. Procedural generation or at least compression is very welcome, because the levels, being two-dimensional, tend to grow in size quickly.

Here are current bins:

demo1.bin (68.9 KB)

demo2.bin (64.8 KB)

demo3.bin (66.0 KB)


Wow guys, throw away these bins, I just skyrocketed the FPS.

Calling a critical function directly as opposed to via pointer really makes a huge difference :smiley: Demo1 that was 15 - 30 FPS is now 40+. Demo2 is on 68 FPS.

Though I do this via an ugly macro trick that some people would probably hate :smiley:
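The actual macro isn’t shown here, so this is only a rough sketch of how such a trick could look in C: the user defines a macro naming their function before the library code is compiled, and the library calls that name directly instead of going through a pointer. PIXEL_FUNCTION, myPixelFunc and the render functions are hypothetical names, not the engine’s API.

```c
#include <stdint.h>

/* Variant 1: the library receives the user's function as a pointer.
   The compiler usually cannot inline a call through a pointer. */
typedef void (*PixelFunc)(int32_t x, int32_t y);

void renderRowViaPointer(int32_t width, PixelFunc f)
{
  int32_t x;

  for (x = 0; x < width; ++x)
    f(x, 0);
}

/* Variant 2: the user names the function with a macro before the
   library code, which then calls it directly and lets the compiler
   inline it. */
static int32_t pixelsDrawn = 0;

static void myPixelFunc(int32_t x, int32_t y)
{
  (void) x; /* a real function would draw the pixel here */
  (void) y;
  ++pixelsDrawn;
}

#define PIXEL_FUNCTION myPixelFunc /* hypothetical macro name */

void renderRowDirect(int32_t width)
{
  int32_t x;

  for (x = 0; x < width; ++x)
    PIXEL_FUNCTION(x, 0); /* expands to a direct call to myPixelFunc */
}
```

The price is that the function is fixed at compile time, so it can’t be swapped at run time the way a pointer can.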


These are the kind of things that make programming a device like this so much fun. You think it’s optimized and then you get an idea.


Calling a function directly allows it to be inlined in some circumstances.
When a function is inlined, the compiler can elide the steps where it saves and restores registers before and after the function call.
In some cases it can also reorder some of the function body based on information from the calling context.

(This behaviour was actually responsible for some of the bugs in Arduboy2’s Sprites::drawPlusMask at one point - when the function was inlined some of the registers would get messed up because of the long, complex block of assembly code.)

I haven’t even seen it and I already hate it. :P


This was my ignorance really, @FManga told me about this quite some time ago and I only now got to implementing it :expressionless:

Don’t look at the code, it’s becoming really macro heavy :smiley:

I don’t have much choice as I’m using C, so there are no templates, and I need to move a lot of computation to compile time. Macros allow nice performance hacks. I at least used to use inline functions but some compilers gave me weird linker problems with these, don’t know why, so I just switched to macros everywhere for now. I’m really sorry.

Someone (possibly you) can later wrap this into a nice C++ class or something.