[WIP] Galaga


My favourite game, other than 1943!

Watch the video below to see how I am progressing.

As you can see it’s somewhat playable, but it still has a long way to go.

Performance is the biggest issue.


I have built a framework for defining sequences, formations and levels in data rather than code. Once I have ironed out the game play, I will be designing levels for the next few months.


Oh great, one of my favs also! :heart_eyes:


If you are using floats, it could be worth trying fixed-point numbers instead.
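To illustrate the suggestion, here is a minimal sketch of fixed-point motion in C++. It assumes a Q24.8 format (an int32_t whose low 8 bits hold the fraction) and uses made-up names (kFracBits, toFixed, toPixels, Mover); it is not the API of any existing fixed-point library. Moving an object at a fractional speed then costs one integer add per frame instead of a float add.

```cpp
#include <cstdint>

// Hypothetical Q24.8 format: value * 256 stored in an int32_t.
constexpr int32_t kFracBits = 8;
constexpr int32_t kOne = int32_t{1} << kFracBits;  // 1.0 == 256

constexpr int32_t toFixed(int32_t whole) { return whole * kOne; }

// Whole pixels for drawing. >> on a negative int32_t is an arithmetic shift
// with GCC for ARM (implementation-defined before C++20), so this floors.
constexpr int32_t toPixels(int32_t fixed) { return fixed >> kFracBits; }

struct Mover {
    int32_t x;   // position, Q24.8
    int32_t dx;  // velocity per frame, Q24.8
};

// Advance one frame: a single integer add, no floating point involved.
constexpr void step(Mover& m) { m.x += m.dx; }

// Usage: start at x = 10.0 and move 0.75 px per frame (0.75 * 256 = 192).
constexpr Mover afterFourFrames() {
    Mover m{toFixed(10), kOne * 3 / 4};
    for (int i = 0; i < 4; ++i) {
        step(m);
    }
    return m;
}

static_assert(toPixels(afterFourFrames().x) == 13, "10 + 4 * 0.75 = 13.0");
```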


Also, since the microcontroller itself updates the screen, running at the maximum possible fps means a very big chunk of CPU time is spent there.


I have thought of this too. I was considering swapping to @Pharap’s fixed-point library, which I have used previously, but I am not sure it’s the biggest culprit.

Right … it’s that trade-off between making the graphics appear smooth and the time taken to render it all.


The way I did it in Pysconian (or at least got it much better) is that I split up the collision calculations between objects. Pysconian has close to 100 game objects updated concurrently … in Python.

I update the positions of the objects on every frame, but the expensive collision calculations I only do once every 2-4 frames, depending on the object. This helped the frame rate a lot.

A projectile hit box, for example, is so big that the chance of it passing through an enemy, even if it is only checked 1 out of every 4 frames, is low.

Do you get what I am saying?
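A minimal C++ sketch of that staggering idea. The GameObject struct, its checkInterval field and dueForCollisionCheck are hypothetical names; positions would still be updated every frame elsewhere, and only the collision test is spread out. Offsetting by the object's index keeps objects that share an interval from all being tested on the same frame.

```cpp
#include <cstdint>

// Hypothetical per-object setting: how often (in frames) its collisions are tested.
struct GameObject {
    uint32_t checkInterval;  // 1 = every frame, 4 = every 4th frame; must be at least 1
};

// True when the object at `index` should run its collision test on `frame`.
// Adding the index spreads objects that share an interval across different
// frames instead of testing them all on the same one.
constexpr bool dueForCollisionCheck(const GameObject& obj, uint32_t index, uint32_t frame) {
    return (frame + index) % obj.checkInterval == 0;
}

// Usage: 8 objects on a 4-frame interval. Each frame tests exactly 2 of them,
// rather than all 8 on every 4th frame and none in between.
constexpr uint32_t checksOnFrame(uint32_t frame) {
    uint32_t count = 0;
    for (uint32_t i = 0; i < 8; ++i) {
        if (dueForCollisionCheck(GameObject{4}, i, frame)) {
            ++count;
        }
    }
    return count;
}

static_assert(checksOnFrame(0) == 2 && checksOnFrame(1) == 2, "work is spread evenly");
```

Because checkInterval is only known at run time, the % here is a real division, which Cortex-M0-class cores have to do in software. If the intervals are restricted to powers of two, storing interval - 1 and using & instead avoids that.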


That’s an interesting idea. How were you calculating collisions? I am using some code I borrowed from the Arduboy library that checks whether the enclosing rectangles overlap. It is quite lightweight, but expensive when there are 50 enemies and 5 or so bullets on the screen at any given time.


I am using the “industry standard” AABB (axis-aligned bounding box) rectangle collision check:

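if Ax + Awidth < Bx return false   // A is entirely left of B
if Ax > Bx + Bwidth return false   // A is entirely right of B
if Ay + Aheight < By return false  // A is entirely above B
if Ay > By + Bheight return false  // A is entirely below B
return true                        // no gap on either axis, so the objects collide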
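The same check written as a compilable C++ function, as a sketch. The Rect struct and the name collide are assumptions (this is not the Arduboy library's exact signature). Because the tests use <, rectangles that only touch along an edge are reported as colliding.

```cpp
#include <cstdint>

// Hypothetical rectangle: top-left corner in screen coordinates (y grows downward).
struct Rect {
    int32_t x, y;
    int32_t width, height;
};

// Four early-out tests, one per side; if none finds a gap, the boxes overlap.
constexpr bool collide(const Rect& a, const Rect& b) {
    if (a.x + a.width < b.x) return false;   // a is entirely left of b
    if (a.x > b.x + b.width) return false;   // a is entirely right of b
    if (a.y + a.height < b.y) return false;  // a is entirely above b
    if (a.y > b.y + b.height) return false;  // a is entirely below b
    return true;
}

static_assert(collide(Rect{0, 0, 10, 10}, Rect{5, 5, 10, 10}), "overlapping boxes");
static_assert(!collide(Rect{0, 0, 10, 10}, Rect{20, 0, 5, 5}), "separated boxes");
```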

Edit: but the point is that calculating this for all 20 projectiles x 6 enemies x 6 bases x 6 pods x 20 mines x 20 asteroids was killing the framerate
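For a sense of scale, suppose each projectile is tested against every enemy, base, pod, mine and asteroid once per frame (an assumption; the post does not spell out exactly which pairs were checked). That alone comes to

$$20 \times (6 + 6 + 6 + 20 + 20) = 20 \times 58 = 1160$$

rectangle tests per frame, before counting any checks between the other kinds of object.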


LOL, then mine isn’t actually lifted from Arduboy. I am also using the ‘industry standard’. Mine just happens to look exactly like the code in the Arduboy library.

OK … what I have done is grab the frame count mod 2 and use it to split the collision detection evenly over consecutive frames.

uint8_t enemiesToCheck = Utils::getFrameCount(2);

for (uint8_t x = enemiesToCheck; x < 50; x = x + 2) {
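The snippet above stops at the loop header. Here is a self-contained sketch of the whole pattern, assuming 50 enemies and an unsigned frame counter; visitEnemiesForFrame and ignoreIndex are hypothetical names, and on the device the visitor would be a lambda running the real collision test. It also shows why the step must be a constant 2: stepping by the parity value itself would be a step of 0 on even frames, i.e. an endless loop.

```cpp
#include <cstdint>

constexpr uint32_t kEnemyCount = 50;  // assumption taken from the loop above

// Visits even-indexed enemies on even frames and odd-indexed ones on odd
// frames, so every enemy is still checked once every two frames.
// Returns how many enemies were visited.
template <typename Visit>
constexpr uint32_t visitEnemiesForFrame(uint32_t frameCount, Visit visit) {
    uint32_t visited = 0;
    // Start at 0 or 1 (frameCount & 1 == frameCount % 2 for unsigned values)
    // and always step by 2.
    for (uint32_t x = frameCount & 1u; x < kEnemyCount; x += 2) {
        visit(x);
        ++visited;
    }
    return visited;
}

// Example visitor for compile-time checking: does nothing with the index.
constexpr void ignoreIndex(uint32_t) {}

static_assert(visitEnemiesForFrame(0, ignoreIndex) == 25, "even frame: 0, 2, ..., 48");
static_assert(visitEnemiesForFrame(1, ignoreIndex) == 25, "odd frame: 1, 3, ..., 49");
```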


A dumb question … is using a uint8_t counterproductive on this processor? I am so used to doing it now, due to the Arduboy experience, that I always scope down to the smallest variable size possible.


%2 is the same thing as &1 (for unsigned values), my friend.
%4 is the same as &3

and so on. With &1, &3 and any mod n where n is a power of two, you can be sure to get the most efficient modulo operation there is (a bitwise AND … a 1-cycle instruction).
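A small C++ check of that identity, as a sketch: for unsigned x and a power-of-two n, x % n and x & (n - 1) give the same result, which is what lets the compiler replace the division. modPow2 is a hypothetical helper name. The identity does not hold for negative signed values, so unsigned types are the safe choice when relying on it.

```cpp
#include <cstdint>

// x % n computed as a mask. Assumes n is a non-zero power of two;
// for any other n the result is simply wrong.
constexpr uint32_t modPow2(uint32_t x, uint32_t n) { return x & (n - 1u); }

static_assert(modPow2(13u, 4u) == 13u % 4u, "13 % 4 == 13 & 3 == 1");
static_assert(modPow2(255u, 2u) == 255u % 2u, "odd numbers give 1");

// Signed values are different: C++ division truncates toward zero, so
// -3 % 2 is -1, while the mask gives 1. The compiler has to add fix-up
// instructions for a signed %, which is one more reason to prefer unsigned here.
static_assert(-3 % 2 == -1, "signed remainder keeps the sign of the dividend");
static_assert((-3 & 1) == 1, "the mask ignores the sign");
```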


This is not a dumb question at all. Completely the opposite.

Using uint32_t is always the fastest on an ARM Cortex. If you have enough RAM, you should avoid using uint8_t.

EDIT: the reason PokittoLib is riddled with uint8_t is that it too began its life on an ATmega.


Yes it is … and I would hope the compiler knows that too :slight_smile: but probably not.

Yes I thought as much. I might look at the code overall and swap out everything - I am pretty sure I haven’t counted on the overflow of an 8 bit value anywhere.


Did you see any improvement when reducing the collision calculations?


If it turns out this is the case, I can tell you how to modify FixedPointsArduino to compile for Pokitto.

If you look at my Pokitto version of Physix there’s already an example.
If I remember rightly I think it’s mostly just a case of changing some #includes around.
The bulk of the code is actually platform independent if you ignore the lack of std::/the use of C headers.
(I had intended to make a platform independent version at some point.)

Just to throw an alternative out there: binary space partitioning can be useful, albeit nowhere near as simple.

If multiplication is reasonably fast then sometimes circles can be cheaper.
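A sketch of the circle alternative in C++, assuming each object has an integer centre and radius (Circle and circlesOverlap are hypothetical names). Comparing squared distances avoids a square root, so the test is three multiplies, a few adds and one comparison; whether that beats the four rectangle comparisons depends on how fast multiplication is on the target.

```cpp
#include <cstdint>

// Hypothetical circular hit box in whole pixels.
struct Circle {
    int32_t cx, cy;  // centre
    int32_t r;       // radius
};

// Overlap when the distance between centres is at most r1 + r2.
// Squaring both sides keeps everything in integer arithmetic.
constexpr bool circlesOverlap(const Circle& a, const Circle& b) {
    const int32_t dx = a.cx - b.cx;
    const int32_t dy = a.cy - b.cy;
    const int32_t reach = a.r + b.r;
    return dx * dx + dy * dy <= reach * reach;  // touching circles count as a hit
}

static_assert(circlesOverlap(Circle{0, 0, 5}, Circle{3, 4, 1}), "distance 5 < reach 6");
static_assert(!circlesOverlap(Circle{0, 0, 5}, Circle{9, 0, 3}), "distance 9 > reach 8");
```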

If I remember rightly there is a slight extra cost because the processor has to clear the upper 24 bits regularly, but I could be misremembering.

@FManga would probably be better at answering that question than anyone else.

While I think of it, I’d like to point out that the Pokitto actually uses Thumb2 instructions, not fully fledged ARM instructions, so if you’re thinking about looking into the architecture to decide which optimisations are most suitable, bear that in mind.

As @filmote says, the compiler really ought to know this.
If it doesn’t, I’d file a bug report and/or demand an explanation.


I might look into that.


Yeah, all calculations are done in 32-bit registers, so 8- and 16-bit results need to be truncated back to their size.
This also goes for arguments passed to functions (a char occupies a full 32-bit register or 4-byte stack slot when passed by the caller).
Truncating is a cheap operation though (1 cycle), so generally you shouldn’t expect to see a significant speedup by changing from one type to another.

Note that, while calculations and transfers favor 32-bit, there is no penalty for storing 8/16-bit datatypes. An array of uint8_t or uint16_t can be read/written at the same speed as an array of uint32_t.
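A sketch of that advice in C++, assuming a hypothetical table of per-enemy health values: the array stays uint8_t to save RAM, while the loop counter and accumulator are uint32_t so nothing needs truncating back to 8 bits inside the loop. The addBytes helper shows why that truncation exists at all: an 8-bit result has to wrap at 256.

```cpp
#include <cstdint>

// Storage stays narrow: one byte per enemy (hypothetical sample data).
constexpr uint8_t kSampleHealth[] = {200, 100, 50};

// Arithmetic stays wide: 32-bit counter and accumulator, so the compiler
// has no 8-bit result to truncate inside the loop.
constexpr uint32_t totalHealth(const uint8_t* health, uint32_t count) {
    uint32_t sum = 0;
    for (uint32_t i = 0; i < count; ++i) {
        sum += health[i];  // the byte is widened to 32 bits when read
    }
    return sum;
}

// Why narrow arithmetic costs extra: an 8-bit result must wrap at 256,
// so the 32-bit register value has to be truncated after the add.
constexpr uint8_t addBytes(uint8_t a, uint8_t b) { return static_cast<uint8_t>(a + b); }

static_assert(totalHealth(kSampleHealth, 3) == 350, "no wrap with a 32-bit accumulator");
static_assert(addBytes(200, 100) == 44, "(200 + 100) mod 256 == 44");
```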

It’s good you pointed out that Pokitto uses Thumb2. This is important, as it has a big effect on optimization. ARM processors have 16 core registers (R0 - R15, the last three being the stack pointer, link register and program counter), but most Thumb2 instructions can only access the lower 8 (R0 - R7). This means that, for example, if you use too many variables inside a for loop, the compiler quickly runs out of registers, and they have to be flushed to RAM and reloaded constantly.

As for x % 2 vs x & 1… the compiler will probably turn either one of these into x << 31.

While ANDS is a one-cycle operation, in Thumb2 it requires two registers. First you need to load the immediate into a temporary register (and possibly flush something else out to RAM), then do the ANDS (and reload whatever got flushed out).
Instead, it’s cheaper to just shift left (LSLS Rx, 31) to discard the unnecessary upper bits. You then test if the result is zero as normal.

The fact that ANDS costs a register is good to know when optimizing tight loops where masking is being done (things like updating a framebuffer or drawBitmapData).
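The three parity tests discussed above, written out in C++ for unsigned 32-bit values (the function names are made up for illustration). Shifting left by 31 leaves only bit 0, now in bit 31, so the result is non-zero exactly when x is odd. Which instruction the compiler actually emits depends on its version and flags, so the disassembly is the place to confirm it.

```cpp
#include <cstdint>

constexpr bool isOddMod(uint32_t x)   { return x % 2u != 0; }     // modulo
constexpr bool isOddMask(uint32_t x)  { return (x & 1u) != 0; }   // AND with a mask
constexpr bool isOddShift(uint32_t x) { return (x << 31) != 0; }  // shift the other bits out

// All three agree for any unsigned value.
constexpr bool allAgree(uint32_t x) {
    return isOddMod(x) == isOddMask(x) && isOddMask(x) == isOddShift(x);
}

static_assert(allAgree(0u) && allAgree(1u) && allAgree(6u) && allAgree(7u), "same answers");
static_assert(allAgree(0xFFFFFFFFu) && isOddShift(0xFFFFFFFFu), "top bits are discarded");
```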


Wow! This game plays really well. I got about 70k points on my first try, and I usually suck at this kind of game. The playability is one of the best on Pokitto :slight_smile: The graphics look good and there is a lot of variation. I love the sound of the laser, it is spot on! Overall the game sounds really good too. It really paid off that you struggled with the audio :wink:

The first level could be a bit easier so as not to turn away less hardcore players. The objects could also move a bit slower overall, but I have not played the original much.

Good work!


Thanks @hanski.

I am not sure what version you played but I have made some changes to the game to make it a little easier at the start and to get progressively harder. For example, the enemies shoot less in early stages and more frequently in later stages. There are three types of enemy bullets that do varying degrees of damage and the bullets that do the most damage are only used in later levels.

The gameplay is actually a little easier than the original’s, as I have included a health bar to compensate for the lack of vertical screen space. In the original, if you were hit by a bullet it was game over! In mine, you get a bit of a reprieve!