Scroller in 3D

Turning the scroller code into 3D, running at about 39 fps.


Cool stuff!!!

Floats or fixed points?

Now just add that rotation code and you’ve got mode7 :slight_smile:

Exactly :slight_smile:

Direct drawing would be faster, but I wanted to use buffered mode so that I can mix it with the normal bitmaps and drawing functions.

If it can be kept at or above 25 fps, I think it will be fine for games.


I am using the fixed-point package in /PokittoLib/Pokitto/POKITTO_LIBS/FixMath/.


So, an F-Zero clone, or floors for a raycasted first-person shooter?
I would love to have this for an RPG overworld :slight_smile: It would not even need rotation for that one.

Damn, I was hoping someone was using my FixedPoints library.

I’m guessing you need some of the trigonometry or square root functions though.

I could use it if it contained trigonometric functions, which I will need once I try rotations. For now I am still optimizing the current demo.

Is your implementation optimized for speed?

I’ve never tested the speed; it’s more focused on genericity and ease of use.

It should be reasonably fast though. Most operations are constexpr, so if they can be done at compile time they will be, for example the conversion of a floating-point literal (e.g. SQ15x16 a = 0.25;).
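
To illustrate the compile-time conversion idea, here is a minimal sketch. The `Fixed16` type is a hypothetical stand-in with an assumed Q16.16 layout, not the actual SQ15x16 implementation from FixedPoints; it only shows how a constexpr constructor lets the compiler fold a floating-point literal into a raw integer.

```cpp
#include <cstdint>

// Hypothetical Q16.16 type for illustration only (not the FixedPoints SQ15x16 class).
struct Fixed16 {
    std::int32_t raw;  // value scaled by 2^16

    // Converting constructor: scale by 65536 and round to the nearest integer.
    // Being constexpr, it can be evaluated entirely at compile time.
    constexpr Fixed16(double v)
        : raw(static_cast<std::int32_t>(v * 65536.0 + (v >= 0 ? 0.5 : -0.5))) {}
};

// The literal is converted by the compiler; no floating-point code runs on the target.
constexpr Fixed16 quarter = 0.25;
static_assert(quarter.raw == 0x4000, "0.25 in Q16.16 is 0x4000");
```

If the `static_assert` compiles, the conversion happened at compile time, which is the property described above.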

Given that FixMath is oriented towards a specific size I would think it might be faster,
but looking at the code there seems to be a lot more to each operation than I have in FixedPoints,
so I’d have no idea without testing.

How does it work? Is it handy?

Just needs a serial link cable hat and you’re halfway to a Mario Kart game!

FixMath is written in C so it’s not very pretty/easy to use (e.g. you have to do fix16_t c = fix16_add(a, b) instead of being able to do fix16_t c = a + b).
Compared to most C libraries though, it’s quite well commented and fairly neatly written.
The code can be seen here.
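
For comparison, a thin C++ wrapper can put operator syntax on top of a C-style API. This is only a sketch: `fx16`, `fx16_add` and `Fx` are hypothetical names standing in for a function-call interface of the same shape, so the block stays self-contained and does not depend on FixMath itself.

```cpp
#include <cstdint>

// Stand-in for a C-style fixed-point API (hypothetical names, assumed Q16.16 layout).
typedef std::int32_t fx16;
inline fx16 fx16_add(fx16 a, fx16 b) { return a + b; }

// Thin C++ wrapper so call sites can use operators instead of function calls.
struct Fx {
    fx16 raw;
};

// Forwards to the C-style function; the compiler can inline this away.
inline Fx operator+(Fx a, Fx b) { return Fx{fx16_add(a.raw, b.raw)}; }

// Usage: Fx c = a + b;   instead of   fx16 c = fx16_add(a, b);
```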

With the same kind of inner-loop optimization as in the 2D version, I now have 50 fps :slight_smile:


Next I will try to implement rotation.


Rotation finally works with the perspective scaling, but it is still too slow.

However, I just realized that even though integer addition and multiplication are equally fast on the LPC11U68 chip (1 cycle), multiplying fixed-point numbers is significantly slower. For that I have to multiply two 32-bit numbers, store the 64-bit result, and scale (shift) it back to a correctly rounded 32-bit number (even more cycles with the “overflow detection”). That is why I have to get rid of fixed-point multiplications in the inner loop as much as possible.
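
The steps described above can be sketched roughly like this. It assumes a Q16.16 format and uses a hypothetical `q16_mul` helper; it is not the FixMath implementation and leaves out overflow detection entirely.

```cpp
#include <cstdint>

// Minimal Q16.16 multiply sketch (hypothetical helper, no overflow detection).
inline std::int32_t q16_mul(std::int32_t a, std::int32_t b) {
    // 1. Widen and multiply: the product of two Q16.16 values is Q32.32 in 64 bits.
    std::int64_t wide = static_cast<std::int64_t>(a) * b;
    // 2. Round to nearest by adding half of the least significant bit that survives.
    wide += 0x8000;
    // 3. Shift back down to Q16.16. Right-shifting a negative value is
    //    implementation-defined before C++20; GCC uses an arithmetic shift.
    return static_cast<std::int32_t>(wide >> 16);
}
```

On a 32-bit core without a widening multiply instruction, step 1 alone expands into several instructions, which is where the extra cycles come from.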

Are you sure GCC is actually emitting a MUL? I’m under the impression that it doesn’t know that MULs are just one cycle and avoids them.
Also, try removing the lower word using a cast or a union+struct, instead of a shift:

union {
    uint64_t w;                 // full 64-bit product
    struct { uint32_t l, h; };  // low and high words (order assumes little-endian)
} u;
u.w = ...
return u.h;

I’ve seen GCC being silly when shifts are larger than the machine’s word size.

I am using this (fix16_mul()):

It also does overflow checks and rounding.

I have not looked at the assembly at this point. I think I can just use adds in the inner loop (scanline). I might need to check the assembly later when optimizing the outer loop. Thanks for the tip!
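
As a rough sketch of the adds-only idea: instead of computing the texture coordinate with a multiplication per pixel, the coordinate is stepped by a constant Q16.16 increment. The function name, the parameters and the one-dimensional texture row are assumptions made for illustration, not the demo's actual code.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scanline loop: samples one texture row using only an add per pixel.
// u and du are Q16.16 values stored as unsigned so that wrap-around is well defined.
// texMask is (texture width - 1), with the width assumed to be a power of two.
inline void draw_scanline(const std::uint8_t* texture, std::uint32_t texMask,
                          std::uint8_t* dst, std::size_t width,
                          std::uint32_t u, std::uint32_t du) {
    for (std::size_t x = 0; x < width; ++x) {
        dst[x] = texture[(u >> 16) & texMask];  // integer part selects the texel
        u += du;                                // step the coordinate with a plain add
    }
}
```

The per-scanline values `u` and `du` would still be computed with fixed-point multiplications, but only once per line instead of once per pixel.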

The overflow checks themselves don’t seem too bad, but branching is almost guaranteed to cause some slowdown so I’d recommend removing that.

The library allows you to define FIXMATH_NO_OVERFLOW and FIXMATH_NO_ROUNDING to disable some of the unnecessary stuff, but it misses the parts of addition and subtraction that could also be scrapped.
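
To show what a compile-time switch like that could look like for addition, here is a sketch. `q16_add` and the `SKETCH_NO_OVERFLOW` macro are hypothetical names invented for this example; this is not the library's code, just the general pattern of compiling the saturation branches out.

```cpp
#include <cstdint>

// Hypothetical Q16.16 add with an optional saturation check.
// Building with -DSKETCH_NO_OVERFLOW (an invented macro) removes the branches.
inline std::int32_t q16_add(std::int32_t a, std::int32_t b) {
#ifndef SKETCH_NO_OVERFLOW
    // Checked path: compute in 64 bits and clamp to the 32-bit range.
    std::int64_t sum = static_cast<std::int64_t>(a) + b;
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return static_cast<std::int32_t>(sum);
#else
    // Unchecked path: a single wrapping add with no branches.
    // Converting back to int32_t is implementation-defined before C++20.
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(a) +
                                     static_cast<std::uint32_t>(b));
#endif
}
```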
