Mode7?

The "ARM Cortex-M0+ Technical Reference Manual" says multiplication takes 1 or 32 cycles (!), depending on implementation. Almost all other instructions take only 1 or 2 cycles.

Yes, LUTs are very useful. Though 15 k is quite a lot.

May as well round it up and call it 16KB for convenience.

We might be in luck, according to the datasheet for LPC11U6x (in section 2):

ARM Cortex-M0+ processor (version r0p1), running at frequencies of up to 50 MHz
with single-cycle multiplier and fast single-cycle I/O port.

So it sounds like the Pokitto doesn’t have the 32-cycle problem.
The 32-cycle version is an implementation option that uses a smaller, iterative multiplier to save silicon area, so presumably that’s for chips where size and power matter more than speed.


(It probably doesn’t have any bearing on anything, but 16-bit fixed-point values would need a full 32-bit multiply for the sake of accuracy, otherwise they lose the upper 8 bits of precision.)
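
To make the point about 16-bit fixed point concrete, here is a rough sketch, assuming an 8.8 format (8 integer bits, 8 fractional bits). The names `Fixed8_8`, `mulWide` and `mulNarrow` are made up for illustration and are not from any Pokitto library.

```cpp
#include <cstdint>

// Hypothetical 8.8 fixed-point type: 256 represents 1.0.
using Fixed8_8 = std::int16_t;
constexpr int kFracBits = 8;

// Widen both operands to 32 bits first, so the whole product is kept
// before the extra fractional bits are shifted back out.
inline Fixed8_8 mulWide(Fixed8_8 a, Fixed8_8 b)
{
    const std::int32_t product =
        static_cast<std::int32_t>(a) * static_cast<std::int32_t>(b);
    return static_cast<Fixed8_8>(product >> kFracBits);
}

// For comparison only: the product is cut down to 16 bits before the
// shift, so the upper bits of the result are thrown away.
inline Fixed8_8 mulNarrow(Fixed8_8 a, Fixed8_8 b)
{
    const std::uint32_t full =
        static_cast<std::uint32_t>(static_cast<std::uint16_t>(a)) *
        static_cast<std::uint32_t>(static_cast<std::uint16_t>(b));
    const std::uint16_t truncated = static_cast<std::uint16_t>(full);
    return static_cast<Fixed8_8>(truncated >> kFracBits);
}
```

With 1.5 (384) and 2.5 (640) as inputs, `mulWide` gives 960, which is 3.75, while `mulNarrow` gives 192, which is only 0.75.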


That’s at a higher resolution, on a lower-speed chip.

The board used in this demo is a clone of the Arduino 2560, but all code was written in assembly. The board is unmodified and running at 16 MHz.

We can’t quite manage that, but hopefully the speed difference will give us an advantage there.

Output is interlaced for an effective resolution 100x120 with 256 colors.

That would indicate that 110x88 is certainly realistic for the Pokitto.
220x176 might be too, though I think it’s still a tough call.

256 element lookup tables are used for cosine and division in order to speed up calculations.

If brads are used instead of degrees or radians, it’s possible to do better than that:
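
One way this could look, as a sketch only: assuming 256 brads (binary radians) per full turn, an angle fits in one byte and wraps around for free, and the symmetry of the sine wave means a quarter-wave table is enough. The names `Brad`, `QuarterSineTable`, `sinBrad` and `cosBrad` are hypothetical, and on real hardware the table would more likely be a precomputed const array in flash rather than built with `std::sin` at startup.

```cpp
#include <cmath>
#include <cstdint>

// Assumption: 256 brads per full turn, so 64 brads is a quarter turn.
using Brad = std::uint8_t;

// Quarter-wave sine table for 0..64 brads in 8.8 fixed point (256 == 1.0).
struct QuarterSineTable
{
    std::int16_t values[65];

    QuarterSineTable()
    {
        const double pi = 3.14159265358979323846;
        for (int i = 0; i <= 64; ++i)
            values[i] = static_cast<std::int16_t>(
                std::lround(std::sin(i * pi / 128.0) * 256.0));
    }
};

inline std::int16_t sinBrad(Brad angle)
{
    static const QuarterSineTable table;
    const int index = angle & 0x3F; // position inside the quadrant
    const int quadrant = angle >> 6; // 0..3
    int value = 0;
    switch (quadrant)
    {
        case 0: value = table.values[index]; break;
        case 1: value = table.values[64 - index]; break;
        case 2: value = -table.values[index]; break;
        default: value = -table.values[64 - index]; break;
    }
    return static_cast<std::int16_t>(value);
}

// Cosine is sine shifted by a quarter turn; the byte wraps automatically.
inline std::int16_t cosBrad(Brad angle)
{
    return sinBrad(static_cast<Brad>(angle + 64));
}
```

That is 65 entries instead of 256, and the same table serves both sine and cosine.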

Lower resolution seems fine, and idk if running in 16-color mode would be faster or not.

Single-cycle multiplication in the Cortex-M0+ is fantastic! I did not expect that. There is no need for LUTs for multiplication, but they are still needed for division and trigonometric functions.

I remember @jonne mentioned he has tested horizontal screen device writes with an early proto and it had problems.

There were issues where the dram writes were not starting from the same scan line.

But: I have since learned there are slight differences in the timing of the individual LCDs. Believe it or not, some chip-on-glass ICs are faster to respond than others. I had to add 2 nops in the CLR_CS macro to counter this, as some LCDs lost track.

Horizontal addressing works as @adekto has demonstrated. Maybe we will switch to that.

I’m not sure how much division would be needed, but division can usually be replaced with multiplication by the reciprocal if need be.
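
A minimal sketch of the reciprocal idea, assuming unsigned values and a 0.16 fixed-point reciprocal (65536 == 1.0). The function names are made up for illustration. Computing the reciprocal still costs one real division, so this only pays off when the same divisor is reused many times or the reciprocals come from a table.

```cpp
#include <cstdint>

// Reciprocal of d in 0.16 fixed point. d must be non-zero.
inline std::uint32_t reciprocal16(std::uint32_t d)
{
    return 65536u / d;
}

// Approximate x / d with one multiply and one shift.
// Assumes x < 65536 so the product fits in 32 bits.
inline std::uint32_t divideByReciprocal(std::uint32_t x, std::uint32_t recip)
{
    return (x * recip) >> 16;
}
```

Because the reciprocal is rounded down, the result can come out one too low: 100 / 4 gives 25 as expected, but 9 / 3 gives 2 with this version. For texture coordinates that may be acceptable; where it is not, more fractional bits or a rounding correction would be needed.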

This is turning into quite an in-depth discussion! I am, however, following very little of it.
Please keep up the good work!

I was thinking though, would the little extra math involved in rendering a tile map be hazardous to this idea?

Even with horizontal dram writes we still have the problem of flickering. Because we first write the whole mode7 screen to dram and then write some objects on top of it (like the cars in F-Zero), it will flicker. We could use a scanline combiner, like in my sprite implementation for Hi-Res mode, and draw each scanline only once per frame.
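
Roughly, a scanline combiner could work like the sketch below. This is not the Hi-Res sprite implementation mentioned above, just an illustration of the idea: `Sprite`, `backgroundPixel` and `composeScanline` are hypothetical names, the sprites are plain rectangles to keep it short, and the 110-pixel width is only an assumption.

```cpp
#include <cstdint>
#include <vector>

constexpr int kWidth = 110; // assumed line width

// Hypothetical sprite: a solid-colour rectangle.
struct Sprite
{
    int x, y, w, h;
    std::uint8_t colour;
};

// Stand-in for the mode7 floor renderer; any per-pixel function fits here.
inline std::uint8_t backgroundPixel(int x, int y)
{
    return static_cast<std::uint8_t>((x ^ y) & 0x0F);
}

// Build one finished scanline: background first, then sprites on top.
// The caller would then send the line to the LCD exactly once.
inline void composeScanline(int y, const std::vector<Sprite>& sprites,
                            std::uint8_t* line)
{
    for (int x = 0; x < kWidth; ++x)
        line[x] = backgroundPixel(x, y);

    for (const Sprite& s : sprites)
    {
        if (y < s.y || y >= s.y + s.h)
            continue; // sprite does not touch this scanline
        for (int x = s.x; x < s.x + s.w; ++x)
            if (x >= 0 && x < kWidth)
                line[x] = s.colour;
    }
}
```

Since every pixel reaches the display only after the sprites have been merged in, there is no moment where the bare background is visible, which is what causes the flicker.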

Or we could just use buffered 110x88 mode and draw to the buffer instead. That could still be fast enough (?). It also gives more flexibility, as the drawing order can be chosen freely.
