Hello Pokitto guys and gals, Dennis here.
I would like to share my experiences with you.
I use Segger Embedded Studio for writing assembly programs with custom LPC11U68 startup code.
For debugging I use a J-link probe.
I have written a very simple fixed-point Mandelbrot generator
and tested it with different clock settings from 24 MHz to 72 MHz.
Test conditions:
40 Mandelbrot sets were generated while zooming in, and the total run time was measured.
220x176, 4BPP mode was used with EGA palette settings.
d-mandel.bin is attached to this post. It does not contain the loader.
d-mandel.bin (2.2 KB)
Results (with the SYSPLLCTRL values used for the PLL settings):
24 MHz: 18.09 s (MSEL=1, PSEL=2)
36 MHz: 12.07 s (MSEL=2, PSEL=2)
48 MHz: 9.12 s (MSEL=3, PSEL=1) // default Pokitto speed
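As a cross-check of the settings above, here is a small C sketch of the PLL arithmetic. It assumes a 12 MHz PLL input clock and the usual LPC11U6x encoding (M = MSEL + 1, P = 2^PSEL, FCCO = 2 x P x output, valid from 156 MHz to 320 MHz). None of that is stated in this post, so please verify it against the user manual. The function names are made up for this example.

```c
#include <stdint.h>

/* Assumed PLL input clock: 12 MHz. Not stated in the post. */
#define PLL_IN_HZ 12000000u

/* Output clock for an MSEL field value, assuming M = MSEL + 1. */
uint32_t pll_out_hz(uint32_t msel)
{
    return PLL_IN_HZ * (msel + 1u);
}

/* CCO frequency, assuming P = 2^PSEL and FCCO = 2 * P * Fout. */
uint32_t pll_cco_hz(uint32_t msel, uint32_t psel)
{
    return 2u * (1u << psel) * pll_out_hz(msel);
}

/* Assumed legal CCO range: 156 MHz to 320 MHz. */
int pll_cco_in_range(uint32_t msel, uint32_t psel)
{
    uint32_t fcco = pll_cco_hz(msel, psel);
    return fcco >= 156000000u && fcco <= 320000000u;
}

/* Total clocks spent in a measured run: milliseconds * MHz * 1000. */
uint32_t run_clocks(uint32_t run_ms, uint32_t mhz)
{
    return run_ms * mhz * 1000u;
}
```

Under these assumptions the three MSEL values give 24, 36 and 48 MHz, and all three CCO frequencies land inside the assumed range. run_clocks() gives roughly 434 to 438 million clocks for each of the three measured times, so the run time scales almost linearly with the clock.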
The fastest screen buffer to LCD copy takes 6 clocks per native pixel.
That is 24 clocks per pixel in 110x88 4BPP mode, where each pixel covers 2x2 native pixels.
The copy alone, with no other work, runs at ~200 FPS.
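The frame rate follows from the clock counts. The sketch below just does the arithmetic in C; the 48 MHz figure used in the note after it is an assumption (the default Pokitto speed), since the post does not say which clock the ~200 FPS refers to. The function names are made up.

```c
#include <stdint.h>

#define LCD_W 220u                   /* native width  */
#define LCD_H 176u                   /* native height */
#define CLOCKS_PER_NATIVE_PIXEL 6u   /* figure quoted in the post */

/* Clocks needed to copy one full native frame. */
uint32_t clocks_per_frame(void)
{
    return LCD_W * LCD_H * CLOCKS_PER_NATIVE_PIXEL;
}

/* A 110x88 pixel covers 2x2 native pixels. */
uint32_t clocks_per_lowres_pixel(void)
{
    return 2u * 2u * CLOCKS_PER_NATIVE_PIXEL;
}

/* Whole frames per second if the CPU does nothing but copy. */
uint32_t copy_fps(uint32_t cpu_hz)
{
    return cpu_hz / clocks_per_frame();
}
```

One frame is 220 x 176 x 6 = 232320 clocks, so copy_fps(48000000) comes out at 206, which fits the ~200 FPS figure.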
I use the GPIO NOT register (GPIO_BASE | 0x2300) to toggle pin states.
(This saves a few registers and instructions in a loop. SET2 (0x2208) and CLR2 (0x2288) are 0x80 bytes apart, so reaching both takes two instructions to calculate the address, OR an extra register, OR an STM - STR instruction pair.)
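The reason 0x80 is awkward is the Thumb STR immediate range: `STR Rt,[Rn,#imm]` only accepts word offsets from 0 to 124. The sketch below checks the offsets in C. SET2 and CLR2 are the values from this post; MPIN2 (0x2188) and NOT2 (0x2308) are assumed from the same register layout, and the helper name is made up.

```c
#include <stdint.h>

/* Register offsets from GPIO_BASE. */
#define OFF_MPIN2 0x2188u   /* assumption: MPIN block at 0x2180, port 2 */
#define OFF_SET2  0x2208u   /* from the post */
#define OFF_CLR2  0x2288u   /* from the post */
#define OFF_NOT2  0x2308u   /* assumption: 0x2300 + 8 for port 2 */

/* Thumb STR Rt,[Rn,#imm]: imm must be a multiple of 4 in 0..124. */
int fits_str_imm(uint32_t offset)
{
    return offset <= 124u && (offset % 4u) == 0u;
}
```

fits_str_imm(OFF_CLR2 - OFF_SET2) is false because 0x80 = 128 is one word past the limit. With a single NOT register there is only one address to reach, so one base register plus one offset register is enough for both WR edges.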
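// r0=8 pixels, 4BPP each
// r6=*palette[] // stores combined values for 2 pixels
// r4=*gpio_base|MPIN2
// r1=NOT2-MPIN2
// r5=WR pin mask (written to NOT2, so every store toggles WR)
// r7=buffer on stack to store pixel data
// r3=combined palette element, r2=scratch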
lsrs r2,r0,#24
lsls r0,#8
lsls r2,#2
ldr r3,[r6,r2] // load Combined Palette element for 2 pixels
str r3,[r4] // MPIN 1st pixel
str r5,[r4,r1] // SET WR
str r5,[r4,r1] // CLR WR
stm r7!,{r3}
movs r2,#16 // ***
str r5,[r4,r1] // SET WR
str r5,[r4,r1] // CLR WR
rors r3,r2
str r3,[r4] // MPIN 2nd pixel ***
str r5,[r4,r1] // SET WR
str r5,[r4,r1] // CLR WR
nop // ***
str r5,[r4,r1] // SET WR
str r5,[r4,r1] // CLR WR
This is the fastest method I know. You need to insert
an instruction between the SET-CLR WR pairs or
else it will write black pixels.
I marked these “hot spots” in the code with three stars (***).
No extra instruction is needed between changing the WR or RD states.
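To make the data flow of the loop easier to follow, here is a host-side C model of one step: take the top byte of the pixel word, look up a combined palette entry, write it, rotate by 16 and write again. It is only a sketch. It assumes the 16 LCD data lines sit on port 2 bits 3..18, that the GPIO mask lets only those bits through on an MPIN write, and that the high nibble of each byte is the first pixel. All names are made up, and the WR toggling is left out.

```c
#include <stdint.h>

/* Assumption: LCD data lines are port 2 bits 3..18. */
#define DATA_SHIFT 3u
#define DATA_MASK  (0xFFFFu << DATA_SHIFT)

/* 32-bit rotate right, n in 1..31 (rors r3,r2). */
uint32_t ror32(uint32_t v, unsigned n)
{
    return (v >> n) | (v << (32u - n));
}

/* Build 256 combined entries from a 16-colour palette.
   The second pixel is stored pre-rotated so that one
   rotate by 16 brings it onto the data pins. */
void build_combined(const uint16_t pal[16], uint32_t combined[256])
{
    for (unsigned i = 0; i < 256u; i++) {
        uint32_t first  = (uint32_t)pal[i >> 4]  << DATA_SHIFT;
        uint32_t second = (uint32_t)pal[i & 15u] << DATA_SHIFT;
        combined[i] = first | ror32(second, 16);
    }
}

/* Model of the masked MPIN write: only the data bits reach the pins. */
uint16_t pins_after_write(uint32_t value)
{
    return (uint16_t)((value & DATA_MASK) >> DATA_SHIFT);
}

/* One loop step: consume the top byte of *word, return both colours. */
void unpack_two(uint32_t *word, const uint32_t combined[256], uint16_t out[2])
{
    uint32_t idx = *word >> 24;                  /* lsrs r2,r0,#24 */
    *word <<= 8;                                 /* lsls r0,#8 */
    uint32_t entry = combined[idx];              /* lsls r2,#2 ; ldr r3,[r6,r2] */
    out[0] = pins_after_write(entry);            /* str r3,[r4] */
    out[1] = pins_after_write(ror32(entry, 16)); /* rors r3,r2 ; str r3,[r4] */
}
```

Because the two 16-bit colours occupy disjoint bit ranges of the 32-bit entry, a single palette load serves two pixels, which is where the table saves time compared with one lookup per pixel.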
STM really speeds things up. A single STR takes 2(3) clocks to execute
while an STM takes 1+N clocks for N registers.
STR r0,[r2] // 2 clocks
STR r1,[r2,#4] // 2 clocks
STR r0,[r2] // 2 clocks
STR r1,[r0] // 2 clocks + 1 clock penalty for using r0
STM r2!, {r0-r3} // 5 clocks total for storing 4 registers.
Data alignment on word boundary is important!
The loaded registers can be used after 3 instructions counted from the STM instruction.
The drawback is that STM/LDM cannot store/load high registers (r8 and up).
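The saving is easy to put into numbers. The sketch below uses only the clock counts quoted in this post (2 clocks per STR, 1+N per STM) and ignores the occasional 3-clock STR; the function names are made up.

```c
#include <stdint.h>

/* Clocks for storing n words with individual STR instructions. */
uint32_t str_clocks(uint32_t n_words)
{
    return 2u * n_words;
}

/* Clocks for storing n words with a single STM. */
uint32_t stm_clocks(uint32_t n_words)
{
    return 1u + n_words;
}
```

For four words that is 8 clocks against 5, and the gap grows by one clock for every extra register. For a single word both cost 2 clocks, so STM only pays off from two registers upward.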
Portrait mode is available with the following settings:
LCD Command=3, Data=0x28 (AM=1, ID=0b10)
LCD Command=0x20, Data=0xaf
LCD Command=0x21, Data=0
This mode is good for 8BPP and 16BPP modes only.
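The Data value for command 3 can be taken apart in C as a quick check. The bit positions used here (AM in bit 3, ID in bits 5:4) are an assumption; they are chosen because they reproduce the AM=1, ID=0b10 reading given above, so check the controller datasheet before relying on them. The function names are made up.

```c
#include <stdint.h>

/* Assumed layout of the command 3 data word:
   AM = bit 3, ID = bits 5:4. */
unsigned entry_mode_am(uint16_t data)
{
    return (data >> 3) & 1u;
}

unsigned entry_mode_id(uint16_t data)
{
    return (data >> 4) & 3u;
}

/* Build the data word from the two fields, all other bits zero. */
uint16_t entry_mode_make(unsigned am, unsigned id)
{
    return (uint16_t)(((id & 3u) << 4) | ((am & 1u) << 3));
}
```

entry_mode_make(1, 2) gives 0x28, and decoding 0x28 gives back AM=1 and ID=2. Note also that 0xaf is 175, the last index along the 176-pixel side of the panel.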
Reading data or commands from the LCD controller is not stable.
It needs manual adjustment and special timings.