Mode13 - 110x88x256

Could be because you’re iterating 388110 times.

Looks to me like it could use a little manual loop fusion

I’ve no idea what your data format is or anything, but this should be functionally equivalent, so it’s worth seeing if it’s faster:

void Pokitto::lcdRefreshMode13(uint8_t * scrbuf, uint16_t* paletteptr, uint8_t offset)
{
	write_command(0x20); write_data(0); // x
	write_command(0x21); write_data(0); // y
	write_command(0x22); // pixel data mode

	for(int y=0; y <110; y++)
	{
		// Start of this row in the buffer (assumes 88 bytes per row, stored row after row)
		const int rowStart = y * 88;
		for(int x = 0; x < 88; x++)
		{
			const uint8_t index = static_cast<uint8_t>(scrbuf[rowStart + x] + offset);
			const uint16_t wdata = paletteptr[index];
			write_data(wdata);
			write_data(wdata);
		}
		// No idea why the same block needs writing twice
		for(int x = 0; x < 88; x++)
		{
			const uint8_t index = static_cast<uint8_t>(scrbuf[rowStart + x] + offset);
			const uint16_t wdata = paletteptr[index];
			write_data(wdata);
			write_data(wdata);
		}
	}
}

Sorry I rewrote the brackets too, I struggle to read other brace styles.

I’m sure there’s probably a better way of doing this, but as I said, I don’t completely understand what’s going on.

Ok so the problem here @Pharap @spinal is you’re mixing three different operations in the same loop.

  1. You are reading the data from the screen buffer
  2. You are accessing the palette with the index
  3. You are talking to the LCD

The reason it’s not as fast as my loops is that I have split these operations apart from each other.

  1. I read the values in the screenbuffer and then make a new uint16_t[88] array (a scanline) that contains the pixels of that scanline as 565-formatted 16-bit colors
  2. when drawing to the screen, I read the pixel values directly from this buffer - at this point I am NOT reading the palette anymore

The reason why this makes a difference in performance is because when you split the operations in 2 steps, the MCU is only accessing 1 location in memory - the scanline buffer. If you look at the disassembly, you see that accessing different areas of the memory constantly (the palette and the screenbuffer) means that the processor needs to constantly assign and reassign the registers for fetching data from the memory. If, in addition, as in this example, the processor is also writing to the LCD, the processor runs out of registers - meaning the variables needed for the operation can not all fit into the internal registers. The processor then needs to save and load stuff from the RAM in every cycle of the loop. And there is the problem.

When you want to do fast stuff on a simple chip like this, you need to simplify the individual loops, so that all the needed data can all fit into the registers. That is where the speed is!
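To illustrate the split described above, here is a minimal sketch of the two passes. It is not the PokittoLib code: `FakeLcd`, `buildScanline` and `sendScanline` are hypothetical names, and the fake LCD simply records what it is sent so the sketch can run on a desktop compiler.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the LCD bus: records every word "sent" to it.
struct FakeLcd
{
	std::vector<std::uint16_t> sent;
	void write(std::uint16_t colour) { sent.push_back(colour); }
};

constexpr std::size_t lineLength = 88;

// Pass 1: touches only the screen buffer and the palette.
void buildScanline(const std::uint8_t * source, const std::uint16_t * palette,
                   std::uint8_t offset, std::uint16_t (&scanline)[lineLength])
{
	for(std::size_t i = 0; i < lineLength; ++i)
		scanline[i] = palette[static_cast<std::uint8_t>(source[i] + offset)];
}

// Pass 2: touches only the scanline buffer and the LCD.
void sendScanline(const std::uint16_t (&scanline)[lineLength], FakeLcd & lcd)
{
	for(std::size_t i = 0; i < lineLength; ++i)
	{
		lcd.write(scanline[i]); // first copy of the doubled pixel
		lcd.write(scanline[i]); // second copy of the doubled pixel
	}
}
```

Each loop now works with fewer live values at once, which is the property the explanation above relies on. Whether it is actually faster on the real hardware still has to be measured.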

Basically the LCD is organized as 220 by 176 pixels. That is the only resolution it can do.
We want to do a lower resolution frame buffer so it uses less memory and the processor can draw things faster. So we create a half resolution frame buffer in the processor memory.
To output our buffer to the display, we use pixel doubling.
So, the LCD memory looks something like this:
Pixel 0 Pixel 1 Pixel 2 … Pixel 219
Pixel 220 Pixel 221 Pixel 222 etc.
So what we need to do is write the same data to pixel 0, pixel 1, pixel 220, and pixel 221 so it looks like one big pixel.
This is done by issuing two write-data commands to write the two pixels across, and then writing the same line a second time to get the lower 2 pixels and form a big pixel.
Does this make sense?
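The pixel doubling described above can be sketched as follows. This is a simplified desktop model, not the real driver: the LCD memory is modelled as a plain array laid out as in the post (220 pixels per line), the frame buffer here already holds 16-bit colours rather than palette indices, and `doublePixels` is a hypothetical name.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t lcdWidth = 220;
constexpr std::size_t lcdHeight = 176;
constexpr std::size_t bufferWidth = lcdWidth / 2;   // 110
constexpr std::size_t bufferHeight = lcdHeight / 2; // 88

using LcdMemory = std::array<std::uint16_t, lcdWidth * lcdHeight>;
using FrameBuffer = std::array<std::uint16_t, bufferWidth * bufferHeight>;

// Copies every buffer pixel into a 2x2 block of LCD pixels.
void doublePixels(const FrameBuffer & buffer, LcdMemory & lcd)
{
	for(std::size_t y = 0; y < bufferHeight; ++y)
	{
		for(std::size_t x = 0; x < bufferWidth; ++x)
		{
			const std::uint16_t colour = buffer[y * bufferWidth + x];
			const std::size_t top = (y * 2) * lcdWidth + (x * 2);
			lcd[top] = colour;                // e.g. pixel 0
			lcd[top + 1] = colour;            // e.g. pixel 1
			lcd[top + lcdWidth] = colour;     // e.g. pixel 220
			lcd[top + lcdWidth + 1] = colour; // e.g. pixel 221
		}
	}
}
```

The real routine never holds a full-size copy like this. It streams each line to the display twice, which has the same visual effect without the extra memory.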

Got it, I modified the mode2 routine. Would you believe I tried this multiple times over the past couple of days with no luck? It turns out I had my screen buffer wrong: I was working in landscape and had forgotten that the Pokitto screen is really portrait.

The following routine maxes out at around 30fps. Can I expect to be able to do any better than this while updating the full screen?

void Pokitto::lcdRefreshMode13(uint8_t * scrbuf, uint16_t* paletteptr, uint8_t offset){
uint16_t x,y;
uint16_t scanline[88]; // one line of 88 pixels, already converted to 16-bit colours
uint8_t *d;

write_command(0x20); write_data(0);  // x = 0
write_command(0x21); write_data(0);  // y = 0
write_command(0x22);                 // pixel data mode
CLR_CS_SET_CD_RD_WR;

for(x=0;x<110;x++)
  {
    d = scrbuf+x; // point to the first byte of this line in the screen buffer
    uint8_t s=0;
    for(y=0;y<88;y++)
    {
        uint8_t t = *d; // one byte = one palette index
        scanline[s++] = (paletteptr[(t+offset)&255]);
        d+=110; // jump to read byte directly below in screenbuffer
    }
    // send the line; the two WR strobes per pixel double it in one direction
    for (s=0;s<88;) {
        setup_data_16(scanline[s++]);CLR_WR;SET_WR;CLR_WR;SET_WR;
        setup_data_16(scanline[s++]);CLR_WR;SET_WR;CLR_WR;SET_WR;
        setup_data_16(scanline[s++]);CLR_WR;SET_WR;CLR_WR;SET_WR;
        setup_data_16(scanline[s++]);CLR_WR;SET_WR;CLR_WR;SET_WR;
        setup_data_16(scanline[s++]);CLR_WR;SET_WR;CLR_WR;SET_WR;
        setup_data_16(scanline[s++]);CLR_WR;SET_WR;CLR_WR;SET_WR;
        setup_data_16(scanline[s++]);CLR_WR;SET_WR;CLR_WR;SET_WR;
        setup_data_16(scanline[s++]);CLR_WR;SET_WR;CLR_WR;SET_WR;
    }
    // send the same line again to double it in the other direction
    for (s=0;s<88;) {
        setup_data_16(scanline[s++]);CLR_WR;SET_WR;CLR_WR;SET_WR;
        setup_data_16(scanline[s++]);CLR_WR;SET_WR;CLR_WR;SET_WR;
        setup_data_16(scanline[s++]);CLR_WR;SET_WR;CLR_WR;SET_WR;
        setup_data_16(scanline[s++]);CLR_WR;SET_WR;CLR_WR;SET_WR;
        setup_data_16(scanline[s++]);CLR_WR;SET_WR;CLR_WR;SET_WR;
        setup_data_16(scanline[s++]);CLR_WR;SET_WR;CLR_WR;SET_WR;
        setup_data_16(scanline[s++]);CLR_WR;SET_WR;CLR_WR;SET_WR;
        setup_data_16(scanline[s++]);CLR_WR;SET_WR;CLR_WR;SET_WR;
    }
  }

}

That’s 1.16 million pixels per second out of a 48 MHz Cortex-M0, by the way.
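The 1.16 million figure follows from the numbers already given in the thread (a 220 by 176 LCD refreshed 30 times per second). A small worked calculation, which also derives a rough cycle budget per pixel under the assumption that the whole 48 MHz clock is available for drawing:

```cpp
#include <cstdint>

constexpr std::uint32_t lcdWidth = 220;
constexpr std::uint32_t lcdHeight = 176;
constexpr std::uint32_t framesPerSecond = 30;
constexpr std::uint32_t clockHz = 48000000;

// LCD pixels written for one full-screen update.
constexpr std::uint32_t pixelsPerFrame = lcdWidth * lcdHeight;              // 38,720
// LCD pixels written per second at 30 fps.
constexpr std::uint32_t pixelsPerSecond = pixelsPerFrame * framesPerSecond; // 1,161,600
// Clock cycles available per LCD pixel, rounded down.
constexpr std::uint32_t cyclesPerPixel = clockHz / pixelsPerSecond;         // 41
```

In practice the budget is smaller than this, since sound and game logic also need CPU time.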

You should be able to get more. Please share the main test file and project settings.

Same link as the first post…

Although I do seem to be getting some noise on the screen, I wonder if the buffer is still a bit wrong…

Ummm… if you got 30fps with that test code, it’s not bad at all. You’re running sound at the same time, and sound actually takes a big chunk of the CPU power. I bet your gfx output is way faster than you think.

I have several speedup tricks I still haven’t got around to trying on Pokitto. At the moment it’s a “brute force” solution.

Sound doesn’t seem to work 100% of the time, if I set the volume to 32 it works usually. If I leave it alone I don’t hear anything.

I keep meaning to look at the LCD code to see if page flipping would genuinely be possible.

Yep I kinda have a strong suspicion as to why this is (master volume conflict) and I’ll take a look at it.

I’d realised by about half way down.
I was half asleep and had a headache earlier so I wasn’t really taking it in :P.

On second glance I get the 1 byte = 4 pixels relation.

That makes sense.
I haven’t got round to looking at ARM in detail yet.

Weirdly, if I disable sound I only get 25fps. :-p

Your method of measuring fps is not reliable. You need to average over a longer time.
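One way to average over a longer time is to count frames across a fixed window and only report a value when the window has elapsed. The sketch below is an illustration only: `FpsCounter` is a hypothetical helper, not part of PokittoLib, and it assumes the caller can supply a millisecond timestamp.

```cpp
#include <cstdint>

// Hypothetical helper: counts frames and reports an average once per window.
class FpsCounter
{
public:
	explicit FpsCounter(std::uint32_t window) : windowMs(window) {}

	// Call once per drawn frame with the current time in milliseconds.
	// Returns true when a fresh average is available from fps().
	bool frame(std::uint32_t nowMs)
	{
		if(!started)
		{
			started = true;
			windowStart = nowMs;
			return false;
		}
		++frames;
		const std::uint32_t elapsed = nowMs - windowStart;
		if(elapsed < windowMs)
			return false;
		average = (frames * 1000u) / elapsed;
		frames = 0;
		windowStart = nowMs;
		return true;
	}

	std::uint32_t fps() const { return average; }

private:
	std::uint32_t windowMs;
	std::uint32_t windowStart = 0;
	std::uint32_t frames = 0;
	std::uint32_t average = 0;
	bool started = false;
};
```

On the Pokitto this could be fed from `game.getTime()` after each successful `game.update()`, assuming that call returns milliseconds. A window of a second or two smooths out the frame-to-frame jitter that makes a single-frame measurement misleading.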

mode 13 now part of pokittolib repository

@spinal, would you please give a short writeup of what it is and how it is used?

Edit, also, for your hard work:

Sure, I’ll have to double check that I added the changes correctly first.

[edit]
Nope, I missed some stuff.

Using mode13 isn’t too tricky; it works the same as a lot of the other modes.
I decided to call it Mode13 for two reasons. The first is that it reminds me of the old MS-DOS mode13, which was 320x200 with 256 colours. That screen mode was roughly half of the ‘high resolution’ 640x480 16 colour modes that VGA cards could do at the time. The second is that it is, coincidentally, the 13th graphics mode to be defined in the Pokitto core files.

To use mode13, simply define it in your my_settings.h:

#define PROJ_MODE13  1   // mode 13 graphics 110x88x256

Then the following is an example of this mode in use.


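#include "Pokitto.h"

Pokitto::Core game; // the Pokitto core object used throughout this example

unsigned short pal[256]; // assign a 256 entry array to hold the palette
unsigned char col = 0;   // current palette offset, wraps from 255 back to 0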

int PntClr(int x, int y){
	return game.display.getPixel(x,y);
}
void Dot (int x, int y, int c){
	game.display.drawPixel(x,y,c);
}
int RandMinMax(int min, int max){
    return rand() % max + min;
}
int Adjust (int xa, int ya, int x, int y, int xb, int yb){
	if(PntClr(x, y) != 0) return 0;
	int q = abs(xa - xb) + abs(ya - yb);
	int v = (PntClr(xa, ya) + PntClr(xb, yb)) / 2 + (RandMinMax(0,q*10)) / 10;
	if (v < 1) v = 1;
	if (v > 255) v = 255;
	Dot(x, y, v);
	return 1;
}
void SubDivide (int x1, int y1, int x2, int y2){
	if ((x2 - x1 < 2) && (y2 - y1 < 2)) return;
	int x = (x1 + x2) / 2;
	int y = (y1 + y2) / 2;
	Adjust(x1, y1, x, y1, x2, y1);
	Adjust(x1, y2, x, y2, x2, y2);
	Adjust(x2, y1, x2, y, x2, y2);
	Adjust(x1, y1, x1, y, x1, y2);
	if(PntClr(x, y) == 0)	{
		int v = PntClr(x1, y1) + PntClr(x2, y1) + PntClr(x2, y2);
		v = v + PntClr(x1, y2) + PntClr(x1, y) + PntClr(x, y1);
		v = v + PntClr(x2, y) + PntClr(x, y2);
		v = v / 8;
		Dot(x, y, v);
	}
	SubDivide(x1, y1, x, y);
	SubDivide(x, y, x2, y2);
	SubDivide(x, y1, x2, y);
	SubDivide(x1, y, x, y2);
}
void make_plasma(int x1=0,int y1=0,int x2=game.display.width-1,int y2=game.display.height-1){
	game.display.clear();
	if(x1<0)x1=0;
	if(y1<0)y1=0;
	if(x2>game.display.width-1)x2=game.display.width-1;
	if(y2>game.display.height-1)y2=game.display.height-1;

	Dot(x1, y1, RandMinMax(0,255) + 1);
	Dot(x2, y1, RandMinMax(0,255) + 1);
	Dot(x2, y2, RandMinMax(0,255) + 1);
	Dot(x1, y2, RandMinMax(0,255) + 1);
	SubDivide(x1, y1, x2, y2);
}
void make_pal(void){
	int a,s,r,g,b;
	for(a=0; a<=63; a++){
		s = 0; 	r = a; 		g = 63-a;	b = 0;		pal[a+s] = game.display.RGBto565(r*4,g*4,b*4);
		s = 64; r = 63-a;	g = 0;		b = a; 		pal[a+s] = game.display.RGBto565(r*4,g*4,b*4);
		s = 128; r = 0;	 	g = 0;		b = 63-a;	pal[a+s] = game.display.RGBto565(r*4,g*4,b*4);
		s = 192; r = 0;		g = a;		b = 0;	 	pal[a+s] = game.display.RGBto565(r*4,g*4,b*4);
	}
    game.display.load565Palette(&pal[0]); // load a palette the same way as any other palette in any other screen mode
}


int main(){
    game.begin();
    game.display.persistence=1;

	srand(game.getTime());
	make_pal(); // create 256 colour palette for this demo
    make_plasma(0,0,game.display.width-1,game.display.height-1); // create a nice plasma cloud

    while (game.isRunning()) {
        if(game.update()){
        // mode 13 has a palette offset, so instead of rotating your palette, you just tell it which colour should be first.
            game.display.palOffset = col++;
        }
    }

    return 1;
}
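The palette offset used in the main loop above works because the refresh routine adds the offset to every pixel's index before the palette lookup, and the sum wraps around at 256. A minimal sketch of that lookup, based on the `(t+offset)&255` expression in the refresh routine earlier in this thread (`lookupColour` is a hypothetical name used only for illustration):

```cpp
#include <cstdint>

// Colour shown for a pixel, given its stored index and the palette offset.
// The cast to an 8-bit value wraps the sum at 256, so the palette acts as a ring.
std::uint16_t lookupColour(const std::uint16_t * palette, std::uint8_t index, std::uint8_t offset)
{
	return palette[static_cast<std::uint8_t>(index + offset)];
}
```

Incrementing the offset once per frame therefore shifts every colour along by one entry without touching the palette or the screen buffer.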

Currently mode 13 is not fully integrated (I accidentally left a couple of things out when I submitted it) but you can download a pokittolib containing mode13 from here - https://github.com/spinalcode/PokittoLib

Added to Pokitto mbed team page

I have added this demo to mbed online team page … but the palette rotation is not working?

https://os.mbed.com/teams/Pokitto-Community-Team/code/Mode13/

Looking at the new update code, I can’t even tell how mode13 is being updated at all.
The version of PokittoDisplay.cpp that I have offline has…

void Display::update() {
#if POK_SCREENMODE == MODE13
    lcdRefreshMode13(m_scrbuf, paletteptr, palOffset);
#endif

Hmm… is this @ github/spinalcode/pokittolib?

nope, but it is in https://github.com/spinalcode/mode13

In that case I didn’t run it through the compare tool (WinMerge) and missed it