Mode13 - 110x88x256

I was testing direct pixel routines, and it seems I might be running too fast. Is there no way to wait for vblank/hblank? According to the screen documents there is at least an internal signal being used, but I didn’t see anything about that pin being connected externally.

No way. Hblank/Vblank is only relevant to RGB mode on this controller.

Use a __nop(); before switching back to command mode (look at writeData)

The best I’m able to get is 13-15fps. Surely I should be able to do better?

Isn’t Pokitto currently limited to 15 fps in all modes?

I don’t think so, not in mode13 anyway, I’ve set it to 60.
Although setting the frameRate to 1 or 2 is making no difference to the running speed. That’s what I get for messing with the update routines.

  1. Take a look at how mode2 refresh works.
    A. Your code can be made 2x more efficient with a simple tweak: you are reading the same source pixel values twice in two separate for loops. Look at how I first create a scanline buffer and then blast it out twice.
    B. Contrary to popular belief, for loops are not efficient. You have to “unroll” your loops. write_data is a tiny bit of code. Instead of for-looping through all pixels, you should write 10 pixels (at least) inside one loop and then loop 10x less. The difference is big.

  2. Fps. Currently it is limited to 20fps just because it was easier to get the simulator to somewhat match the speed of hardware that way. This limitation will be removed. Secondly, the timing mechanism is not accurate enough to get a good fps reading. You have to average over a long time. This will also be fixed.

This is what I’m currently using and it’s still only getting 11-12fps…

void Pokitto::lcdRefreshMode13(uint8_t * scrbuf, uint16_t* paletteptr, uint8_t offset){

    uint16_t wdata;
    write_command(0x20); write_data(0); // x
    write_command(0x21); write_data(0); // y
    write_command(0x22); // pixel data mode

    uint8_t scanline[88];

    int t=0;
    for(int y=0; y <110; y++){

        for(int x=0; x < 88; x++){
            scanline[x]=(scrbuf[t++]+offset)&255;
        }

        for(int x=0; x < 88;){
            wdata = paletteptr[scanline[x++]]; write_data(wdata); write_data(wdata);
            wdata = paletteptr[scanline[x++]]; write_data(wdata); write_data(wdata);
            wdata = paletteptr[scanline[x++]]; write_data(wdata); write_data(wdata);
            wdata = paletteptr[scanline[x++]]; write_data(wdata); write_data(wdata);
            wdata = paletteptr[scanline[x++]]; write_data(wdata); write_data(wdata);
            wdata = paletteptr[scanline[x++]]; write_data(wdata); write_data(wdata);
            wdata = paletteptr[scanline[x++]]; write_data(wdata); write_data(wdata);
            wdata = paletteptr[scanline[x++]]; write_data(wdata); write_data(wdata);
        }
        for(int x=0; x < 88;){
            wdata = paletteptr[scanline[x++]]; write_data(wdata); write_data(wdata);
            wdata = paletteptr[scanline[x++]]; write_data(wdata); write_data(wdata);
            wdata = paletteptr[scanline[x++]]; write_data(wdata); write_data(wdata);
            wdata = paletteptr[scanline[x++]]; write_data(wdata); write_data(wdata);
            wdata = paletteptr[scanline[x++]]; write_data(wdata); write_data(wdata);
            wdata = paletteptr[scanline[x++]]; write_data(wdata); write_data(wdata);
            wdata = paletteptr[scanline[x++]]; write_data(wdata); write_data(wdata);
            wdata = paletteptr[scanline[x++]]; write_data(wdata); write_data(wdata);
        }
    }

}

Could be because you’re iterating 3 * 88 * 110 times.

Looks to me like it could use a little manual loop fusion

I’ve no idea what your data format is or anything, but this should be functionally equivalent, so it’s worth seeing if it’s faster:

void Pokitto::lcdRefreshMode13(uint8_t * scrbuf, uint16_t* paletteptr, uint8_t offset)
{
	write_command(0x20); write_data(0); // x
	write_command(0x21); write_data(0); // y
	write_command(0x22); // pixel data mode

	// t is the index of the first pixel of the current row in scrbuf
	for(int y = 0, t = 0; y < 110; y++, t += 88)
	{
		for(int x = 0; x < 88; x++)
		{
			// The cast wraps the sum to 0-255, same as the original "& 255"
			const uint8_t index = static_cast<uint8_t>(scrbuf[t + x] + offset);
			const uint16_t wdata = paletteptr[index];
			write_data(wdata);
			write_data(wdata);
		}
		// No idea why the same block needs writing twice
		for(int x = 0; x < 88; x++)
		{
			const uint8_t index = static_cast<uint8_t>(scrbuf[t + x] + offset);
			const uint16_t wdata = paletteptr[index];
			write_data(wdata);
			write_data(wdata);
		}
	}
}

Sorry I rewrote the brackets too, I struggle to read other brace styles.

I’m sure there’s probably a better way of doing this, but as I said, I don’t completely understand what’s going on.

Ok so the problem here @Pharap @spinal is you’re mixing three different operations in the same loop.

  1. You are reading the data from the screen buffer
  2. You are accessing the palette with the index
  3. You are talking to the LCD

The reason why it’s not as fast as my loops is that I have split these operations apart from each other.

  1. I read the values in the screen buffer and then make a new uint16_t[88] array (a scanline) that contains the pixels of that scanline as 565-formatted 16-bit colors
  2. When drawing to the screen, I read the pixel values directly from this buffer. At this point I am NOT reading the palette anymore

The reason this makes a difference in performance is that when you split the operations into 2 steps, the MCU is only accessing 1 location in memory at a time: the scanline buffer. If you look at the disassembly, you see that constantly accessing different areas of memory (the palette and the screen buffer) means that the processor needs to constantly assign and reassign the registers for fetching data from memory. If, in addition, as in this example, the processor is also writing to the LCD, the processor runs out of registers, meaning the variables needed for the operation cannot all fit into the internal registers. The processor then needs to save and load stuff from RAM in every iteration of the loop. And there is the problem.

When you want to do fast stuff on a simple chip like this, you need to simplify the individual loops so that all the needed data can fit into the registers. That is where the speed is!
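
A minimal sketch of the two-step split described above, written so it can run off-target. `lcdWrite16` and `lcdLog` are hypothetical stand-ins for the real LCD write (they only record what would be sent), and `buildScanline` / `sendScanline` are names made up for this example. Whether the split is actually faster depends on the compiler and the target; the register-pressure argument above is about the Cortex-M0, not a PC.

```cpp
#include <cstdint>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the LCD data write: records each 16-bit word
// instead of toggling pins, so the sketch can run on a PC.
static std::vector<uint16_t> lcdLog;
static inline void lcdWrite16(uint16_t color) { lcdLog.push_back(color); }

constexpr std::size_t LINE_PIXELS = 88; // source pixels per LCD line in this mode

// Step 1: palette lookup only. Reads indices, writes 16-bit colors.
// No LCD access happens here. 'stride' is the distance in bytes between
// consecutive pixels of the line inside the screen buffer.
void buildScanline(const uint8_t* src, std::size_t stride,
                   const uint16_t* palette, uint8_t offset,
                   uint16_t* scanline)
{
    for (std::size_t i = 0; i < LINE_PIXELS; ++i) {
        // The cast wraps the sum to 0-255, like "& 255" in the thread's code.
        scanline[i] = palette[static_cast<uint8_t>(src[i * stride] + offset)];
    }
}

// Step 2: LCD output only. Reads the scanline, never the palette.
// Each color is sent twice to double the pixel along the line.
void sendScanline(const uint16_t* scanline)
{
    for (std::size_t i = 0; i < LINE_PIXELS; ++i) {
        lcdWrite16(scanline[i]);
        lcdWrite16(scanline[i]);
    }
}
```

For one source line, the caller would run `buildScanline` once and `sendScanline` twice, which is the "create a scanline buffer and then blast it out twice" pattern mentioned earlier in the thread.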

Basically the LCD is organized as 220 by 176 pixels. That is the only resolution it can do.
We want to do a lower resolution frame buffer so it uses less memory and the processor can draw things faster. So we create a half resolution frame buffer in the processor memory.
To output our buffer to the display, we use pixel doubling.
So, the LCD memory looks something like this:
Pixel 0, Pixel 1, Pixel 2, … Pixel 219
Pixel 220, Pixel 221, Pixel 222, etc.
So what we need to do is write the same data to pixel 0, pixel 1, pixel 220, and pixel 221 so it looks like one big pixel.
This is done by doing two write data commands to write the two pixels across, and then writing the same line twice to get the lower 2 pixels and form a big pixel.
Does this make sense?
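
The pixel-doubling idea can be sketched as a plain function that expands a half-resolution image into a full-size one. This is only an illustration: `pixelDouble` is a made-up name, both buffers are assumed to be row-major, and the real refresh routine streams pixels to the LCD rather than building a second buffer in RAM.

```cpp
#include <cstdint>
#include <cstddef>
#include <vector>

// Expands a srcW x srcH image to (2*srcW) x (2*srcH) by writing every
// source pixel to a 2x2 block, as described above.
// Both buffers are row-major here; this layout is an assumption of the sketch.
std::vector<uint16_t> pixelDouble(const std::vector<uint16_t>& src,
                                  std::size_t srcW, std::size_t srcH)
{
    const std::size_t dstW = srcW * 2;
    std::vector<uint16_t> dst(dstW * srcH * 2);
    for (std::size_t y = 0; y < srcH; ++y) {
        for (std::size_t x = 0; x < srcW; ++x) {
            const uint16_t c = src[y * srcW + x];
            const std::size_t top = (2 * y) * dstW + 2 * x;
            dst[top] = c;            // e.g. pixel 0
            dst[top + 1] = c;        // pixel 1
            dst[top + dstW] = c;     // pixel 220 when dstW == 220
            dst[top + dstW + 1] = c; // pixel 221
        }
    }
    return dst;
}
```

With `srcW = 110` and `srcH = 88`, the destination is 220 by 176, and the first source pixel lands on pixels 0, 1, 220 and 221.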

Got it, I modified the mode2 routine. Would you believe I tried this multiple times over the past couple of days with no luck? It turns out I had my screen buffer wrong: I was working in landscape and had forgotten that the Pokitto screen is really portrait.

The following routine maxes out at around 30fps. Can I expect to be able to do any better than this while updating the full screen?

void Pokitto::lcdRefreshMode13(uint8_t * scrbuf, uint16_t* paletteptr, uint8_t offset){
uint16_t x,y;
uint16_t scanline[88]; // one line of 88 pixels, already converted to 16-bit colors
uint8_t *d;

write_command(0x20); write_data(0);  // 0
write_command(0x21); write_data(0);
write_command(0x22);
CLR_CS_SET_CD_RD_WR;

for(x=0;x<110;x++)
  {
    d = scrbuf+x;// point to beginning of line in data
    uint8_t s=0;
    for(y=0;y<88;y++)
    {
        uint8_t t = *d; // 8-bit palette index of this pixel
        scanline[s++] = (paletteptr[(t+offset)&255]);
        d+=110; // jump to read byte directly below in screenbuffer
    }
    s=0;
    for (s=0;s<88;) {
        setup_data_16(scanline[s++]);CLR_WR;SET_WR;CLR_WR;SET_WR;
        setup_data_16(scanline[s++]);CLR_WR;SET_WR;CLR_WR;SET_WR;
        setup_data_16(scanline[s++]);CLR_WR;SET_WR;CLR_WR;SET_WR;
        setup_data_16(scanline[s++]);CLR_WR;SET_WR;CLR_WR;SET_WR;
        setup_data_16(scanline[s++]);CLR_WR;SET_WR;CLR_WR;SET_WR;
        setup_data_16(scanline[s++]);CLR_WR;SET_WR;CLR_WR;SET_WR;
        setup_data_16(scanline[s++]);CLR_WR;SET_WR;CLR_WR;SET_WR;
        setup_data_16(scanline[s++]);CLR_WR;SET_WR;CLR_WR;SET_WR;
    }
    for (s=0;s<88;) {
        setup_data_16(scanline[s++]);CLR_WR;SET_WR;CLR_WR;SET_WR;
        setup_data_16(scanline[s++]);CLR_WR;SET_WR;CLR_WR;SET_WR;
        setup_data_16(scanline[s++]);CLR_WR;SET_WR;CLR_WR;SET_WR;
        setup_data_16(scanline[s++]);CLR_WR;SET_WR;CLR_WR;SET_WR;
        setup_data_16(scanline[s++]);CLR_WR;SET_WR;CLR_WR;SET_WR;
        setup_data_16(scanline[s++]);CLR_WR;SET_WR;CLR_WR;SET_WR;
        setup_data_16(scanline[s++]);CLR_WR;SET_WR;CLR_WR;SET_WR;
        setup_data_16(scanline[s++]);CLR_WR;SET_WR;CLR_WR;SET_WR;
    }
  }

}

That’s 1.16 million pixels per second out of a 48 MHz Cortex-M0, by the way.
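
For reference, the arithmetic behind that figure, using the 220 by 176 LCD size and the roughly 30fps quoted in this thread. The cycles-per-pixel value is just the same numbers divided out and is not something anyone measured here.

```cpp
#include <cstdint>

// Figures from the thread: the LCD is 220 x 176 and the routine reaches about 30 fps.
constexpr uint32_t LCD_WIDTH = 220;
constexpr uint32_t LCD_HEIGHT = 176;
constexpr uint32_t FPS = 30;
constexpr uint32_t CPU_HZ = 48000000;

constexpr uint32_t pixelsPerFrame = LCD_WIDTH * LCD_HEIGHT;   // 38,720
constexpr uint32_t pixelsPerSecond = pixelsPerFrame * FPS;    // 1,161,600
constexpr uint32_t cyclesPerPixel = CPU_HZ / pixelsPerSecond; // 41 (integer division)

static_assert(pixelsPerSecond == 1161600, "220 * 176 * 30");
```

So at 30fps there are roughly 41 clock cycles available per LCD pixel for everything, including the palette lookup, the LCD write, game logic and sound.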

You should be able to get more. Please share the main test file and project settings.

Same link as the first post…

Although I do seem to be getting some noise on the screen, I wonder if the buffer is still a bit wrong…

Ummm… if you got 30fps with that test code, it’s not bad at all. You’re running sound at the same time and sound actually takes a big chunk of the cpu power. I bet your gfx output is way faster than you think.

I have several speedup tricks I still haven’t got around to trying on Pokitto. At the moment it’s a “brute force” solution.

Sound doesn’t seem to work 100% of the time, if I set the volume to 32 it works usually. If I leave it alone I don’t hear anything.

I keep meaning to look at the LCD code to see if page flipping would genuinely be possible.

Yep I kinda have a strong suspicion as to why this is (master volume conflict) and I’ll take a look at it.

I’d realised by about half way down.
I was half asleep and had a headache earlier so I wasn’t really taking it in :P.

On second glance I get the 1 byte = 4 pixels relation.

That makes sense.
I haven’t got round to looking at ARM in detail yet.

Weirdly, if I disable sound I only get 25fps. :-p

Your method of measuring fps is not reliable. You need to average over a longer time.
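
One way to average over a longer time is to count frames across a fixed window and divide once at the end. The sketch below is only an illustration: `FpsAverager` is a made-up class, not part of PokittoLib, and the caller is assumed to pass in timestamps from whatever millisecond timer is available.

```cpp
#include <cstdint>

// Averages the frame rate over a window of frames instead of timing one frame.
// The caller supplies timestamps in milliseconds; on hardware these would come
// from whatever millisecond timer is available (not shown, an assumption).
class FpsAverager {
public:
    explicit FpsAverager(uint32_t windowMs) : windowMs_(windowMs) {}

    // Call once per frame. Returns true when a new average is available.
    bool frame(uint32_t nowMs)
    {
        if (!started_) {
            // The first call only records the start of the window.
            started_ = true;
            startMs_ = nowMs;
            frames_ = 0;
            return false;
        }
        ++frames_;
        const uint32_t elapsed = nowMs - startMs_; // unsigned math survives timer wrap
        if (elapsed < windowMs_) return false;
        fpsTimes10_ = (frames_ * 10000u) / elapsed; // frames per second, times 10
        startMs_ = nowMs;
        frames_ = 0;
        return true;
    }

    uint32_t fpsTimes10() const { return fpsTimes10_; } // e.g. 303 means 30.3 fps

private:
    uint32_t windowMs_;
    uint32_t startMs_ = 0;
    uint32_t frames_ = 0;
    uint32_t fpsTimes10_ = 0;
    bool started_ = false;
};
```

Calling `frame()` once per screen update with a window of a second or two smooths out the jitter of timing individual frames. The result is kept as an integer scaled by 10 to avoid floating point on the MCU.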

Mode 13 is now part of the pokittolib repository.

@spinal, would you please give a short writeup of what it is and how it is used?

Edit, also, for your hard work: