Improving FPS

Looks like the real original speed is 47 fps (after I set the FPS limit to 250).

Edit: That 100 is fishy. I suspect it might be higher.

Anyway, this is what I currently have…

https://pastebin.com/KsrVV2zN

Would unrolling the loops like that cause issues elsewhere?

:laughing:
On a processor with an instruction cache, that'd be bad. In this case, I guess the only problem would be the amount of flash space it takes up.
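
To make the trade-off concrete, here is a host-side sketch of the same manual unrolling used in the scanline loops later in this thread. The function names are hypothetical, not Pokitto library code. The unrolled copy runs the loop test once per 10 elements instead of once per element, at the price of ten copies of the body in flash. It assumes the length is a multiple of 10, which holds for 110-pixel lines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Rolled loop: one compare-and-branch per element.
void copyRolled(uint32_t* dst, const uint32_t* src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i];
    }
}

// Unrolled by 10: one compare-and-branch per 10 elements,
// but the loop body is repeated 10 times in the binary.
// Assumes n is a multiple of 10 (e.g. a 110-pixel line).
void copyUnrolled10(uint32_t* dst, const uint32_t* src, std::size_t n) {
    assert(n % 10 == 0);
    for (std::size_t i = 0; i < n; i += 10) {
        *dst++ = *src++; *dst++ = *src++; *dst++ = *src++; *dst++ = *src++; *dst++ = *src++;
        *dst++ = *src++; *dst++ = *src++; *dst++ = *src++; *dst++ = *src++; *dst++ = *src++;
    }
}
```

Both functions produce identical output; what differs is how often the loop test runs and how much flash the body occupies.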

I suggest disabling the framerate limiter entirely for these tests:

bool Core::update(bool useDirectMode, uint8_t updRectX, uint8_t updRectY, uint8_t updRectW, uint8_t updRectH) {

    #if POK_STREAMING_MUSIC
        sound.updateStream();
    #endif

    uint32_t now = getTime();
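    // Comment toggle: as written the condition is just `true`, which disables the limiter;
    // change the leading /**/ to /*/ to restore the original check.
    if ( /**/ true /*/ (((nextFrameMillis - now)) > timePerFrame) && frameEndMicros /* */ ) { //if it is time to render a new frame and the frame end has run once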
		nextFrameMillis = now + timePerFrame;
		frameCount++;

		frameEndMicros = 0;
		backlight.update();
		buttons.update();
		battery.update();

        // FPS counter
		#if defined(PROJ_USE_FPS_COUNTER) ||  defined(PROJ_SHOW_FPS_COUNTER)
        const uint32_t fpsInterval_ms = 1000*3;

        fps_frameCount++;
        if (now > fps_refreshtime) {
            fps_counter = (1000*fps_frameCount) / (now - fps_refreshtime + fpsInterval_ms);
            fps_refreshtime = now + fpsInterval_ms;
            fps_frameCount = 0;
            fps_counter_updated = true;
        }
        #endif

	//	return true;

	// } else {
		if (!frameEndMicros) { //runs once at the end of the frame
			#if POK_ENABLE_SOUND > 0
			sound.updateTrack();
			sound.updatePattern();
			sound.updateNote();
			#endif
			updatePopup();
			displayBattery();

            display.update(useDirectMode, updRectX, updRectY, updRectW, updRectH); //send the buffer to the screen

            frameEndMicros = 1; //jonne

		}
		//	return false;
	}
    return true;
}
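
The two timing checks in Core::update() can be tried on a desktop. This sketch copies their arithmetic into standalone code; the names frameDue and FpsCounter are made up for illustration. The limiter test relies on unsigned wrap-around: once now passes nextFrameMillis, the subtraction wraps to a huge value. The counter divides the frames counted by the time actually elapsed since the previous refresh, which is now - fps_refreshtime + fpsInterval_ms. It assumes nextFrameMillis was last set to an earlier now plus timePerFrame, as in the code above.

```cpp
#include <cassert>
#include <cstdint>

// Limiter test from Core::update(). Before the deadline the difference lies
// in [0, timePerFrame]; after it, the unsigned subtraction wraps around.
bool frameDue(uint32_t now, uint32_t nextFrameMillis, uint32_t timePerFrame) {
    return uint32_t(nextFrameMillis - now) > timePerFrame;
}

// FPS counter: counts frames and recomputes the average once the refresh
// time has passed, dividing by the time elapsed since the previous refresh.
struct FpsCounter {
    static constexpr uint32_t fpsInterval_ms = 1000 * 3;
    uint32_t refreshTime = 0;
    uint32_t frameCount = 0;
    uint32_t fps = 0;

    // Call once per rendered frame; returns true when fps was recomputed.
    bool tick(uint32_t now) {
        frameCount++;
        if (now > refreshTime) {
            fps = (1000 * frameCount) / (now - refreshTime + fpsInterval_ms);
            refreshTime = now + fpsInterval_ms;
            frameCount = 0;
            return true;
        }
        return false;
    }
};
```

In this sketch the first refresh reports a meaningless value because refreshTime starts at zero; from the second interval on, one frame every 10 ms averages to exactly 100 fps.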

Awesome, @spinal!

I'm laughing out loud here; I knew this is what it would come to after some tweaking. :wink:

Without the limit, it's 105 fps.

I am a bit over 60 fps now.

I wanted to check that the FPS counter works correctly, so I recorded the Pokitto's screen with my phone's video camera at 60 fps. After checking the video frame by frame, I can confirm that the FPS counter looks quite accurate :slight_smile: Of course, there are some dropped and doubled frames in the video. That is because the frame rate is not steady; it depends on the current screen content in my demo.

I think we can soon do some PRs for these performance improvements.

@Hanski @spinal: please send me your binaries to test ASAP!

mode13.bin (45.3 KB)
platformer.bin (68.8 KB)

I'm getting 105 fps from mode13.bin and about 43 fps from platformer.bin.
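
When comparing these numbers it can help to think in milliseconds per frame rather than frames per second, since that is the budget each optimisation eats into. A trivial conversion (the helper name is illustrative):

```cpp
#include <cassert>
#include <cmath>

// Frame time in milliseconds for a given frame rate.
double msPerFrame(double fps) {
    return 1000.0 / fps;
}
```

So 105 fps leaves about 9.5 ms per frame, 43 fps about 23.3 ms, and the FPS limit of 250 mentioned at the start corresponds to 4 ms.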

Here is the two-layer scroller at 61-63 fps. For this version I have added palette animation.

twolayer.bin (37.8 KB)

Nice :slight_smile: Is that 16-colour?

Yes, I am using the normal "fast" mode: 110x88 with 16 colors.
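
In this 16-colour mode each byte of the screen buffer holds two pixels. Going by the lcdRefreshMode2 code later in the thread (55 bytes per row, high nibble = left pixel), the buffer is 110 × 88 / 2 = 4840 bytes. A sketch of reading and writing single pixels in that layout; the function and constant names are hypothetical, not the library's own drawing API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed layout: 110x88 pixels, 4 bits per pixel, high nibble = left pixel.
constexpr std::size_t kWidth = 110;
constexpr std::size_t kHeight = 88;
constexpr std::size_t kBytesPerRow = kWidth / 2;             // 55
constexpr std::size_t kBufferBytes = kBytesPerRow * kHeight; // 4840

// Even x lives in the high nibble, odd x in the low nibble.
uint8_t getPixel4bpp(const uint8_t* buf, std::size_t x, std::size_t y) {
    assert(x < kWidth && y < kHeight);
    uint8_t b = buf[y * kBytesPerRow + x / 2];
    return (x & 1) ? uint8_t(b & 0x0F) : uint8_t(b >> 4);
}

// Replaces one nibble and leaves the neighbouring pixel untouched.
void setPixel4bpp(uint8_t* buf, std::size_t x, std::size_t y, uint8_t color) {
    assert(x < kWidth && y < kHeight && color < 16);
    uint8_t& b = buf[y * kBytesPerRow + x / 2];
    if (x & 1) b = uint8_t((b & 0xF0) | color);
    else       b = uint8_t((b & 0x0F) | (color << 4));
}
```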

It's a cool demo. A bit heavy on the eyes after a while :wink:

@FManga I just cannot get this to work in setup_data_16():
*reinterpret_cast<uint32_t *>(0xA0002188) = data<<3;

It messes up the display.

This works OK:
LPC_GPIO_PORT->MPIN[2] = data<<3;

Hmm… is this with -O3? Have you tried with volatile?
*reinterpret_cast<volatile uint32_t *>(0xA0002188) = data<<3;
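
Why volatile can fix it: through a plain `uint32_t *`, the compiler sees `*LCD = a;` followed, after writes to other addresses, by `*LCD = b;`, and under -O3 it may drop the first store or move it past the volatile WR toggles, so the LCD latches the wrong value. A volatile access must be performed as written, in order with other volatile accesses. The sketch below also spells out where the magic numbers come from, assuming the LPC11U6x GPIO register layout (MPIN, SET and CLR blocks at offsets 0x2180, 0x2200 and 0x2280 from 0xA0000000, one 32-bit word per port). The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed LPC11U6x GPIO layout: each register group is an array with
// one 32-bit word per port, at these offsets from the GPIO base.
constexpr uint32_t kGpioBase   = 0xA0000000u;
constexpr uint32_t kMpinOffset = 0x2180u;  // masked port value (only unmasked pins change)
constexpr uint32_t kSetOffset  = 0x2200u;  // writing 1 drives the pin high
constexpr uint32_t kClrOffset  = 0x2280u;  // writing 1 drives the pin low

constexpr uint32_t mpinAddr(uint32_t port) { return kGpioBase + kMpinOffset + 4 * port; }
constexpr uint32_t setAddr(uint32_t port)  { return kGpioBase + kSetOffset + 4 * port; }
constexpr uint32_t clrAddr(uint32_t port)  { return kGpioBase + kClrOffset + 4 * port; }

// The constants hard-coded in this thread decompose as expected.
static_assert(mpinAddr(2) == 0xA0002188u, "LPC_GPIO_PORT->MPIN[2], the LCD data port");
static_assert(clrAddr(1) == 0xA0002284u, "CLR register of port 1, first half of TGL_WR");
static_assert(setAddr(1) == 0xA0002204u, "SET register of port 1, second half of TGL_WR");

// Every store goes through a volatile pointer, so the compiler must emit
// each one, in order with the other volatile accesses, even if the same
// address is written several times in a row. Only meaningful on the device.
inline void writeReg(uint32_t addr, uint32_t value) {
    *reinterpret_cast<volatile uint32_t*>(static_cast<std::uintptr_t>(addr)) = value;
}

// Same effect as the thread's TGL_WR: pulse the WR pin on port 1 low, then high.
// wrPinMask stands in for (1 << LCD_WR_PIN).
inline void pulseWr(uint32_t wrPinMask) {
    writeReg(clrAddr(1), wrPinMask);
    writeReg(setAddr(1), wrPinMask);
}
```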

110 fps in the plasma example:


#define TGL_WR								 \
  *reinterpret_cast< volatile uint32_t *>(0xA0002284) = 1 << LCD_WR_PIN; \
  *reinterpret_cast< volatile uint32_t *>(0xA0002204) = 1 << LCD_WR_PIN;
  

 void Pokitto::lcdRefreshMode13(uint8_t * scrbuf, uint16_t* paletteptr, uint8_t offset){
   uint16_t x,y;
   uint32_t scanline[110]; // one palette-converted pixel per entry, kept so the line can be resent
   uint8_t *d;
   uint32_t *s;

   /* * /
   write_command(0x37); write_data(0);
   write_command(0x36); write_data(176);
   write_command(0x39); write_data(0);
   write_command(0x38); write_data(220);
   /* */
   write_command_16(0x03); write_data_16(0x1038);
   write_command(0x20); write_data(0);
   write_command(0x21); write_data(0);
   write_command(0x22);
   CLR_CS_SET_CD_RD_WR;
   SET_MASK_P2;

   volatile uint32_t *LCD = reinterpret_cast< volatile uint32_t * >(0xA0002188);

   d = scrbuf;// point to beginning of line in data
   for(y=0;y<88;y++){

     s = scanline;

     // unrolled by 10 (110 = 11 x 10); TGL_WR;TGL_WR clocks each pixel in twice (horizontal doubling)
     for(x=0;x<110;x+=10){
       *LCD = *s++ = paletteptr[(*d++ + offset)&255]<<3; TGL_WR;TGL_WR;	
       *LCD = *s++ = paletteptr[(*d++ + offset)&255]<<3; TGL_WR;TGL_WR;	
       *LCD = *s++ = paletteptr[(*d++ + offset)&255]<<3; TGL_WR;TGL_WR;	
       *LCD = *s++ = paletteptr[(*d++ + offset)&255]<<3; TGL_WR;TGL_WR;	
       *LCD = *s++ = paletteptr[(*d++ + offset)&255]<<3; TGL_WR;TGL_WR;	
       *LCD = *s++ = paletteptr[(*d++ + offset)&255]<<3; TGL_WR;TGL_WR;	
       *LCD = *s++ = paletteptr[(*d++ + offset)&255]<<3; TGL_WR;TGL_WR;	
       *LCD = *s++ = paletteptr[(*d++ + offset)&255]<<3; TGL_WR;TGL_WR;	
       *LCD = *s++ = paletteptr[(*d++ + offset)&255]<<3; TGL_WR;TGL_WR;	
       *LCD = *s++ = paletteptr[(*d++ + offset)&255]<<3; TGL_WR;TGL_WR;	
     }

     // resend the buffered line so each source row fills two LCD rows (vertical doubling)
     s = scanline;
     
     for(x=0;x<110;x+=10){
       *LCD = *s++; TGL_WR;TGL_WR;
       *LCD = *s++; TGL_WR;TGL_WR;       
       *LCD = *s++; TGL_WR;TGL_WR;
       *LCD = *s++; TGL_WR;TGL_WR;       
       *LCD = *s++; TGL_WR;TGL_WR;
       *LCD = *s++; TGL_WR;TGL_WR;       
       *LCD = *s++; TGL_WR;TGL_WR;
       *LCD = *s++; TGL_WR;TGL_WR;       
       *LCD = *s++; TGL_WR;TGL_WR;
       *LCD = *s++; TGL_WR;TGL_WR;       
     }
     
   }

   /* * /
   write_command(0x37); write_data(0);
   write_command(0x36); write_data(176);
   write_command(0x39); write_data(0);
   write_command(0x38); write_data(220);
   /* */
   
   CLR_MASK_P2; // assumption: undo SET_MASK_P2, mirroring lcdRefreshMode2
 }
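
Stripped of the port I/O, the unrolled loops above produce a simple write order. This host-side sketch replays that order into a fake LCD so the pixel doubling and the palette offset (the mechanism behind palette animation here) can be checked on a desktop. FakeLcd and renderMode13 are hypothetical names. The `<< 3`, which presumably aligns the colour with the LCD data pins on port 2, is left out. The sketch assumes the entry mode set through register 0x03 (0x1038) makes consecutive writes run along the 220-pixel axis.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kSrcW = 110, kSrcH = 88;   // mode13 buffer: 8 bits per pixel
constexpr int kLcdW = 220, kLcdH = 176;  // physical screen

// Hypothetical stand-in for the LCD: records each pixel clocked in by a
// WR pulse, in order. With writes running along rows, pixel (col, row)
// ends up at index row * kLcdW + col.
struct FakeLcd {
    std::vector<uint16_t> pixels;
    void write(uint16_t color) { pixels.push_back(color); }
};

// Same order as lcdRefreshMode13: each source pixel is sent twice
// (horizontal doubling), then the buffered line is sent again
// (vertical doubling). Adding `offset` to the index rotates the palette.
void renderMode13(const uint8_t* scrbuf, const uint16_t* palette,
                  uint8_t offset, FakeLcd& lcd) {
    uint16_t line[kSrcW];
    const uint8_t* d = scrbuf;
    for (int y = 0; y < kSrcH; ++y) {
        for (int x = 0; x < kSrcW; ++x) {
            line[x] = palette[(*d++ + offset) & 255];
            lcd.write(line[x]);
            lcd.write(line[x]);
        }
        for (int x = 0; x < kSrcW; ++x) {
            lcd.write(line[x]);
            lcd.write(line[x]);
        }
    }
}
```

With a buffer whose value equals its x coordinate and a palette of `i * 3`, LCD pixel (7, 5) comes from source pixel (3, 2), so with offset 1 it reads palette entry 4.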

I changed the mode13 demo to use mode2 instead, then I made the following changes and it worked (103 fps, without palette animation of course):

#define TGL_WR								 \
  *reinterpret_cast< volatile uint32_t *>(0xA0002284) = 1 << LCD_WR_PIN; \
  *reinterpret_cast< volatile uint32_t *>(0xA0002204) = 1 << LCD_WR_PIN;

void Pokitto::lcdRefreshMode2(uint8_t * scrbuf, uint16_t* paletteptr) {
uint32_t x,y;
uint32_t scanline[2][88]; // read two nibbles = pixels at a time
uint8_t *d;

write_command(0x20);  // Horizontal DRAM Address
write_data(0);  // 0
write_command(0x21);  // Vertical DRAM Address
write_data(0);
write_command(0x22); // write data to DRAM
CLR_CS_SET_CD_RD_WR;
SET_MASK_P2;
volatile uint32_t *LCD = reinterpret_cast< volatile uint32_t * >(0xA0002188);

for(x=0;x<110;x+=2)
  {
    d = scrbuf+(x>>1);// point to beginning of line in data

    /** find colours in one scanline **/
    uint8_t s=0;
    for(y=0;y<88;y++)
    {
    uint8_t t = *d >> 4; // higher nibble
    uint8_t t2 = *d & 0xF; // lower nibble
    /** higher nibble = left pixel in pixel pair **/
    scanline[0][s] = *LCD = paletteptr[t] << 3; TGL_WR;TGL_WR;
    scanline[1][s++] = paletteptr[t2] << 3;
    /** testing only **/
    //scanline[0][s] = 0xFFFF*(s&1);
    //scanline[1][s] = 0xFFFF*(!(s&1));
    //s++;
    /** until here **/
    d+=110/2; // jump to read byte directly below in screenbuffer
    }
    s=0;
    /** draw scanlines **/
    /** leftmost scanline again (its first copy was sent while reading), then rightmost twice **/

    for (s=0;s<88;) {
        *LCD = (scanline[0][s++]);TGL_WR;TGL_WR;
        *LCD = (scanline[0][s++]);TGL_WR;TGL_WR;
        *LCD = (scanline[0][s++]);TGL_WR;TGL_WR;
        *LCD = (scanline[0][s++]);TGL_WR;TGL_WR;
        *LCD = (scanline[0][s++]);TGL_WR;TGL_WR;
        *LCD = (scanline[0][s++]);TGL_WR;TGL_WR;
        *LCD = (scanline[0][s++]);TGL_WR;TGL_WR;
        *LCD = (scanline[0][s++]);TGL_WR;TGL_WR;
    }

    for (s=0;s<88;) {
        *LCD = (scanline[1][s++]);TGL_WR;TGL_WR;
        *LCD = (scanline[1][s++]);TGL_WR;TGL_WR;
        *LCD = (scanline[1][s++]);TGL_WR;TGL_WR;
        *LCD = (scanline[1][s++]);TGL_WR;TGL_WR;
        *LCD = (scanline[1][s++]);TGL_WR;TGL_WR;
        *LCD = (scanline[1][s++]);TGL_WR;TGL_WR;
        *LCD = (scanline[1][s++]);TGL_WR;TGL_WR;
        *LCD = (scanline[1][s++]);TGL_WR;TGL_WR;
    }


    for (s=0;s<88;) {
        *LCD = (scanline[1][s++]);TGL_WR;TGL_WR;
        *LCD = (scanline[1][s++]);TGL_WR;TGL_WR;
        *LCD = (scanline[1][s++]);TGL_WR;TGL_WR;
        *LCD = (scanline[1][s++]);TGL_WR;TGL_WR;
        *LCD = (scanline[1][s++]);TGL_WR;TGL_WR;
        *LCD = (scanline[1][s++]);TGL_WR;TGL_WR;
        *LCD = (scanline[1][s++]);TGL_WR;TGL_WR;
        *LCD = (scanline[1][s++]);TGL_WR;TGL_WR;
    }
    
  }

 CLR_MASK_P2;
}
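
The mode2 routine writes column by column instead. A host-side sketch of its write order, under the same caveats as a desktop model (FakeLcd and renderMode2 are hypothetical names, the `<< 3` is dropped), assuming the default entry mode makes consecutive writes run down a 176-pixel column:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kSrcW = 110, kSrcH = 88;   // mode2 buffer: 4 bits per pixel
constexpr int kBytesPerRow = kSrcW / 2;  // 55
constexpr int kLcdW = 220, kLcdH = 176;  // physical screen

// Hypothetical stand-in for the LCD. Here writes run down columns,
// so pixel (col, row) ends up at index col * kLcdH + row.
struct FakeLcd {
    std::vector<uint16_t> pixels;
    void write(uint16_t color) { pixels.push_back(color); }
};

// Same order as lcdRefreshMode2: for each byte column, gather the left
// (high nibble) and right (low nibble) source columns, then send the left
// column twice and the right column twice, each pixel doubled. (The original
// sends the first left copy while reading; the write order is identical.)
void renderMode2(const uint8_t* scrbuf, const uint16_t* palette, FakeLcd& lcd) {
    uint16_t left[kSrcH], right[kSrcH];
    for (int bx = 0; bx < kBytesPerRow; ++bx) {
        const uint8_t* d = scrbuf + bx;
        for (int y = 0; y < kSrcH; ++y) {
            left[y]  = palette[*d >> 4];
            right[y] = palette[*d & 0xF];
            d += kBytesPerRow;  // byte directly below
        }
        for (int pass = 0; pass < 2; ++pass)
            for (int y = 0; y < kSrcH; ++y) { lcd.write(left[y]); lcd.write(left[y]); }
        for (int pass = 0; pass < 2; ++pass)
            for (int y = 0; y < kSrcH; ++y) { lcd.write(right[y]); lcd.write(right[y]); }
    }
}
```

Each buffer byte therefore turns into four LCD columns of 176 writes, and 55 bytes per row give the full 220 columns.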

Cool! Have to test it with my demo :slight_smile:

Please do; I'm not 100% sure I didn't break something. :stuck_out_tongue:

Suggestion: at the beginning of the function, first use an unrolled loop to copy paletteptr into a uint32_t palette32bit[16] with pre-shifted color values. That avoids the "<< 3" shifts inside the following loop.

It might not be much faster, though.
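
A sketch of that suggestion, assuming a 16-entry RGB565 palette as in mode2 (the function name is hypothetical). The conversion runs once per frame over 16 entries instead of once per pixel. The results need 32 bits, because a 16-bit colour shifted left by 3 can reach 0x7FFF8.

```cpp
#include <cassert>
#include <cstdint>

// Expand a 16-entry RGB565 palette into 32-bit words already shifted
// by 3, so the per-pixel loop only does a table lookup.
// Partially unrolled: 4 entries per iteration, 4 iterations.
void buildShiftedPalette16(const uint16_t* paletteptr, uint32_t* palette32bit) {
    for (int i = 0; i < 16; i += 4) {
        palette32bit[i]     = uint32_t(paletteptr[i])     << 3;
        palette32bit[i + 1] = uint32_t(paletteptr[i + 1]) << 3;
        palette32bit[i + 2] = uint32_t(paletteptr[i + 2]) << 3;
        palette32bit[i + 3] = uint32_t(paletteptr[i + 3]) << 3;
    }
}
```

In the render loop, `paletteptr[t] << 3` then becomes `palette32bit[t]`.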

I tried something like this (same idea, but doing the copy when the palette is loaded instead of every frame, and also getting rid of the >>4 and &0xF):

...
volatile uint32_t *LCD = reinterpret_cast< volatile uint32_t * >(0xA0002188);
const uint32_t *paletteL = reinterpret_cast<uint32_t *>(0x20000000);
const uint32_t *paletteH = paletteL+256;

for(x=0;x<110;x+=2)
  {
    d = scrbuf+(x>>1);// point to beginning of line in data

    /** find colours in one scanline **/
    uint8_t s=0;
    for(y=0;y<88;y++)
    {
    scanline[0][s] = *LCD = paletteH[*d]; TGL_WR;TGL_WR;
    scanline[1][s++] = paletteL[*d];
    d += 110/2;
    }
...

together with this:

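void Display::load565Palette(const uint16_t* p) {
  // enable the AHB clocks for SRAM1 and USB SRAM (SYSAHBCLKCTRL bits 26 and 27)
  *reinterpret_cast<volatile uint32_t *>(0x40048080) |= 3 << 26;
  // two 256-entry tables in SRAM1, indexed by a whole screen byte (two pixels):
  // paletteL holds the low-nibble colour, paletteH the high-nibble colour, both pre-shifted by 3
  uint32_t *paletteL = reinterpret_cast<uint32_t *>(0x20000000);
  uint32_t *paletteH = paletteL+256;
  for( uint32_t i=0; i<256; ++i ){
    paletteL[i] = uint32_t(p[i&0xF]) << 3;
    paletteH[i] = uint32_t(p[(i>>4)&0xF]) << 3;
  }
  // copy only PALSIZE entries: in a 4-bit mode p (and palette) may hold just 16 colours
  for (int i=0;i<PALSIZE;i++) palette[i] = p[i];
  paletteptr = palette;
}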

Got a speed drop to 91 fps. :thinking:
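
The table trick from the last post can also be checked off-device. This sketch (struct and function names hypothetical) builds the same two 256-entry tables in ordinary memory and confirms that looking up a whole byte gives the same colours as the shift-and-mask version. Together the tables take 2 × 256 × 4 = 2048 bytes, which, assuming the block at 0x20000000 is the 2 KB SRAM1 bank of the LPC11U6x, fills it exactly. As the 91 fps result shows, removing arithmetic this way does not guarantee a speedup, so measuring on the device remains the real test.

```cpp
#include <cassert>
#include <cstdint>

// Host-side copy of the tables built in load565Palette(): both are indexed
// by a whole screen byte (two 4-bit pixels), so the render loop needs
// neither >> 4 nor & 0xF. Entries are pre-shifted by 3 like the original.
struct NibbleLut {
    uint32_t low[256];   // colour of the low-nibble (right) pixel
    uint32_t high[256];  // colour of the high-nibble (left) pixel
};
static_assert(sizeof(NibbleLut) == 2048, "2 x 256 x 4 bytes");

// Reads only p16[0..15], so a 16-colour palette is enough.
void buildNibbleLut(const uint16_t* p16, NibbleLut& lut) {
    for (uint32_t i = 0; i < 256; ++i) {
        lut.low[i]  = uint32_t(p16[i & 0xF]) << 3;
        lut.high[i] = uint32_t(p16[(i >> 4) & 0xF]) << 3;
    }
}
```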