Improving FPS

Oops :slight_smile: The optimizer does something funny there.

With this I got 62-67 fps in my demo, so that is a step up :slight_smile:

Unfortunately, I am seeing vertical “spikes” or “cracks” that I do not see with my old implementation of lcdRefreshMode2() (62 fps).

Try this binary to see what I mean. They are not visible all the time, and they show up best against the dark background.

hello.bin (37.5 KB)

Up from 61-63? :stuck_out_tongue:

Do those go away when you remove the FPS counter?

@jonne: Could you share your experiments with writing to the screen with DMA? It looks like Mode2/13 could benefit from it. I think we can double-buffer 32-bit scanlines and have the DMA copy them out while the CPU is decoding the next one. None of the documentation I have come across covers this specific case.
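
To make the double-buffering idea concrete, here is a minimal host-side sketch. It is not working Pokitto DMA code: `FakeDma`, `decodeLine` and `refresh` are hypothetical names, and the fake "DMA" copies synchronously, whereas real hardware would copy in the background and need a real wait before a buffer is reused. Pixel and line doubling are left out to keep it short.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kWidth = 110;  // Mode 2 scanline width in pixels
using Scanline = std::array<uint32_t, kWidth>;

// Hypothetical stand-in for a DMA channel. Real hardware would copy in the
// background; this one copies immediately so the sketch runs on a desktop.
struct FakeDma {
    std::vector<uint32_t> lcd;  // everything "sent to the LCD", in order
    void start(const Scanline &src) { lcd.insert(lcd.end(), src.begin(), src.end()); }
    void wait() {}  // on real hardware: block until the transfer has finished
};

// Decode one row of packed 4-bit pixels (two per byte) through the palette.
void decodeLine(const uint8_t *src, const uint16_t *palette, Scanline &out) {
    for (std::size_t x = 0; x < kWidth; x += 2) {
        uint8_t t = *src++;
        out[x]     = uint32_t(palette[t >> 4]) << 3;   // high nibble = left pixel
        out[x + 1] = uint32_t(palette[t & 0xF]) << 3;  // low nibble = right pixel
    }
}

// Ping-pong: while one buffer is being copied out, decode into the other.
void refresh(const uint8_t *scrbuf, const uint16_t *palette,
             std::size_t rows, FakeDma &dma) {
    Scanline buffers[2];
    std::size_t active = 0;
    decodeLine(scrbuf, palette, buffers[active]);  // prime the first buffer
    for (std::size_t y = 0; y < rows; ++y) {
        dma.start(buffers[active]);                // send the decoded line
        if (y + 1 < rows) {                        // decode the next line meanwhile
            decodeLine(scrbuf + (y + 1) * (kWidth / 2), palette, buffers[active ^ 1]);
        }
        dma.wait();                                // sent buffer is free after this
        active ^= 1;
    }
}
```

Whether this helps on the actual hardware depends on whether the DMA controller can drive the LCD write strobe on its own, which the sketch does not address.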

Cracks are there without the FPS counter too.

Do you see tearing with this?


[code]
// Pulse the LCD write strobe (WR low, then high), running OP in between.
// Named TGL_WR_OP so that it does not clash with the plain TGL_WR below.
#define TGL_WR_OP(OP)                                                    \
  *reinterpret_cast< volatile uint32_t *>(0xA0002284) = 1 << LCD_WR_PIN; \
  OP;                                                                    \
  *reinterpret_cast< volatile uint32_t *>(0xA0002204) = 1 << LCD_WR_PIN;

// Pulse the write strobe with nothing in between.
#define TGL_WR                                                           \
  *reinterpret_cast< volatile uint32_t *>(0xA0002284) = 1 << LCD_WR_PIN; \
  *reinterpret_cast< volatile uint32_t *>(0xA0002204) = 1 << LCD_WR_PIN;

void Pokitto::lcdRefreshMode2(uint8_t * scrbuf, uint16_t* paletteptr ) {
uint32_t x,y;
uint32_t scanline[110]; // one decoded scanline, one entry per pixel
uint8_t *d;

write_command(0x03); write_data(0x1038);
write_command(0x20);  // Horizontal DRAM Address
write_data(0);  // 0
write_command(0x21);  // Vertical DRAM Address
write_data(0);
write_command(0x22); // write data to DRAM
CLR_CS_SET_CD_RD_WR;
SET_MASK_P2;
volatile uint32_t *LCD = reinterpret_cast< volatile uint32_t * >(0xA0002188);

d = scrbuf;// point to beginning of line in data
  for(y=0;y<88;y++)
  {

    /** find colours in one scanline and send it out while decoding **/
    /** each byte holds two nibbles = two pixels; each pixel is strobed twice **/
    uint8_t s=0;
    for(x=0;x<110;x+=2)
    {
      uint8_t t = *d++;
      uint32_t color;
      color = uint32_t(paletteptr[t>>4])<<3;
      scanline[s]=*LCD=color;TGL_WR_OP(s++);TGL_WR;
      color = uint32_t(paletteptr[t&0xF])<<3;
      scanline[s]=*LCD=color;TGL_WR_OP(s++);TGL_WR;
    }

    /** send the stored scanline a second time (line doubling), unrolled 10x **/
    for (s=0;s<110;) {
      *LCD = (scanline[s]);TGL_WR_OP(s++);TGL_WR;
      *LCD = (scanline[s]);TGL_WR_OP(s++);TGL_WR;
      *LCD = (scanline[s]);TGL_WR_OP(s++);TGL_WR;
      *LCD = (scanline[s]);TGL_WR_OP(s++);TGL_WR;
      *LCD = (scanline[s]);TGL_WR_OP(s++);TGL_WR;
      *LCD = (scanline[s]);TGL_WR_OP(s++);TGL_WR;
      *LCD = (scanline[s]);TGL_WR_OP(s++);TGL_WR;
      *LCD = (scanline[s]);TGL_WR_OP(s++);TGL_WR;
      *LCD = (scanline[s]);TGL_WR_OP(s++);TGL_WR;
      *LCD = (scanline[s]);TGL_WR_OP(s++);TGL_WR;
    }

  }

 CLR_MASK_P2;
}
[/code]

Works beautifully! No cracks, 72 fps (!).

I changed the FPS counter to work horizontally, so it now skips a few more pixels, but I do not think that had a big impact. I can check it later.

@Hanski
How are you drawing your tiles? Are you using drawBitmap or your own routine?

I’m using the following, which isn’t perfect, but seems to be a little faster than drawBitmap, probably because it’s the same thing with some parts removed. However, it obviously isn’t as fast as what you’re doing.

[code]
void drawTile(int x, int y, const uint8_t *buf){

    // if out of screen bounds, don't bother
    if(x<-7 || y<-7)return;
    if(x>109 || y>87)return;
    int t=2; // skip the width+height header bytes
    int xs = x;
    int ys = y;
    int rows = 8; // assuming 8*8 tile always

    if(y<0){
        // clipped at the top: skip the hidden rows in the source
        // and draw only the rows that remain
        ys=0;
        t+=8*(-y);
        rows=8+y;
    }

    int off0 = xs+110*ys; // offset into the screen buffer (110 bytes per line)
    for(y=0; y<rows; y++){
        int ytemp = ys+y;
        if(ytemp>=mygame.display.height)break; // the rest is below the screen
        for(x=0; x<8; x++){
            int xtemp=xs+x;
            if(xtemp>=0 && xtemp<mygame.display.width){
                mygame.display.screenbuffer[off0] = buf[t];
            }
            t++;
            off0++; // next x screen pixel
        }
        off0+=102; // next y screen pixel (110-8)
    }

}[/code]

I am going to make the code public soon. I have my own drawing method.

Below are quick tips off the top of my head:

  • Try the latest lcdRefreshMode2() by @FManga. That alone can make a huge difference!
  • Set the screen persistence to 1
  • Make sure you use the “-O3” option in the compiler (and only that: if there is a “-Os” later on the command line, it will override the “-O3”!)
  • Do not make a function call for each tile, or make sure the function is inlined
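
As an illustration of the last tip, here is a minimal sketch of a tile loop whose per-tile helper is forced inline. The names `blitTile` and `drawMap`, the flat tile layout and the one-byte-per-pixel 110x88 buffer are assumptions for the example, not PokittoLib API, and the helper does no clipping. `always_inline` is a GCC/Clang attribute.

```cpp
#include <cstdint>

constexpr int kScreenW = 110;  // assumed screen buffer width, one byte per pixel
constexpr int kTile = 8;       // assumed tile size (8x8)

// Hypothetical helper: copies one fully on-screen 8x8 tile, no clipping.
// always_inline asks GCC/Clang to expand it at the call site.
static inline __attribute__((always_inline))
void blitTile(uint8_t *screen, int x, int y, const uint8_t *pixels) {
    uint8_t *dst = screen + y * kScreenW + x;
    for (int row = 0; row < kTile; ++row) {
        for (int col = 0; col < kTile; ++col) dst[col] = pixels[col];
        dst += kScreenW;   // next screen line
        pixels += kTile;   // next tile row
    }
}

// Draws a mapW x mapH tile map from the top-left corner of the screen.
// tiles is a flat array: tile i starts at tiles + i * 64.
void drawMap(uint8_t *screen, const uint8_t *map, int mapW, int mapH,
             const uint8_t *tiles) {
    for (int ty = 0; ty < mapH; ++ty) {
        for (int tx = 0; tx < mapW; ++tx) {
            const uint8_t *pixels = tiles + map[ty * mapW + tx] * kTile * kTile;
            blitTile(screen, tx * kTile, ty * kTile, pixels);  // expanded inline
        }
    }
}
```

Tiles that can overlap the screen edge would still need a clipped path.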

Let’s have some PRs!!!

Yes, indeed. @FManga, could you open a PR first, and then I will add the FPS counter changes in my own PR?

Can do, but I’ll get home really late today, so it might be a while till I get a chance to do so.
Would it be quicker if you just pushed it all at once?

mode2 != mode 13 :frowning:

Sure, I can do it.

Cleaning up my code took longer than expected, so I did not have time to open a PR today.

GCC has an attribute that can force the compiler to use -O3 for a specific function even if the compiler is set to use other options.

__attribute__((optimize("-O3")))
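
A minimal sketch of how the attribute might be applied. `HOT_O3` and `fillLine` are made-up names for the example. The guard is there because the attribute is GCC-specific and other compilers may ignore it or warn about it. GCC's own documentation describes `optimize` as intended mainly for debugging, so it is worth measuring rather than assuming a gain.

```cpp
#include <cstdint>

// Apply the attribute only on GCC; elsewhere the macro expands to nothing.
#if defined(__GNUC__) && !defined(__clang__)
#define HOT_O3 __attribute__((optimize("-O3")))
#else
#define HOT_O3
#endif

// Hypothetical hot function: compiled at -O3 on GCC even if the rest of the
// translation unit is built with, for example, -Os.
HOT_O3 void fillLine(uint32_t *line, uint32_t color, int count) {
    for (int i = 0; i < count; ++i) {
        line[i] = color;
    }
}
```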

Who remembers that Game Boy emulator? Would these frame-rate advances help it run faster at all?

There’s a Gameboy emulator? o_O

Yes, it showed a kind of super slow-motion Super Mario Land. It is in this video: Thank you video #1 where we look at buttons and cool features of the LPC11U68

Nice tech demo tho

Interesting. I hadn’t seen that video yet. What happened to it?