Improving FPS


Oops :slight_smile: The optimizer does something funny there.


With this I got 62-67 fps in my demo, so that is a step up :slight_smile:


Unfortunately, I am seeing vertical “spikes” or “cracks” that I do not see with my old implementation of lcdRefreshMode2() (62 fps).

Try this binary to see what I mean. They are not visible all the time; they are most visible against the dark background.

hello.bin (37.5 KB)


Up from 61-63? :stuck_out_tongue:

Do those go away when you remove the FPS counter?

@jonne: Could you share your experiments with writing to the screen with DMA? It looks like Mode2/13 could benefit from it. I think we can double-buffer 32-bit scanlines and have the DMA copy them out while the CPU is decoding. None of the documentation I have come across covers this specific case.
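One way to picture the double-buffered scanline idea is a ping-pong pair of row buffers: while one row is being transferred, the CPU decodes the next row into the other buffer, and the two swap roles every row. The sketch below only models that scheduling. FakeDma, decodeRow and refreshDoubleBuffered are hypothetical names, and FakeDma copies synchronously so the code runs on a PC. Real LPC11U68 DMA setup and, above all, generating one WR strobe per transferred word are not shown, and that is likely the hard part on the actual hardware. It also sends each row once, without Mode 2's 2×2 pixel doubling.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a DMA channel. On real hardware start() would kick
// off a transfer and return immediately; here it copies synchronously.
struct FakeDma {
    std::vector<uint32_t> sink;  // every word that "reached the LCD", in order
    void start(const uint32_t *src, int count) {
        sink.insert(sink.end(), src, src + count);
    }
    void wait() {}  // real code would block until the channel is idle
};

// Decode one 110-pixel row of a packed 4-bit buffer (55 bytes) into port words.
static void decodeRow(const uint8_t *row, const uint16_t *palette, uint32_t *out) {
    for (int i = 0; i < 55; i++) {
        out[2 * i]     = uint32_t(palette[row[i] >> 4]) << 3;   // high nibble = left pixel
        out[2 * i + 1] = uint32_t(palette[row[i] & 0xF]) << 3;  // low nibble = right pixel
    }
}

// Ping-pong buffering: send row y from one buffer while decoding row y+1 into the other.
void refreshDoubleBuffered(const uint8_t *scrbuf, const uint16_t *palette, FakeDma &dma) {
    static uint32_t lines[2][110];
    int cur = 0;
    decodeRow(scrbuf, palette, lines[cur]);  // prime the first buffer
    for (int y = 0; y < 88; y++) {
        dma.wait();                  // the other buffer's transfer must finish before we overwrite it
        dma.start(lines[cur], 110);  // send the row that is ready
        if (y + 1 < 88)
            decodeRow(scrbuf + (y + 1) * 55, palette, lines[cur ^ 1]);  // CPU work overlaps the transfer
        cur ^= 1;                    // swap roles
    }
    dma.wait();
}
```

The wait() at the top of each iteration is what makes reuse safe: the buffer about to be decoded into is the one sent in the previous iteration.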


Cracks are there without the FPS counter too.


Do you see tearing with this?

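```cpp
// TGL_WR_OP(OP): pull the LCD WR pin low (GPIO port 1 CLR register), run OP while
// it is low so the CPU does useful work during the pulse, then release it (port 1
// SET register). Each low/high pulse writes the word currently on the data bus.
// Renamed from TGL_WR(OP): a function-like and an object-like macro cannot share a name.
#define TGL_WR_OP(OP) \
  *reinterpret_cast< volatile uint32_t *>(0xA0002284) = 1 << LCD_WR_PIN; \
  OP; \
  *reinterpret_cast< volatile uint32_t *>(0xA0002204) = 1 << LCD_WR_PIN;

// TGL_WR: one more WR pulse, writing the same word again (horizontal doubling).
#define TGL_WR \
  *reinterpret_cast< volatile uint32_t *>(0xA0002284) = 1 << LCD_WR_PIN; \
  *reinterpret_cast< volatile uint32_t *>(0xA0002204) = 1 << LCD_WR_PIN;

void Pokitto::lcdRefreshMode2(uint8_t * scrbuf, uint16_t* paletteptr ) {
uint32_t x,y;
uint32_t scanline[110]; // one decoded 110-pixel row, kept for the repeated LCD row
uint8_t *d;

write_command(0x03); write_data(0x1038); // entry mode used by this version (row-by-row writes)
write_command(0x20);  // Horizontal DRAM Address
write_data(0);  // 0
write_command(0x21);  // Vertical DRAM Address
write_data(0);  // 0
write_command(0x22); // write data to DRAM
CLR_CS_SET_CD_RD_WR; // PokittoLib helper: chip select on, data mode
SET_MASK_P2;         // PokittoLib helper: MPIN writes only touch the LCD data pins
volatile uint32_t *LCD = reinterpret_cast< volatile uint32_t * >(0xA0002188); // GPIO port 2 MPIN

for (y=0; y<88; y++) {
    d = scrbuf + y*55; // point to beginning of line in data (2 pixels per byte)

    /** find colours in one scanline; send each pixel twice as the first LCD row **/
    uint8_t s=0;
    for (x=0; x<55; x++) {
      uint8_t t = *d++;
      uint32_t color;
      color = uint32_t(paletteptr[t>>4])<<3;  // high nibble = left pixel
      *LCD = scanline[s] = color;TGL_WR_OP(s++);TGL_WR;
      color = uint32_t(paletteptr[t&0xF])<<3; // low nibble = right pixel
      *LCD = scanline[s] = color;TGL_WR_OP(s++);TGL_WR;
    }

    /** draw the cached scanline again as the second LCD row (unrolled 10x, 110 = 11*10) **/
    for (s=0;s<110;) {
      *LCD = (scanline[s]);TGL_WR_OP(s++);TGL_WR;
      *LCD = (scanline[s]);TGL_WR_OP(s++);TGL_WR;
      *LCD = (scanline[s]);TGL_WR_OP(s++);TGL_WR;
      *LCD = (scanline[s]);TGL_WR_OP(s++);TGL_WR;
      *LCD = (scanline[s]);TGL_WR_OP(s++);TGL_WR;
      *LCD = (scanline[s]);TGL_WR_OP(s++);TGL_WR;
      *LCD = (scanline[s]);TGL_WR_OP(s++);TGL_WR;
      *LCD = (scanline[s]);TGL_WR_OP(s++);TGL_WR;
      *LCD = (scanline[s]);TGL_WR_OP(s++);TGL_WR;
      *LCD = (scanline[s]);TGL_WR_OP(s++);TGL_WR;
    }
}
CLR_MASK_P2; // restore the port 2 mask for other code
}
```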
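The refresh loop outputs each decoded 4-bit pixel twice (one data write, two WR pulses) and then replays the cached scanline for the next LCD row, so every buffer pixel becomes a 2×2 block on the 220×176 panel while the palette is looked up only once per pixel. The sketch below is a PC-side model for checking that logic, not driver code: FakeLcd and refreshMode2Model are hypothetical names, and the GPIO registers are replaced by a vector that records every WR pulse. The `<<3` is kept from the original, presumably to line the 16-bit colour up with the LCD data pins on port 2.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of the LCD bus: strobe() records whatever word is on the bus.
struct FakeLcd {
    uint32_t bus = 0;            // what *LCD (the port 2 MPIN register) would hold
    std::vector<uint32_t> gram;  // pixels written by WR pulses, in write order
    void strobe() { gram.push_back(bus); }
};

void refreshMode2Model(const uint8_t *scrbuf, const uint16_t *paletteptr, FakeLcd &lcd) {
    uint32_t scanline[110];
    for (int y = 0; y < 88; y++) {
        const uint8_t *d = scrbuf + y * 55;  // 55 bytes = 110 packed 4-bit pixels
        int s = 0;
        // First LCD row: decode and strobe each pixel twice (horizontal doubling).
        for (int x = 0; x < 55; x++) {
            uint8_t t = *d++;
            lcd.bus = scanline[s++] = uint32_t(paletteptr[t >> 4]) << 3;   // left pixel
            lcd.strobe(); lcd.strobe();
            lcd.bus = scanline[s++] = uint32_t(paletteptr[t & 0xF]) << 3;  // right pixel
            lcd.strobe(); lcd.strobe();
        }
        // Second LCD row: replay cached words (vertical doubling), no palette lookups.
        for (s = 0; s < 110; s++) {
            lcd.bus = scanline[s];
            lcd.strobe(); lcd.strobe();
        }
    }
}
```

Each buffer row costs 220 + 220 = 440 pulses, and 440 × 88 = 38720 = 220 × 176, so the model fills exactly one full screen.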



Works beautifully! No cracks, 72 fps (!).

I changed the FPS counter to work horizontally, so it now skips a few more pixels, but I do not think that has a big impact. I can check it later.


How are you drawing your tiles? Are you using drawBitmap or your own routine?

I’m using the following. It isn’t perfect, but it seems to be a little faster than drawBitmap, probably because it’s the same thing with some parts removed. It obviously isn’t as fast as what you’re doing, though.

```cpp
// Draws an 8x8 tile into a 110x88 screen buffer with one byte per pixel.
void drawTile(int x, int y, const uint8_t *buf){

    // if out of screen bounds, don't bother
    if(x<-7 || y<-7) return;
    if(x>109 || y>87) return;
    int t=2; // skip width+height
    int xs = x;
    int ys = y;

    // assuming 8*8 tile always
    int off0 = xs+110*ys;
    for(y=0; y<8; y++){
        int ytemp = ys+y;
        if(ytemp>=0 && ytemp<mygame.display.height){
            for(x=0; x<8; x++){
                int xtemp=xs+x;
                if(xtemp>=0 && xtemp<mygame.display.width){
                    mygame.display.screenbuffer[off0+x] = buf[t+x];
                }
            }
        }
        t+=8;      // next tile row (advance even when the row is clipped)
        off0+=110; // next screen row
    }
}
```
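Much of the per-pixel cost in drawTile is the bounds check inside the inner loop. A common alternative is to clip the tile rectangle once and then copy each visible row span with memcpy. This is a minimal sketch, assuming the same one-byte-per-pixel 110×88 layout that drawTile uses and the same 2-byte width/height header; SCREEN_W, SCREEN_H and drawTileClipped are hypothetical names, and real code would write into mygame.display.screenbuffer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical screen size: 110x88, one byte per pixel.
const int SCREEN_W = 110;
const int SCREEN_H = 88;

// Draws an 8x8 tile (2 header bytes, then 64 pixel bytes) by clipping the
// rectangle once, then copying each visible row span in one memcpy.
void drawTileClipped(uint8_t *screen, int x, int y, const uint8_t *tile) {
    const uint8_t *pixels = tile + 2;               // skip width+height
    int x0 = x < 0 ? -x : 0;                        // first visible tile column
    int y0 = y < 0 ? -y : 0;                        // first visible tile row
    int x1 = x + 8 > SCREEN_W ? SCREEN_W - x : 8;   // one past last visible column
    int y1 = y + 8 > SCREEN_H ? SCREEN_H - y : 8;   // one past last visible row
    if (x0 >= x1 || y0 >= y1) return;               // entirely off screen
    for (int ty = y0; ty < y1; ty++) {
        std::memcpy(screen + (y + ty) * SCREEN_W + x + x0,
                    pixels + ty * 8 + x0,
                    std::size_t(x1 - x0));
    }
}
```

For fully visible tiles this reduces to eight 8-byte copies with no per-pixel branches.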



I am going to make the code public soon. I have my own drawing method.

Below are quick tips off the top of my head:

  • Try the latest lcdRefreshMode2() by @FManga. That alone can make a huge difference!
  • Set the screen persistence to 1 (so the screen buffer is not cleared between frames)
  • Make sure you compile with the “-O3” option (and only that: if an “-Os” comes later on the command line, it overrides the earlier one!)
  • Do not make a function call for each tile, or make sure the function is inlined
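For the last tip, a minimal sketch of drawing a whole map in one function, with the per-tile copy forced inline. It assumes a one-byte-per-pixel 110×88 buffer and a map whose tiles are grid-aligned and fully on screen, so no clipping is needed. drawMap, blitTile8 and the map dimensions are hypothetical. `__attribute__((always_inline))` is a GCC/Clang extension; plain `inline` is only a hint the optimizer may ignore.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

const int SCREEN_W = 110;  // assumed one-byte-per-pixel 110x88 buffer
const int SCREEN_H = 88;
const int MAP_W = 13;      // hypothetical map size in tiles: 13*8 = 104 <= 110
const int MAP_H = 11;      // 11*8 = 88

// always_inline asks GCC/Clang to paste this body into the caller, so each
// tile costs the copy itself rather than a call and return.
static inline __attribute__((always_inline))
void blitTile8(uint8_t *dst, const uint8_t *pixels) {
    for (int row = 0; row < 8; row++)
        std::memcpy(dst + row * SCREEN_W, pixels + row * 8, 8);
}

// Draws a map of grid-aligned, fully visible tiles in a single function.
// tiles[i] points at 64 pixel bytes (no header) for tile index i.
void drawMap(uint8_t *screen, const uint8_t map[MAP_H][MAP_W], const uint8_t *const *tiles) {
    for (int ty = 0; ty < MAP_H; ty++) {
        uint8_t *rowStart = screen + ty * 8 * SCREEN_W;  // top-left of this tile row
        for (int tx = 0; tx < MAP_W; tx++)
            blitTile8(rowStart + tx * 8, tiles[map[ty][tx]]);
    }
}
```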


Let’s have some PRs!!!


Yes, indeed. @FManga, could you make a PR first? Then I will add my FPS counter changes in my own PR.


Can do, but I’ll get home really late today, so it might be a while till I get a chance to do so.
Wouldn’t it be quicker if you just pushed it all at once?


mode2 != mode 13 :frowning:


Sure, I can do it.


Cleaning up my code took more time than expected, so I did not have time to make a PR today.


GCC has a function attribute, optimize, that can force the compiler to use -O3 for a specific function even if the rest of the build uses other options.
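A minimal sketch of that attribute: GCC compiles the marked function as if -O3 were given, even when the project is built with -Os. `#pragma GCC optimize("O3")` does the same for all following functions in a file. Clang does not implement this attribute and ignores it with a warning, and the GCC manual itself describes `optimize` as intended for debugging rather than production code, so measuring the result is worthwhile. sumPixels is a hypothetical example function.

```cpp
#include <cassert>
#include <cstdint>

// GCC-specific: optimize just this function at -O3, regardless of the global -O level.
__attribute__((optimize("O3")))
uint32_t sumPixels(const uint8_t *buf, int count) {
    uint32_t total = 0;
    for (int i = 0; i < count; i++)
        total += buf[i];  // simple loop GCC may unroll/vectorize at -O3
    return total;
}
```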



Who remembers that Game Boy emulator? Would these frame-rate improvements help it run any faster?


There’s a Gameboy emulator? o_O


Yes, it showed a kind of super-slow-motion Super Mario Land in “Thank you video #1 where we look at buttons and cool features of the LPC11U68”.

Nice tech demo tho


Interesting. I hadn’t seen that video yet. What happened to it?