Oops, the optimizer does something funny there.
With this I got 62-67 fps in my demo, so that is a step up.
But unfortunately I am seeing vertical “spikes” or “cracks” which I do not see in my old implementation (62 fps) of lcdRefreshMode2().
Try this binary to see what I mean. The cracks are not visible all the time; they show up best against the dark background.
hello.bin (37.5 KB)
Up from 61-63?
Do those go away when you remove the FPS counter?
@jonne: Could you share your experiments with writing to the screen with DMA? Looks like Mode2/13 could benefit from it. I think we can double-buffer 32-bit scanlines and have the DMA copy them out while the CPU is decoding. None of the documentation I have come across covers this specific case.
Cracks are there without the FPS counter too.
Do you see tearing with this?
[code]
// Pulse the LCD write strobe: the first write drives WR low, the second
// drives it high again. OP runs while WR is held low.
// Named TGL_WR_OP so it does not clash with the plain TGL_WR below
// (one macro name cannot be both function-like and object-like).
#define TGL_WR_OP(OP) \
    *reinterpret_cast< volatile uint32_t *>(0xA0002284) = 1 << LCD_WR_PIN; \
    OP; \
    *reinterpret_cast< volatile uint32_t *>(0xA0002204) = 1 << LCD_WR_PIN;

// Pulse the LCD write strobe with nothing in between.
#define TGL_WR \
    *reinterpret_cast< volatile uint32_t *>(0xA0002284) = 1 << LCD_WR_PIN; \
    *reinterpret_cast< volatile uint32_t *>(0xA0002204) = 1 << LCD_WR_PIN;

void Pokitto::lcdRefreshMode2(uint8_t * scrbuf, uint16_t* paletteptr ) {
    uint32_t x,y;
    uint32_t scanline[110]; // read two nibbles = pixels at a time
    uint8_t *d;

    write_command(0x03); write_data(0x1038);
    write_command(0x20); // Horizontal DRAM Address
    write_data(0); // 0
    write_command(0x21); // Vertical DRAM Address
    write_data(0);
    write_command(0x22); // write data to DRAM
    CLR_CS_SET_CD_RD_WR;

    SET_MASK_P2;
    volatile uint32_t *LCD = reinterpret_cast< volatile uint32_t * >(0xA0002188);

    d = scrbuf; // point to beginning of line in data
    for(y=0;y<88;y++)
    {
        /** find colours in one scanline **/
        uint8_t s=0;
        for(x=0;x<110;x+=2)
        {
            uint8_t t = *d++;
            uint32_t color;
            // Each pixel is strobed twice, so 110 pixels fill 220 columns.
            color = uint32_t(paletteptr[t>>4])<<3;
            scanline[s]=*LCD=color;TGL_WR_OP(s++);TGL_WR;
            color = uint32_t(paletteptr[t&0xF])<<3;
            scanline[s]=*LCD=color;TGL_WR_OP(s++);TGL_WR;
        }

        // Send the cached scanline again to double the line vertically
        // (unrolled 10x; 110 is a multiple of 10).
        for (s=0;s<110;) {
            *LCD = (scanline[s]);TGL_WR_OP(s++);TGL_WR;
            *LCD = (scanline[s]);TGL_WR_OP(s++);TGL_WR;
            *LCD = (scanline[s]);TGL_WR_OP(s++);TGL_WR;
            *LCD = (scanline[s]);TGL_WR_OP(s++);TGL_WR;
            *LCD = (scanline[s]);TGL_WR_OP(s++);TGL_WR;
            *LCD = (scanline[s]);TGL_WR_OP(s++);TGL_WR;
            *LCD = (scanline[s]);TGL_WR_OP(s++);TGL_WR;
            *LCD = (scanline[s]);TGL_WR_OP(s++);TGL_WR;
            *LCD = (scanline[s]);TGL_WR_OP(s++);TGL_WR;
            *LCD = (scanline[s]);TGL_WR_OP(s++);TGL_WR;
        }
    }

    CLR_MASK_P2;
}
[/code]
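To make the inner loop above easier to follow away from the hardware, here is a minimal host-side sketch of just the decode step: each source byte holds two 4-bit palette indices (high nibble first), and each 16-bit palette colour is shifted left by 3 before being stored, as in the code above. `decodeScanline` and its parameters are names made up for this sketch, not PokittoLib API; the LCD port write and the WR toggling are left out, and the width is assumed to be even.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of PokittoLib): expands one scanline of
// packed 4bpp data into pre-shifted 32-bit values, two pixels per byte.
// Assumes `width` is even. Returns a pointer to the next source scanline.
const uint8_t *decodeScanline(const uint8_t *src,
                              const uint16_t *palette,
                              uint32_t *scanline,
                              std::size_t width = 110) {
    for (std::size_t x = 0; x < width; x += 2) {
        uint8_t t = *src++;                                 // two pixels per byte
        scanline[x]     = uint32_t(palette[t >> 4]) << 3;   // left pixel: high nibble
        scanline[x + 1] = uint32_t(palette[t & 0xF]) << 3;  // right pixel: low nibble
    }
    return src;
}
```

Keeping the decoded values in `scanline` is what lets the second loop resend the same line without touching the palette again.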
Works beautifully! No cracks, 72 fps (!).
I changed the FPS counter to work horizontally, so it now skips a few more pixels, but I do not think that had a big impact. I can check it later.
@Hanski
How are you drawing your tiles? Are you using drawBitmap or your own routine?
I’m using the following, which isn’t perfect, but seems to be a little faster than drawBitmap, probably because it’s the same thing with some parts removed. However, it obviously isn’t as fast as what you’re doing.
[code]
void drawTile(int x, int y, const uint8_t *buf){
    // if out of screen bounds, don't bother
    if(x<-7 || y<-7)return;
    if(x>109 || y>87)return;

    int t=2; // skip width+height
    int xs = x;
    int ys = y;
    int rows = 8; // assuming 8*8 tile always
    if(y<0){
        // tile starts above the screen: skip the hidden rows,
        // and draw only the rows that are left
        ys=0;
        t+=8*(-y);
        rows=8+y;
    }

    int off0 = xs+110*ys;

    for(y=0; y<rows; y++){
        int ytemp = ys+y;
        if(ytemp>=0 && ytemp<mygame.display.height){
            for(x=0; x<8; x++){
                int xtemp=xs+x;
                if(xtemp>=0 && xtemp<mygame.display.width){
                    mygame.display.screenbuffer[off0] = buf[t];
                }
                t++;
                off0++; // next x screen pixel
            }
        }
        off0+=102; // next y screen pixel
    }
}
[/code]
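One way to drop the per-pixel bounds checks is to clip the 8x8 tile against the screen once and then copy only the visible part of each row. The sketch below is an illustration, not a drop-in replacement: it assumes a buffer with one byte per pixel and 110 pixels per row, and the same tile layout as drawTile (two header bytes, then 64 pixel bytes). `blitTileClipped`, `kScreenW`, `kScreenH` and `kTile` are hypothetical names, not PokittoLib API.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr int kScreenW = 110;  // assumed screen width in pixels
constexpr int kScreenH = 88;   // assumed screen height in pixels
constexpr int kTile    = 8;    // assumed tile size (8x8)

// Hypothetical helper: clip once, then copy the visible span of each row.
inline void blitTileClipped(uint8_t *screen, int x, int y, const uint8_t *tile) {
    // Visible range inside the tile, [x0, x1) by [y0, y1).
    int x0 = x < 0 ? -x : 0;
    int y0 = y < 0 ? -y : 0;
    int x1 = (x + kTile > kScreenW) ? kScreenW - x : kTile;
    int y1 = (y + kTile > kScreenH) ? kScreenH - y : kTile;
    if (x0 >= x1 || y0 >= y1) return;  // completely off screen

    const uint8_t *src = tile + 2 + y0 * kTile + x0;  // skip width+height bytes
    uint8_t *dst = screen + (y + y0) * kScreenW + (x + x0);
    for (int row = y0; row < y1; ++row) {
        std::memcpy(dst, src, static_cast<std::size_t>(x1 - x0));
        src += kTile;
        dst += kScreenW;
    }
}
```

Because the function is marked `inline` and has no branches inside the row loop, the compiler has a better chance of folding it into a tile-map loop.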
I am going to make the code public soon. I have my own drawing method.
Below are quick tips off the top of my head:
- Try the latest lcdRefreshMode2() by @FManga. That alone can make a huge difference!
- Set the screen persistence to 1
- Make sure you use the “-O3” option in the compiler (and only that: if there is a “-Os” later on the command line, it will override the former!)
- Do not make a function call for each tile, or make sure the function is inlined
Let’s have some PRs!!!
Yes, indeed. @FManga, could you open a PR first, and then I will add the FPS counter changes in my own PR?
Can do, but I’ll get home really late today, so it might be a while till I get a chance to do so.
It would be quicker if you just pushed it all at once?
mode2 != mode 13
Sure, I can do it.
Cleaning of my code took more time than expected, so I did not have time to do a PR today.
GCC has an attribute that can force the compiler to use -O3 for a specific function, even if the compiler is set to use other options:
__attribute__((optimize("-O3")))
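As a rough illustration of how that attribute might be applied, the sketch below marks a single hot function. It assumes a GCC toolchain; the `HOT_O3` macro and `sumPixels` are names invented for this example, and the guard simply drops the attribute on compilers that are not GCC. The string is spelled "O3" here, which is the form the GCC documentation describes.

```cpp
#include <cstdint>

// Assumption: GCC. Other compilers get an empty macro instead of a warning.
#if defined(__GNUC__) && !defined(__clang__)
#define HOT_O3 __attribute__((optimize("O3")))
#else
#define HOT_O3
#endif

// Hypothetical hot function: only this one is built at -O3,
// whatever -O level the rest of the file uses.
HOT_O3 uint32_t sumPixels(const uint8_t *buf, uint32_t count) {
    uint32_t sum = 0;
    for (uint32_t i = 0; i < count; ++i) {
        sum += buf[i];
    }
    return sum;
}
```

This keeps the rest of the build at a size-oriented setting while the few functions that matter for frame rate get the faster code.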
Who remembers that gameboy emulator? Would these frame-rate advances help it run faster at all?
There’s a Gameboy emulator? o_O
Yes, it showed some kind of super slow-mo Super Mario Land: Thank you video #1 where we look at buttons and cool features of the LPC11U68
Nice tech demo tho
Interesting. I hadn’t seen that video yet. What happened to it?