Improving FPS

Excellent! Thanks. I am waiting.

Pull request submitted.

Pull request merged.

Does this also include the Mode13 stuff? @spinal, @FManga ?

No, that's just mode2. I think @spinal is still experimenting with mode13?
I can send a PR for ModeGBC soon, if you'd like.

Yes please if possible.

I am on summer holiday, as you may have noticed from the bunch of newly ported games. I intend to flood the place with new content, and I have plenty of time to look at the improvements you guys have come up with.

So... close...

That is awesome!

When I did my version, I used Zboy as the base and remapped the 64 kB ROM address calls to static flash (instead of RAM).

I had a feeling the real issue with the speed was some sort of timing glitch and not a lack of CPU power.

I saw you were using zboy, so I went with it too. I'm also using flash for ROM, but I think it may be possible to put it in RAM for mbc0 games (that way we'd be able to load games from the SD). I've been testing with Tetris, Dr Mario and Kwirk. They're still a bit slow, but playable. I'm only focusing on mbc0 games, for now.

I uploaded the code in its current/rough state here: https://github.com/felipemanga/PokittoZBoy

At first I also thought timing could be an issue, so I disabled the throttling and didn't get any improvement. Since I couldn't find a hardware debugger in Brazil, I modified ProjectABE for profiling. I can now make a "hotspot ranking" like this:

#6ca1 ___ 100%
#6865 ___ 99.48%
#5f8d ___ 99.4%
#b4d6 ___ 73.38%
#b4d8 ___ 73.38%
#b504 ___ 71.88%
#383c ___ 36.82%
#3844 ___ 36.82%
#3848 ___ 36.82%
#3854 ___ 36.82%
...
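
A ranking like the one above could be produced from a list of sampled program counter values. The following is only a minimal sketch of that idea, not the actual ProjectABE modification: the names `Hotspot` and `rankHotspots` are hypothetical, and it assumes each percentage is a hit count relative to the hottest address.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical result record: an address and its hit count relative to the
// hottest address (assumed to be how the percentages above are scaled).
struct Hotspot {
    uint32_t pc;
    double percentOfMax;
};

// Counts how often each sampled PC occurs and returns the topN hottest ones.
std::vector<Hotspot> rankHotspots(const std::vector<uint32_t>& sampledPcs,
                                  std::size_t topN) {
    std::map<uint32_t, uint64_t> hits;
    for (uint32_t pc : sampledPcs) ++hits[pc];

    std::vector<std::pair<uint32_t, uint64_t>> sorted(hits.begin(), hits.end());
    std::stable_sort(sorted.begin(), sorted.end(),
                     [](const std::pair<uint32_t, uint64_t>& a,
                        const std::pair<uint32_t, uint64_t>& b) {
                         return a.second > b.second;  // most hits first
                     });

    std::vector<Hotspot> out;
    if (sorted.empty()) return out;
    const double maxHits = static_cast<double>(sorted.front().second);
    for (std::size_t i = 0; i < sorted.size() && i < topN; ++i) {
        out.push_back({sorted[i].first, 100.0 * sorted[i].second / maxHits});
    }
    return out;
}
```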

ModeGBC was the first bottleneck and replacing it with the version I posted previously helped a bit. Then I had to rewrite setPixel, getPixel, and most of DrawBackground. There is no point in drawing to one framebuffer, copying that to PokittoLib's framebuffer, then copying that to the LCD.

I removed the huge switch from the CPU interpreter and used an array of function pointers, instead. To lower the per-op overhead, I update the CPU more often than the other systems (16:1 was the most I could get away with).
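
As a rough illustration of the dispatch change described here, the sketch below replaces a `switch` with a 256-entry table of function pointers and runs the CPU in batches of 16 opcodes before the caller updates the other systems. It is not the code from the repository: `Cpu`, `OpTable`, `cpuStep` and `runCpuBatch` are hypothetical names, only two opcodes are filled in, and flags are ignored.

```cpp
#include <cstdint>

// Hypothetical, heavily reduced CPU state; flags are omitted for brevity.
struct Cpu {
    uint8_t a = 0;
    uint16_t pc = 0;
    uint32_t cycles = 0;
};

using OpHandler = void (*)(Cpu&);

void opNop(Cpu& cpu)  { cpu.cycles += 4; }           // 0x00: NOP
void opIncA(Cpu& cpu) { ++cpu.a; cpu.cycles += 4; }  // 0x3C: INC A (flags ignored)
void opTodo(Cpu& cpu) { cpu.cycles += 4; }           // placeholder for the rest

// One handler per opcode byte, indexed directly instead of switched on.
struct OpTable {
    OpHandler handler[256];
    OpTable() {
        for (OpHandler& h : handler) h = opTodo;
        handler[0x00] = opNop;
        handler[0x3C] = opIncA;
    }
};

const OpTable kOps;

// Fetches one opcode and jumps straight to its handler.
inline void cpuStep(Cpu& cpu, const uint8_t* rom) {
    const uint8_t opcode = rom[cpu.pc++];
    kOps.handler[opcode](cpu);
}

// The 16:1 ratio from the post: run 16 opcodes, then report the elapsed
// cycles so video, timers and so on can be updated once for the whole batch.
constexpr int kCpuStepsPerBatch = 16;

inline uint32_t runCpuBatch(Cpu& cpu, const uint8_t* rom) {
    const uint32_t before = cpu.cycles;
    for (int i = 0; i < kCpuStepsPerBatch; ++i) cpuStep(cpu, rom);
    return cpu.cycles - before;
}
```

The trade-off of batching is that the other systems see time advance in coarser steps, which is presumably why 16:1 was the most that worked.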

The MMU was next: I use SRAM1 as a "RAM palette", to index addresses into blocks of memory. Reading is now simple enough that the compiler inlines it.
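
One way to picture such a lookup is a small page table: the high bits of the address pick a pointer to a block of memory, and the low bits index into it. This is only a sketch under assumptions. The 256-byte page size and the names `MemoryMap`, `mapRange` and `mmuRead` are hypothetical, and special regions such as I/O registers are left out.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned kPageBits = 8;                       // hypothetical 256-byte pages
constexpr uint32_t kPageSize = 1u << kPageBits;
constexpr uint32_t kPageCount = 0x10000u >> kPageBits;  // 256 entries cover 64 kB

// One pointer per page; each points at the block backing that address range.
struct MemoryMap {
    const uint8_t* page[kPageCount];  // unmapped pages stay null
};

// Points every page in [start, start + length) at the matching part of block.
inline void mapRange(MemoryMap& map, uint32_t start, uint32_t length,
                     const uint8_t* block) {
    assert(start % kPageSize == 0 && length % kPageSize == 0);
    assert(start + length <= 0x10000u);
    for (uint32_t off = 0; off < length; off += kPageSize) {
        map.page[(start + off) >> kPageBits] = block + off;
    }
}

// A read is one table lookup plus one indexed load, small enough to inline.
inline uint8_t mmuRead(const MemoryMap& map, uint16_t addr) {
    return map.page[addr >> kPageBits][addr & (kPageSize - 1)];
}
```

With a table like this, ROM can live in flash and work RAM in SRAM while the read path stays the same for both.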

The profiler also pointed out things like:
CurLY = VideoClkCounterVBlank / 456
This results in a call to udivsi3, so I replaced it with a fixed point multiply:
CurLY = VideoClkCounterVBlank * (0x1000000 / 456) >> 24
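
To make the trick concrete: on a core without a hardware divider, dividing by a constant can be replaced by multiplying with a precomputed reciprocal and shifting. The sketch below compares the posted form against a real division, assuming the counter stays below one frame (154 lines of 456 clocks, an assumption not stated in the post). Because `0x1000000 / 456` is rounded down, the posted form lands one line low on exact multiples of 456, and whether that one-clock lag matters for the emulator is not something the post addresses. `lineExact` is a hypothetical variant, not from the post, that uses 456 = 8 * 57 to stay exact over the same range.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the counter never reaches one full frame (154 lines * 456 clocks).
constexpr uint32_t kClocksPerLine = 456;
constexpr uint32_t kClocksPerFrame = 154 * kClocksPerLine;  // 70224

// The posted form: reciprocal of 456 in 8.24 fixed point, rounded down (36792).
inline uint32_t lineTruncated(uint32_t clk) {
    assert(clk < kClocksPerFrame);  // in this range the 32-bit product cannot overflow
    return (clk * (0x1000000u / kClocksPerLine)) >> 24;
}

// Hypothetical exact variant: 456 = 8 * 57, so divide by 8 with a shift,
// then multiply by ceil(2^19 / 57) = 9199 and shift right by 19.
inline uint32_t lineExact(uint32_t clk) {
    assert(clk < kClocksPerFrame);
    return ((clk >> 3) * 9199u) >> 19;
}

// Counts the inputs below limit where f disagrees with a real division.
inline uint32_t countMismatches(uint32_t (*f)(uint32_t), uint32_t limit) {
    uint32_t bad = 0;
    for (uint32_t clk = 0; clk < limit; ++clk) {
        if (f(clk) != clk / kClocksPerLine) ++bad;
    }
    return bad;
}
```

Running `countMismatches` over a whole frame is a cheap way to check any such constant before trusting it on the device.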

Now individual opcodes are the bottleneck (#6ca1 is the implementation of LD A,($FF00+n)). This is a good thing, because that means the rest of the emulator is not in the way, and a bad thing because it's hard to make it any faster.

That still shows a little noise. I'm sure most people won't mind though.

I'm assuming the SD library is my main bottleneck in the following experiment...
mode13_stream.bin (63.3 KB)

movie.zip (2.5 MB)

Just unzip movie.dat to the SD card root, load up mode13_stream.bin, and have your socks blown off by the incredible 8 fps silent movie!

PFFS is much faster than SDFS. Which one are you using?

Whichever one is default in the current pokittolib?

Depends on the API you are using:

  • fopen, fclose etc. are using SDFS.
  • PokittoDisk.h (fileOpen(), fileClose()) is using PFFS.

Currently fopen etc.

I previously tested the speed of both file systems by reading a 200 kB file in 1 kB blocks:

  • PetitFatFS: 264 kB/s
  • SDFileSystem: 80 kB/s

I am using a Samsung 1 GB microSD card in the Pokitto. The speed might also depend on the SD card.

I've never (knowingly) used PFFS before. Have I converted the following correctly? It doesn't seem to be working...

[code]
int main(){
game.begin();
game.display.persistence=1;
game.setFrameRate(999);

int temp;
pokInitSD(); // Call init always.

while (game.isRunning()) {

// FILE *handle = fopen("/sd/movie.dat", "rb");
//if (handle){
if (fileOpen("/sd/movie.dat",FILE_MODE_BINARY)) {

    unsigned char col[3];
    uint16_t tempPal[256];
    for(temp=0; temp<256; temp++){
        //fread(&col[0], sizeof(char), 3, handle);
        fileReadBytes(&col[0], 3);
        pal[temp] = (col[0]>>3) | ((col[1] >> 2) << 5) | ((col[2] >> 3) << 11);
    }
    game.display.load565Palette(&pal[0]); // load a palette the same way as any other palette in any other screen mode
    bool stillGoing=1;
    while(stillGoing==1){
        if(game.update()){
            //if(!fread(&game.display.screenbuffer[0], 1, 110*88, handle))stillGoing=0;
            if(!fileReadBytes(&game.display.screenbuffer[0], 9680))stillGoing=0;
        }
    }
    fileClose();
//} // if handle

} // file open

}

return 1;
}[/code]

... To make this a useful idea, I should probably think about reading less data. Sound might be an issue also.
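
For a sense of scale, each frame here is 110 * 88 = 9680 bytes, so the read speed alone caps the frame rate. The sketch below works that out for the two benchmark figures quoted earlier in the thread (80 and 264 kilobytes per second), assuming a kilobyte of 1024 bytes and ignoring everything except the read itself. The function names are hypothetical.

```cpp
#include <cstdint>

constexpr uint32_t kFrameBytes = 110 * 88;  // 9680 bytes per frame

// Upper bound on frames per second if reading were the only cost.
constexpr uint32_t maxFps(uint32_t bytesPerSecond) {
    return bytesPerSecond / kFrameBytes;
}

// Read throughput needed to sustain a target frame rate.
constexpr uint32_t bytesPerSecondFor(uint32_t fps) {
    return fps * kFrameBytes;
}

static_assert(maxFps(80u * 1024u) == 8, "SDFileSystem figure caps at 8 fps");
static_assert(maxFps(264u * 1024u) == 27, "PetitFatFS figure caps at 27 fps");
```

The 8 fps bound for the slower figure lines up with the frame rate of the fopen-based demo, which suggests the read really is the limiting factor. Halving the bytes per frame would roughly double either bound.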

Looks ok to me.

You have sound?

Weird, I'm just getting a green screen at 5 fps...

Sound... Nope, I wouldn't have a clue where to start on the Pokitto. My DS version of this used JPG for the images and raw WAV for sound (interleaved with the images). It worked quite well.

[edit] got it working, now at 21fps! Time to think about sound I guess.

https://t.co/NDrlTQbO0s

I tried two different cards, got ~19fps with your demo. Looks good enough for cutscenes in games!
To get rid of the noise completely, I think you need to replace the s in TGL_WR(s) with something that would cost a cycle. I'm not sure if an inline asm NOP is best. Maybe something like this?
*LCD = *s; TGL_WR(s+=2);TGL_WR(s--);
I haven't tested it, you might have to pick between noise and speed.

I think Iā€™d prefer an inline nop since the intent is clearer.
Trying to abuse arithmetic is just going to confuse people.

I would too, but it seems the inline asm confuses the compiler, which is worse. I haven't actually checked the disassembly, but using a NOP gives an unexpected hit to the FPS.