Improving FPS


#82

Excellent! Thanks. I am waiting.


#83

Pull request submitted.


#84

Pull request merged.

Does this also include the Mode13 stuff? @spinal, @FManga ?


#85

No, that’s just mode2. I think @spinal is still experimenting with mode13?
I can send a PR for ModeGBC soon, if you’d like.


#86

Yes please if possible.

I am on summer holiday, as you may or may not have noticed from the bunch of newly ported games. I intend to flood the place with new content, and I have plenty of time to look at the improvements you guys have come up with.


#87

So… close…


#88

That is awesome!

When I did my version, I used Zboy as the base and remapped the 64 kB ROM address calls to static flash (instead of RAM).

I had a feeling the real issue with the speed was some sort of timing glitch and not a lack of CPU.


#89

I saw you were using zboy, so I went with it too. I’m also using flash for ROM, but I think it may be possible to put it in RAM for mbc0 games (that way we’d be able to load games from the SD). I’ve been testing with Tetris, Dr Mario and Kwirk. They’re still a bit slow, but playable. I’m only focusing on mbc0 games, for now.

I uploaded the code in its current/rough state here: https://github.com/felipemanga/PokittoZBoy

At first I also thought timing could be an issue, so I disabled the throttling and didn’t get any improvement. Since I couldn’t find a hardware debugger in Brazil, I modified ProjectABE for profiling. I can now make a “hotspot ranking” like this:

#6ca1 ___ 100%
#6865 ___ 99.48%
#5f8d ___ 99.4%
#b4d6 ___ 73.38%
#b4d8 ___ 73.38%
#b504 ___ 71.88%
#383c ___ 36.82%
#3844 ___ 36.82%
#3848 ___ 36.82%
#3854 ___ 36.82%
...

ModeGBC was the first bottleneck, and replacing it with the version I posted previously helped a bit. Then I had to rewrite setPixel, getPixel, and most of DrawBackground. There is no point in drawing to a framebuffer, copying that to PokittoLib’s framebuffer, then copying that to the LCD.

I removed the huge switch from the CPU interpreter and used an array of function pointers, instead. To lower the per-op overhead, I update the CPU more often than the other systems (16:1 was the most I could get away with).
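
For readers unfamiliar with the technique, here is a minimal sketch of table-based opcode dispatch with batched CPU steps. It is not the PokittoZBoy code: the `Cpu` struct, the handler names, `OpTable`, and `runBatch` are all hypothetical, only three Game Boy opcodes are filled in, and every other slot falls back to a no-op placeholder.

```cpp
#include <cstdint>

// Hypothetical, heavily reduced CPU state; a real core tracks far more.
struct Cpu {
    uint8_t a = 0;                 // accumulator
    uint8_t b = 0;                 // general-purpose register B
    uint16_t pc = 0;               // program counter
    const uint8_t* rom = nullptr;  // program bytes
};

using OpHandler = void (*)(Cpu&);

void opNop(Cpu&) {}                              // placeholder for unimplemented opcodes
void opIncB(Cpu& c) { ++c.b; }                   // 0x04: INC B
void opLdAImm(Cpu& c) { c.a = c.rom[c.pc++]; }   // 0x3E: LD A,n
void opLdAB(Cpu& c) { c.a = c.b; }               // 0x78: LD A,B

// 256-entry table indexed directly by the opcode byte.
struct OpTable {
    OpHandler h[256];
    OpTable() {
        for (auto& e : h) e = opNop;
        h[0x04] = opIncB;
        h[0x3E] = opLdAImm;
        h[0x78] = opLdAB;
    }
};
const OpTable kOps;

// Fetch one opcode and jump through the table: no switch involved.
inline void step(Cpu& c) { kOps.h[c.rom[c.pc++]](c); }

// Run several CPU steps, then service everything else once.
void runBatch(Cpu& c, int cpuStepsPerTick) {
    for (int i = 0; i < cpuStepsPerTick; ++i) step(c);
    // video, timers, and input would be updated here, once per batch
}
```

Calling `runBatch(cpu, 16)` corresponds to the 16:1 ratio mentioned above. The sketch ignores cycle counting, which a real emulator needs in order to keep the other subsystems in sync.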

The MMU was next: I use SRAM1 as a “RAM palette”, to index addresses into blocks of memory. Reading is now simple enough that the compiler inlines it.
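
One way to picture the lookup is a table of base pointers, one per fixed-size page of the 64 KB address space. The sketch below is an assumption-laden illustration, not the actual layout: the 256-byte page size, the `Mmu` struct, and `map` are hypothetical, and the post does not say how the table in SRAM1 is organised.

```cpp
#include <cstdint>

// Hypothetical layout: 64 KB address space split into 256 pages of 256 bytes.
struct Mmu {
    const uint8_t* page[256];   // one base pointer per page

    // Point `count` consecutive pages, starting at address `start`, at `block`.
    void map(uint16_t start, unsigned count, const uint8_t* block) {
        for (unsigned i = 0; i < count; ++i)
            page[(start >> 8) + i] = block + i * 256u;
    }

    // One table lookup plus an offset: small enough for the compiler to inline.
    inline uint8_t read(uint16_t addr) const {
        return page[addr >> 8][addr & 0xFF];
    }
};
```

With this shape, ROM can stay in flash and work RAM in SRAM while `read` treats them identically. On a 32-bit target the table itself costs 1 KB.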

The profiler also pointed out things like:
CurLY = VideoClkCounterVBlank / 456
This results in a call to udivsi3, so I replaced it with a fixed point multiply:
CurLY = VideoClkCounterVBlank * (0x1000000 / 456) >> 24
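
The same idea as a self-contained sketch, for anyone who wants to try it. The Cortex-M0+ has no hardware divider, which is why the division turns into a library call. The names `kRecip456` and `lineFromClock` are made up here, and the input range is an assumption: the test only checks values a scanline counter could plausibly take. One detail differs from the line above: with the truncated constant 0x1000000 / 456 = 36792, an input of exactly 456 gives 0 rather than 1, so this version rounds the constant up by one.

```cpp
#include <cstdint>

// Reciprocal of 456 in 8.24 fixed point, rounded up so that exact
// multiples of 456 land on the correct quotient.
constexpr uint32_t kRecip456 = (0x1000000u / 456u) + 1u;   // 36793

// Multiply-and-shift replacement for clk / 456.
// Assumes clk stays small (a few thousand), so the product fits in 32 bits.
constexpr uint32_t lineFromClock(uint32_t clk) {
    return (clk * kRecip456) >> 24;
}

// Compile-time spot checks around the first scanline boundary.
static_assert(lineFromClock(455) == 0, "still on line 0");
static_assert(lineFromClock(456) == 1, "first clock of line 1");
```

If the counter can grow much larger than a few frames' worth of clocks, the rounding error and the 32-bit product both need to be rechecked.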

Now individual opcodes are the bottleneck (#6ca1 is the implementation of LD A,($FF00+n)). This is a good thing, because that means the rest of the emulator is not in the way, and a bad thing because it’s hard to make it any faster.


#90

That still shows a little noise. I’m sure most people won’t mind though.

I’m assuming the SD library is my main bottleneck in the following experiment…
mode13_stream.bin (63.3 KB)

movie.zip (2.5 MB)

Just unzip movie.dat to the SD card root, load up mode13_stream.bin, and have your socks blown off by the incredible 8fps silent movie!


#91

PFFS is much faster than SDFS. Which one are you using?


#92

Whichever one is the default in the current PokittoLib?


#93

Depends on the API you are using:

  • fopen, fclose etc. are using SDFS.
  • PokittoDisk.h (fileOpen(), fileClose()) is using PFFS.

#94

Currently fopen etc.


#95

I previously tested the speed of both file systems by reading a 200 KB file in 1 KB blocks:

  • PetitFatFS: 264 KB/s
  • SDFileSystem: 80 KB/s
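
A block-read timing test of this kind can be sketched with the standard C file API. This is not the code used for the numbers above: `ReadResult` and `timeBlockReads` are hypothetical names, and it relies on `std::chrono`, which works on a desktop but may not be usable on the Pokitto toolchain, where a different timer would be needed.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

struct ReadResult {
    std::size_t bytes = 0;   // total bytes read
    double seconds = 0.0;    // wall-clock time spent in the read loop
};

// Read the whole file in fixed-size blocks and time the loop.
ReadResult timeBlockReads(const char* path, std::size_t blockSize) {
    ReadResult r;
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return r;                        // missing file: report zero bytes
    std::vector<unsigned char> buf(blockSize);
    const auto t0 = std::chrono::steady_clock::now();
    std::size_t n;
    while ((n = std::fread(buf.data(), 1, buf.size(), f)) > 0) r.bytes += n;
    const auto t1 = std::chrono::steady_clock::now();
    r.seconds = std::chrono::duration<double>(t1 - t0).count();
    std::fclose(f);
    return r;
}
```

Throughput is then `bytes / seconds`. Repeating the run with different block sizes and cards would show how much of the difference comes from the card rather than the file system.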

I am using a Samsung 1 GB microSD card in the Pokitto. The speed might also depend on the SD card.


#96

I’ve never (knowingly) used PFFS before. Have I converted the following correctly? It doesn’t seem to be working…

[code]
int main(){
    game.begin();
    game.display.persistence=1;
    game.setFrameRate(999);

    int temp;
    pokInitSD(); // Call init always.

    while (game.isRunning()) {

        // FILE *handle = fopen("/sd/movie.dat", "rb");
        //if (handle){
        if (fileOpen("/sd/movie.dat",FILE_MODE_BINARY)) {

            unsigned char col[3];
            uint16_t pal[256]; // 256 palette entries, converted to 16-bit 565 below
            for(temp=0; temp<256; temp++){
                //fread(&col[0], sizeof(char), 3, handle);
                fileReadBytes(&col[0], 3);
                pal[temp] = (col[0]>>3) | ((col[1] >> 2) << 5) | ((col[2] >> 3) << 11);
            }
            game.display.load565Palette(&pal[0]); // load a palette the same way as any other palette in any other screen mode
            bool stillGoing=1;
            while(stillGoing==1){
                if(game.update()){
                    //if(!fread(&game.display.screenbuffer[0], 1, 110*88, handle))stillGoing=0;
                    // 9680 = 110*88 bytes, one full frame
                    if(!fileReadBytes(&game.display.screenbuffer[0], 9680))stillGoing=0;
                }
            }
            fileClose();
        //} // if handle

        } // file open

    }

    return 1;
}[/code]
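
The palette line in that loop packs three 8-bit channels into one 16-bit 5-6-5 value. Pulled out into a helper, the packing looks like the sketch below; `pack565` and its parameter names are hypothetical. The posted code corresponds to `pack565(col[2], col[1], col[0])`, that is, the third byte read ends up in the top five bits. Whether that byte is red or blue depends on how movie.dat was written, which the thread does not say.

```cpp
#include <cstdint>

// Pack three 8-bit channels into 16 bits as 5-6-5:
// `hi` lands in bits 15..11, `mid` in bits 10..5, `lo` in bits 4..0.
constexpr uint16_t pack565(uint8_t hi, uint8_t mid, uint8_t lo) {
    return static_cast<uint16_t>(((hi >> 3) << 11) | ((mid >> 2) << 5) | (lo >> 3));
}

// Each channel at full intensity fills exactly its own bit field.
static_assert(pack565(255, 0, 0) == 0xF800, "top 5 bits");
static_assert(pack565(0, 255, 0) == 0x07E0, "middle 6 bits");
static_assert(pack565(0, 0, 255) == 0x001F, "bottom 5 bits");
```

The low bits of each channel are simply discarded, so two source colours that differ only in those bits map to the same palette entry.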

… to make this a useful idea, I should probably think about reading less data. Sound might be an issue also.


#97

Looks OK to me.

Do you have sound?


#98

Weird, I’m just getting a green screen at 5fps…

Sound… nope, I wouldn’t have a clue where to start on the Pokitto. My DS version of this used JPG for the images and raw WAV for sound (interleaved with the images). It worked quite well.

[edit] Got it working, now at 21fps! Time to think about sound, I guess.


#99

I tried two different cards, got ~19fps with your demo. Looks good enough for cutscenes in games!
To get rid of the noise completely, I think you need to replace the s in TGL_WR(s) with something that would cost a cycle. I’m not sure if an inline asm NOP is best. Maybe something like this?
*LCD = *s; TGL_WR(s+=2);TGL_WR(s--);
I haven’t tested it, you might have to pick between noise and speed.


#100

I think I’d prefer an inline nop since the intent is clearer.
Trying to abuse arithmetic is just going to confuse people.


#101

I would too, but it seems the inline asm confuses the compiler, which is worse. I haven’t actually checked the disassembly, but using a NOP gives an unexpected hit to the FPS.