Excellent! Thanks. I am waiting.
Pull request done.
No, that's just mode2. I think @spinal is still experimenting with mode13?
I can send a PR for ModeGBC soon, if you'd like.
Yes please if possible.
I am on summer holiday, as you may or may not have noticed from the bunch of newly ported games. I intend to flood the place with new content. I have plenty of time to look at the improvements you guys have come up with.
So... close...
That is awesome!
When I did my version, I used Zboy as a base and remapped the 64kB ROM address calls to static flash (instead of RAM).
I had a feeling the real issue with the speed was some sort of timing glitch and not a lack of CPU.
I saw you were using zboy, so I went with it too. I'm also using flash for ROM, but I think it may be possible to put it in RAM for mbc0 games (that way we'd be able to load games from the SD). I've been testing with Tetris, Dr Mario and Kwirk. They're still a bit slow, but playable. I'm only focusing on mbc0 games, for now.
I uploaded the code in its current/rough state here: https://github.com/felipemanga/PokittoZBoy
At first I also thought timing could be an issue, so I disabled the throttling and didn't get any improvement. Since I couldn't find a hardware debugger in Brazil, I modified ProjectABE for profiling. I can now make a "hotspot ranking" like this:
#6ca1 ___ 100%
#6865 ___ 99.48%
#5f8d ___ 99.4%
#b4d6 ___ 73.38%
#b4d8 ___ 73.38%
#b504 ___ 71.88%
#383c ___ 36.82%
#3844 ___ 36.82%
#3848 ___ 36.82%
#3854 ___ 36.82%
...
ModeGBC was the first bottleneck and replacing it with the version I posted previously helped a bit. Then I had to rewrite setPixel, getPixel, and most of DrawBackground. No point in drawing to a framebuffer, copying that to PokittoLib's framebuffer, then copying that to the LCD.
I removed the huge switch from the CPU interpreter and used an array of function pointers, instead. To lower the per-op overhead, I update the CPU more often than the other systems (16:1 was the most I could get away with).
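A minimal sketch of that kind of function-pointer dispatch, for anyone following along. The `Cpu` struct, `opTable`, `initOpTable` and `step` are hypothetical names for illustration, not the actual zboy or PokittoZBoy code, and only three opcodes are filled in:

```cpp
#include <cstdint>

// Hypothetical, heavily reduced CPU state (not the real zboy structures).
struct Cpu {
    uint8_t a = 0;                 // accumulator
    uint16_t pc = 0;               // program counter
    const uint8_t* rom = nullptr;  // code being executed
};

using OpHandler = void (*)(Cpu&);

static void opNop(Cpu&) {}                                     // 0x00: NOP
static void opIncA(Cpu& cpu) { ++cpu.a; }                      // 0x3C: INC A
static void opLdAImm(Cpu& cpu) { cpu.a = cpu.rom[cpu.pc++]; }  // 0x3E: LD A,n

// One slot per opcode; a real table would provide all 256 handlers.
static OpHandler opTable[256];

void initOpTable() {
    for (auto& handler : opTable) handler = opNop;  // placeholder for missing ops
    opTable[0x3C] = opIncA;
    opTable[0x3E] = opLdAImm;
}

// Fetch one opcode and jump straight to its handler, no switch involved.
void step(Cpu& cpu) {
    const uint8_t opcode = cpu.rom[cpu.pc++];
    opTable[opcode](cpu);
}
```

Calling `step` sixteen times before updating the other subsystems once would give the 16:1 ratio mentioned above.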
The MMU was next: I use SRAM1 as a "RAM palette", to index addresses into blocks of memory. Reading is now simple enough that the compiler inlines it.
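Roughly, the idea can be sketched as a table of block pointers. This is an illustration only: the 256-byte block size, `blockTable`, `mapRegion` and `readByte` are assumptions, not the real layout used in SRAM1.

```cpp
#include <cstdint>

// Assumed layout: the 64 kB address space is split into 256 blocks of 256 bytes.
constexpr int kBlockShift = 8;
constexpr int kBlockCount = 1 << (16 - kBlockShift);
constexpr uint16_t kBlockMask = (1 << kBlockShift) - 1;

// One host pointer per emulated block (the "palette" of memory blocks).
static const uint8_t* blockTable[kBlockCount];

// Map size bytes of host memory at emulated address start.
// start and size are assumed to be multiples of the block size.
void mapRegion(uint16_t start, uint32_t size, const uint8_t* host) {
    for (uint32_t off = 0; off < size; off += (1u << kBlockShift)) {
        blockTable[(start + off) >> kBlockShift] = host + off;
    }
}

// A read is one table lookup plus one indexed load, small enough to inline.
inline uint8_t readByte(uint16_t addr) {
    return blockTable[addr >> kBlockShift][addr & kBlockMask];
}
```

With this shape, ROM in flash and work RAM in SRAM can sit behind the same `readByte` without any address-range comparisons on the read path.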
The profiler also pointed out things like:
CurLY = VideoClkCounterVBlank / 456
This results in a call to udivsi3, so I replaced it with a fixed point multiply:
CurLY = VideoClkCounterVBlank * (0x1000000 / 456) >> 24
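One thing to watch with this trick: the truncated constant 0x1000000 / 456 is 36792, which is slightly too small, so an input of exactly 456 yields 0 instead of 1 (the same happens at other exact multiples). Whether that matters depends on how CurLY is used. A sketch that rounds the reciprocal up instead, assuming the counter stays below 42408 (`kRecip456` and `divBy456` are names made up for this example):

```cpp
#include <cstdint>

// Reciprocal of 456 in fixed point with 24 fractional bits,
// rounded up: ceil(2^24 / 456) = 36793.
constexpr uint32_t kRecip456 = (0x1000000u + 455u) / 456u;

// Matches x / 456 for 0 <= x < 42408, using a multiply and a shift
// instead of a call to the software division routine.
inline uint32_t divBy456(uint32_t x) {
    return (x * kRecip456) >> 24;
}
```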
Now individual opcodes are the bottleneck (#6ca1 is the implementation of LD A,($FF00+n)). This is a good thing, because that means the rest of the emulator is not in the way, and a bad thing because it's hard to make it any faster.
That still shows a little noise. I'm sure most people won't mind though.
I'm assuming the SD library is my main bottleneck in the following experiment...
mode13_stream.bin (63.3 KB)
movie.zip (2.5 MB)
Just unzip movie.dat to the SD card root and load up mode13_stream.bin and have your socks blown off by the incredible 8fps silent movie!
PFFS is much faster than SDFS. Which one are you using?
Whichever one is default in the current pokittolib?
Depends on the API you are using:
- fopen, fclose etc. are using SDFS.
- PokittoDisk.h (FileOpen(), FileClose()) is using PFFS.
Currently fopen etc.
I previously tested the speed of both file systems by reading a 200 kb file in 1 kb blocks:
- PetitFatFS: 264 kb/s
- SDFileSystem: 80 kb/s
I am using Samsung 1GB MicroSD card in Pokitto. The speed might depend on the SD card also.
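As a rough sanity check, those throughput numbers can be turned into an upper bound on frame rate for the 110x88, one-byte-per-pixel frames the mode13 movie demo streams. This is only back-of-the-envelope arithmetic: it assumes 1 kb means 1024 bytes and that reading is the only cost, and `kFrameBytes` and `maxFps` are names invented here.

```cpp
#include <cstdint>

// One mode13 frame: 110 x 88 pixels, one byte each = 9680 bytes.
constexpr uint32_t kFrameBytes = 110 * 88;

// Upper bound on frames per second if SD reads were the only cost.
// Assumes the measured figure is in units of 1024 bytes per second.
constexpr double maxFps(double kbPerSecond) {
    return kbPerSecond * 1024.0 / kFrameBytes;
}
```

`maxFps(80)` comes out at about 8.5 and `maxFps(264)` at about 27.9, so the ~8 fps seen with fopen is about what the SDFileSystem figure would predict.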
I've never (knowingly) used pffs before, have I converted the following correctly? It doesn't seem to be working...
[code]
int main(){
    game.begin();
    game.display.persistence=1;
    game.setFrameRate(999);
    int temp;
    pokInitSD(); // Call init always.
    while (game.isRunning()) {
        // FILE *handle = fopen("/sd/movie.dat", "rb");
        //if (handle){
        if (fileOpen("/sd/movie.dat",FILE_MODE_BINARY)) {
            unsigned char col[3];
            uint16_t tempPal[256];
            for(temp=0; temp<256; temp++){
                //fread(&col[0], sizeof(char), 3, handle);
                fileReadBytes(&col[0], 3);
                pal[temp] = (col[0]>>3) | ((col[1] >> 2) << 5) | ((col[2] >> 3) << 11);
            }
            game.display.load565Palette(&pal[0]); // load a palette the same way as any other palette in any other screen mode
            bool stillGoing=1;
            while(stillGoing==1){
                if(game.update()){
                    //if(!fread(&game.display.screenbuffer[0], 1, 110*88, handle))stillGoing=0;
                    if(!fileReadBytes(&game.display.screenbuffer[0], 9680))stillGoing=0;
                }
            }
            fileClose();
            //} // if handle
        } // file open
    }
    return 1;
}[/code]
... to make this a useful idea, I should probably think about reading less data. Sound might be an issue also.
Looks ok to me.
you have sound?
weird, I'm just getting a green screen at 5fps...
sound... Nope, wouldn't have a clue where to start on the pokitto. My DS version of this used jpg for the images and raw wav for sound (interleaved with the images). It worked quite well.
[edit] got it working, now at 21fps! Time to think about sound I guess.
I tried two different cards, got ~19fps with your demo. Looks good enough for cutscenes in games!
To get rid of the noise completely, I think you need to replace the s in TGL_WR(s) with something that would cost a cycle. I'm not sure if an inline asm NOP is best. Maybe something like this?
*LCD = *s; TGL_WR(s+=2);TGL_WR(s--);
I haven't tested it, you might have to pick between noise and speed.
I think I'd prefer an inline nop since the intent is clearer.
Trying to abuse arithmetic is just going to confuse people.
I would too, but it seems the inline asm confuses the compiler, which is worse. I haven't actually checked the disassembly, but using a NOP gives an unexpected hit to the FPS.