sound… Nope, wouldn’t have a clue where to start on the Pokitto. My DS version of this used JPG for the images and raw WAV for sound (interleaved with the images). It worked quite well.
[edit] got it working, now at 21fps! Time to think about sound I guess.
I tried two different cards, got ~19fps with your demo. Looks good enough for cutscenes in games!
To get rid of the noise completely, I think you need to replace the s in TGL_WR(s) with something that would cost a cycle. I’m not sure if an inline asm NOP is best. Maybe something like this? *LCD = *s; TGL_WR(s+=2);TGL_WR(s--);
I haven’t tested it, you might have to pick between noise and speed.
I would too, but it seems the inline asm confuses the compiler, which is worse. I haven’t actually checked the disassembly, but using a NOP gives an unexpected hit to the FPS.
Hrm, possibly because it acts as a sort of memory barrier preventing the rearranging of instructions?
Does the pause have to be just one cycle?
I’m wondering if calling an empty function marked with __attribute__((noinline)) would work, or if a function call would be too many cycles.
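A rough sketch of the empty-function idea, in case it helps. The names delayCall and copyWithDelay are made up for illustration and are not from PokittoLib. The GCC docs suggest putting an empty asm statement in a noinline function so that calls to it are not optimised away. A call plus return likely costs several cycles rather than one, so treat this as a coarse delay at best:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical delay helper; the name is illustrative, not from PokittoLib.
// noinline keeps the call as a real branch-and-return. The empty asm
// statement gives the body a side effect so GCC does not discard calls to it.
__attribute__((noinline)) void delayCall()
{
    asm volatile("");
}

// Stand-in for a pixel-push loop: copy bytes, pausing after each write.
void copyWithDelay(const std::uint8_t* src, std::uint8_t* dst, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
    {
        dst[i] = src[i];
        delayCall();
    }
}
```

Since the asm lives inside a separate function, it should not affect how the calling loop is scheduled, though I haven’t measured that.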
Or perhaps doing:
volatile int nop __attribute__((unused)) = 0;
++nop; // Optional
Would force GCC to create and use the variable?
It might cause similar issues though if GCC ends up having to dump a register it’s using.
Or perhaps:
// global
static int nopVar = 0;
inline void __attribute__((always_inline)) nop(void)
{
    ++nopVar;
}
I think the fact it’s a global variable should mean that the compiler can’t optimise the operation away,
but it could probably still reorder it.
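One caveat on the global-counter idea: as far as I know, GCC is allowed to drop a static variable that is only ever written and never read, so being global is not enough on its own. Marking it volatile is what forces every access to be emitted. A minimal sketch, assuming that is the behaviour you want (nopCount is a hypothetical helper added only so the counter can be inspected):

```cpp
// volatile forces GCC to emit the load and the store on every call,
// even though nothing else depends on the value.
static volatile int nopVar = 0;

inline __attribute__((always_inline)) void nop()
{
    nopVar = nopVar + 1; // one load, one add, one store
}

// Hypothetical accessor, only here so the counter can be checked.
int nopCount()
{
    return nopVar;
}
```

Note that this is a load, an add and a store to RAM, so it costs noticeably more than a single NOP instruction would.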
I haven’t looked into the rules regarding GCC and inline asm on ARM yet, so I have no idea. Maybe it considers a set of registers to get clobbered and has to reload them?
It doesn’t have to be one cycle, but it adds up.
We’re getting to a point where the high-level language is getting in the way and writing these routines in assembly would be best, both in terms of performance and legibility.
The fundamental rules of inline asm on GCC are the same for all chips; what changes is the registers, the available instructions and the constraint letters, e.g. on AVR r means any register and d means an upper register.
It shouldn’t clobber a register unless you explicitly state it to be an output operand or to be clobbered.
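To make the operand and clobber point concrete, here is a small sketch using GCC extended asm. The function names are hypothetical. The first form declares no outputs, inputs or clobbers, so GCC has no reason to spill or reload registers around it. The second adds a "memory" clobber, which tells GCC the asm may read or write arbitrary memory, so it acts as a compiler-level barrier for memory accesses:

```cpp
// Plain NOP: no operands, no clobbers. volatile stops GCC deleting it.
static inline void cycleNop()
{
    asm volatile("nop");
}

// Same instruction plus a "memory" clobber: GCC must assume memory changed,
// so it cannot keep memory values cached in registers across this point.
static inline void cycleNopBarrier()
{
    asm volatile("nop" ::: "memory");
}

// Stand-in loop, just to show the helpers in use.
int sumWithNops(const int* data, int count)
{
    int total = 0;
    for (int i = 0; i < count; ++i)
    {
        total += data[i];
        cycleNop();
    }
    cycleNopBarrier();
    return total;
}
```

Even the plain form can change what the optimiser does with the surrounding loop, so comparing the disassembly of both versions is probably the only way to be sure where the FPS hit comes from.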
Considering this is rendering code, I’m inclined to agree.
As long as the external interface is C++ friendly,
the implementation can be pure assembly for speed’s sake,
providing it’s well-commented.
Functions do have a memory alignment, but if it pads I would assume it’s just padding with more nops,
which shouldn’t be a major issue.
ARM instructions are 32 bits wide, but I can’t find any information about function alignment.
I found a thing discussing the alignment of structs which said that the largest field being int causes 4-byte alignment and the largest field being long long causes 8-byte alignment, but nothing beyond that.
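The struct rule is easy to check with alignof. This is a sketch with made-up struct names; the exact values depend on the ABI, so the 4 and 8 in the comments are what I would expect from the ARM EABI and x86-64 rather than a guarantee:

```cpp
#include <cstddef>

// Largest member is int, so the struct takes int's alignment (typically 4).
struct WithInt
{
    char tag;
    int value;
};

// Largest member is long long, so the struct takes its alignment (typically 8).
struct WithLongLong
{
    char tag;
    long long value;
};

constexpr std::size_t intStructAlign = alignof(WithInt);
constexpr std::size_t longLongStructAlign = alignof(WithLongLong);
```

This says nothing about function alignment, which is a separate setting in the compiler and linker.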
I’m erring on the side of memory barriers/formal no-tampering rules though, see this SO question that demonstrates how something as simple as asm("# im in ur loop"); can have an impact because of the side effects of the presence of an asm block (note that the asm block here is implicitly asm volatile).
The Pokitto uses Thumb, not the full 32-bit ARM instruction set, so most instructions are just 16 bits wide. I vaguely remember the TRM mentioning 4-byte instruction alignment for certain ops.
What do you mean? You’re using inline asm in the PokittoLib: