Improving FPS

Really? It looks OK here in Coffins.
How are you testing it?
It’s left-to-right, high-to-low, right?

Confirmed, it is your code. #define __ARM_CC_VERSION 1 to override the new asm version gives correct output in mode1

Odd… give me a sec so I can get that example to compile.

Let me push the whole thing first ?

Sure, go ahead.

Do you see the same problem with anything else that uses Mode 1? I tested with 3dblobs, Coffins and Mandelbrot.

Pushed. Check it out.

1 Like

Is build/tracker.bin supposed to be broken?

no it should not

tracker.bin (88.1 KB)

Can you send a bin with the asm? When I build it, it looks OK.
Edit: Still trying to reproduce the issue. I can’t get it to break. :confused:

tracker.bin (87.3 KB)

Edit: how about you send me your built version bin as well?

tracker-Os.bin (88.7 KB)
I suspect the problem is in the loading of the palette:

  volatile uint32_t palette[32];
  for( uint32_t i=0; i<16; ++i ){
    palette[(i<<1)+1] = static_cast<uint32_t>(paletteptr[i&3 ]) << 3;
    palette[(i<<1)  ] = static_cast<uint32_t>(paletteptr[i>>2]) << 3;
  }

Moving the +1 to the lower line makes my build look like yours. :thinking:

Went out to dinner with this bit of code nagging at the back of my head. Just got back and sent a PR to test.
It’s the first time I’ve ever had to use the “register” keyword in C++.

2 Likes

It’s working now, many thanks

1 Like

If things are getting to the point where register is making a difference then the compiler is probably the one at fault.

For the record register is a deprecated keyword, and as of C++17 the register keyword is unused an unreserved - it doesn’t/won’t even have ‘automatic storage duration’ anymore.

I’m wondering if maybe just switching to assembly for this part would be better since it would at least be a bit more predicable.

1 Like

Nah, it was my fault for writing assembly that relies on GCC’s register allocation order to work. The algorithm changed in some version of GCC so it worked for me, but probably not for anyone else.

The problem wasn’t in writing to the palette array as I suspected earlier, but reading from it. I use the LDM instruction to load two colors from the palette into registers at once:
ldm %[t], {%[t], %[x]}

This writes to t and x in ascending order of register number, so the variables had to be defined like this:

 register uint32_t t asm("r1");```
From what I understand, this can't be done in an asm constraint and register+asm is the way to do it. If GCC really deprecates `register` in this situation, they're going to have to either allow asm without it, or create more specific constraints.

Ok, that makes more sense.

That’s a really strange constraint.

r1 and r2 can’t be specified in the input operands list?

Most likely they’ll continue to allow it as a compiler extension or do some kind of handwaving to say it’s allowed.

That said, more specific constraints would be useful.
I think the constraints available vary by target platform though.

For example, AVR has specific constraints for ‘upper register’, ‘simple upper register’, ‘special upper register pair’, ’ temporary register’ and for selecting the upper or lower register of a register pair.

Though I must admit, expressing “this register must be a lower number than this other register” without specifying exact register numbers would be somewhat difficult.
It’s a very unusual constraint.


To clarify, as far as I’m aware, as of C++17 the word register being present doesn’t trigger an error, it just has no effect.
Prior to C++17 the effect is to give the variable ‘automatic storage duration’, which is redundant because variables have ‘automatic storage duration’ by default.
So essentially the change is just semantic - it changes a modifier with a redundant effect into a modifier that does nothing.
(I could be wrong, but I don’t particularly want to check the standard, it would take a few hours to load.)

The LDM instruction takes a register list, encoded as a bitfield. I find this pretty neat: you can use it to load the entire register set with a single op, if you want. It just can’t encode the order registers were added to the list, so it goes from r0 to r15.

As far as I know, no. You can specify that you want a low or a high register, but in this case they both need to be low.


Changing the subject a bit, I’m having a look at drawBitmapData.
Why does it use m_colordepth instead of POK_COLORDEPTH and ifdefs like the rest of the code? Does it make sense to call drawBitmapData with m_colordepth != POK_COLORDEPTH?

Off the top of my head: its there to force using 1-bit bitmaps in arduboy/gamebuino modes even though the display mode is 4 bits

Ah, that makes more sense. That’s a bit less unusual, I’m sure I’ve seen at least two ISAs that have something like that.