Good progress in optimisation: the frame rate of my little demo is now about 44-45 fps.
Most of the performance increase came when I realised that the function static inline void setup_data_16(uint16_t data), which is called a lot inside lcdRefreshMode2() (and used in other modes too), was actually compiled as a normal function (!). I verified this by looking at the assembler listing. By copying the statements in setup_data_16() directly into lcdRefreshMode2() I gained over 10 fps.
I have not yet figured out why the compiler chooses a function call instead of inlining the code (I have optimisation enabled). I need to check whether this also happens in projects other than mine, but I suspect it does.
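One thing worth trying instead of copying the code by hand: in C the inline keyword is only a hint, so the compiler may still emit a call (for example when optimising for size). Below is a minimal sketch, assuming the toolchain is GCC or Clang, that uses the always_inline attribute to request inlining regardless of the compiler's own heuristics. The names FORCE_INLINE, fake_port, write_count and refresh_demo are hypothetical stand-ins for illustration; the real setup_data_16() writes to the LCD data lines, not to a variable.

```c
#include <stdint.h>

/* Hypothetical stand-in for the LCD data port (assumption: the real
 * code writes to hardware registers instead). */
static volatile uint16_t fake_port;
static uint32_t write_count;

/* always_inline is a GCC/Clang extension, not standard C. On other
 * compilers this falls back to a plain inline hint. */
#if defined(__GNUC__)
#define FORCE_INLINE static inline __attribute__((always_inline))
#else
#define FORCE_INLINE static inline
#endif

FORCE_INLINE void setup_data_16(uint16_t data)
{
    fake_port = data;   /* put the 16-bit value on the (fake) data bus */
    write_count++;      /* count writes so the sketch can be checked */
}

/* Hypothetical refresh loop standing in for lcdRefreshMode2(). */
uint32_t refresh_demo(const uint16_t *pixels, uint32_t n)
{
    write_count = 0;
    for (uint32_t i = 0; i < n; i++) {
        setup_data_16(pixels[i]);   /* body is expanded here, no call */
    }
    return write_count;
}
```

Whether this recovers the same 10 fps as the manual copy would have to be checked in the assembler listing again. With GCC, building with -Winline should also warn when a function declared inline was not inlined, which may help explain the original behaviour.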