Fastest way to directly draw to screen?

Hello my frens,

I am trying to write my own screenbuffer to the screen and have found that using directPixel/lcdPixel is too slow. Writing chunks of pixels with pumpDRAMdata is faster, but maybe there’s something faster still? Ofc I’ll check out the display docs but would also like to ask the experts here.

Also one more thing: can I maybe possibly theoretically switch the display to 332 mode? Now it accepts 565 16bit values, but my screenbuffer is in 332 8bit format. It would be ideal if I could do that and didn’t have to use a conversion palette, would save both memory and CPU cycles.

Thank yooou :slight_smile:

The display seems to support a 12-bit mode (444), check page 41 of https://cdn-shop.adafruit.com/datasheets/ST7735R_V0.2.pdf
but there isn’t any support for a direct palette mode :confused:

Yeah, there’s also 1 bit per component (8 colors), but sadly 8bit or palette modes aren’t present it seems. Nevermind, it would also be useful if I could set a different resolution or at least a specific rectangle to write to, gonna check that.

EDIT:

@SkyBerron IIRC you’re making no-framebuffer programs, right? What functions are you using?

Hmmm, I am now thinking something. The reason I think my screen writes are slow is that my time count (based solely on counting frames as frames * timePerFrame) starts to lag behind real time after some while, but as I’m looking at Pokittolib, this catches my eye:

	if ((((nextFrameMillis - now)) > timePerFrame) && frameEndMicros) { //if time to render a new frame is reached and the frame end has ran once
		updateHook(true);
		nextFrameMillis = now + timePerFrame;

Shouldn’t the last line be nextFrameMillis += timePerFrame; instead of what it is? Let’s say timePerFrame == 100 and the frame was due at nextFrameMillis == 100, but the check only runs at now == 110. The branch gets executed and the next frame is scheduled at 110 + 100 == 210, while it should be 200, so it will be 10 ms late. The way it is now you start losing milliseconds over time. Not sure I’m not missing something tho, can you look at this @jonne @pharap @fmanga @Hanski? Basically the next frame time should be assigned relative to the previous frame time, not to the current time.
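
To put numbers on the drift, here is a minimal simulation sketch. It is not PokittoLib code: the function name and the assumption that every check runs a fixed number of milliseconds late are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical simulation, not PokittoLib code: every check is assumed to run
// `lateness` ms after the frame was due. Returns the deadline that ends up
// scheduled after `frames` frames have been started.
uint32_t deadlineAfter(uint32_t frames, uint32_t timePerFrame,
                       uint32_t lateness, bool relativeToDeadline) {
    uint32_t nextFrameMillis = timePerFrame; // first frame due after one period
    for (uint32_t i = 0; i < frames; ++i) {
        uint32_t now = nextFrameMillis + lateness; // the check runs a bit late
        if (relativeToDeadline)
            nextFrameMillis += timePerFrame;       // proposed: step from the old deadline
        else
            nextFrameMillis = now + timePerFrame;  // current: step from the current time
    }
    return nextFrameMillis;
}
```

With timePerFrame == 100 and checks that are 10 ms late, ten frames push the current scheme's deadline to 1200 while the relative scheme stays at 1100, so the lateness adds up frame after frame.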

EDIT: Also maybe I’m tired but I don’t see how this works. Let’s say

timePerFrame == 10
now == 100

therefore nextFrameMillis = 100 + 10 == 110

now in the next call:

now == e.g. 111, therefore the if branch should pass, next frame should be executed.

nextFrameMillis - now == 110 - 111 == -1

if (-1 > 10) won't get executed though, and neither will it get executed in the subsequent calls because the difference will only get smaller, can't be > 10.

As I see it the condition should read if (now >= nextFrameMillis).
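
A minimal sketch of what such a limiter could look like, combining the >= check with the relative update. The struct and its method are hypothetical stand-ins, not the library's actual class; only the variable names are borrowed from the snippet above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the limiter state; names mirror the snippet above
// but this is not PokittoLib's actual class.
struct FrameLimiter {
    uint32_t timePerFrame;    // ms per frame, must be > 0
    uint32_t nextFrameMillis; // time at which the next frame is due

    // Returns true when a frame is due at time `now` (in ms).
    bool frameDue(uint32_t now) {
        assert(timePerFrame > 0);
        // Same meaning as now >= nextFrameMillis, but the signed difference
        // stays correct when the millisecond counter wraps around.
        if (static_cast<int32_t>(now - nextFrameMillis) < 0) return false;
        nextFrameMillis += timePerFrame; // relative to the previous deadline
        return true;
    }
};
```

One design consequence: if the program falls behind by more than one frame, this version runs catch-up frames back to back until the deadline is ahead of now again. If that is not wanted, the deadline can be clamped to now whenever it lags too far.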

You can also give every object its own timing instead of adding waits to try to synchronize each frame. The speed will then be consistent whatever your real framerate is.

Here is a tiny class I use nearly everywhere to check how far my objects need to move:

#include <stdint.h> // uint8_t, uint16_t, uint32_t

typedef uint32_t u32;
typedef uint16_t u16;
typedef uint8_t u8;

u32 tick; // current tick, update it at each frame

class Rate {
  public:
  u32 last;
  u32 acc;
  u32 ticks;
  u16 px;

  Rate( u16 px );
  Rate( void );
  void reset( void );
  void ticksByPx( u32 ticks );
  void pxBySecond( u16 px );
  u32 update(void);
};

Rate::Rate( u16 px ){
  pxBySecond( px );
  reset();
}

Rate::Rate( void ){
  pxBySecond( 1 ); // default rate, so ticks is never 0 (update() would not terminate)
  reset();
}

void Rate::reset( void ){
  last = tick;
  acc = 0;
}

void Rate::ticksByPx( u32 ticks ){
  this->ticks = ticks * 1024;
  this->px = 1024000 / this->ticks;
}

void Rate::pxBySecond( u16 px ){
  this->ticks = 1024000 / px;
  this->px = px;
}

u32 Rate::update(void){
  if( tick == last ) return 0;

  acc += ( tick - last ) * 1024;
  last = tick;

  if( acc < ticks ) return 0;

  u32 px = 0;
  do {
    px++;
    acc -= ticks;
  } while( acc >= ticks );

  return px;
}
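
To check the claim that the speed stays the same whatever the framerate, here is a condensed sketch of the same accumulator idea. PixelRate and pixelsMoved are hypothetical names, the time is passed in explicitly in milliseconds instead of being read from the global tick, and a division replaces the subtraction loop only to keep the example short.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical condensed variant of the accumulator idea above.
struct PixelRate {
    uint32_t last = 0; // time of the previous update, in ms
    uint32_t acc = 0;  // unspent time, in 1/1024 ms
    uint32_t perPx;    // time one pixel costs, in 1/1024 ms

    explicit PixelRate(uint32_t pxPerSecond) : perPx(0) {
        assert(pxPerSecond > 0 && pxPerSecond <= 1024000u);
        perPx = 1024000u / pxPerSecond;
    }

    // Returns how many whole pixels to move at time `now`; the remainder
    // stays in acc so nothing is lost between frames.
    uint32_t update(uint32_t now) {
        acc += (now - last) * 1024u;
        last = now;
        uint32_t px = acc / perPx;
        acc -= px * perPx;
        return px;
    }
};

// Total pixels moved over totalMs when every frame takes frameMs.
uint32_t pixelsMoved(uint32_t frameMs, uint32_t totalMs, uint32_t pxPerSecond) {
    assert(frameMs > 0);
    PixelRate rate(pxPerSecond);
    uint32_t moved = 0;
    for (uint32_t now = frameMs; now <= totalMs; now += frameMs)
        moved += rate.update(now);
    return moved;
}
```

At 60 px per second, one simulated second gives the same total whether a frame takes 10 ms or 50 ms, because the leftover fraction is carried over to the next update.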

// example, move a power bar to the real value

Rate powerBarSpeed( 60 );

  if( power != powerDraw ){
    u8 barUpdate = powerBarSpeed.update();
    int diff = powerDraw - power;
    if( diff < 0 ){ // raise the drawn power bar
      if( -diff < barUpdate )
        powerDraw = power;
      else
        powerDraw += barUpdate;
    } else { // reduce the drawn power bar
      if( diff < barUpdate )
        powerDraw = power;
      else
        powerDraw -= barUpdate;
    }
  } else powerBarSpeed.reset();

// example : map auto scrolling

void Map::follow(
	 int x, int y, // follow pos
	 u32 sx, u32 sy, // follow size
	 Rate * rate,
	 u32 maxUp, u32 maxDown, u32 maxLeft, u32 maxRight, // screen move box
	 u32 minUp, u32 minDown, u32 minLeft, u32 minRight // min screen box
){
	int scrollx = this->scrollx;
	int scrolly = this->scrolly;

	if( x < 0 ) x = 0;
	if( y < 0 ) y = 0;
	if( x > (int)sizeInPixelx ) x = sizeInPixelx;
	if( y > (int)sizeInPixely ) y = sizeInPixely;

	u32 screenX = (u32)x - ( out->x + scrollx );
	u32 screenY = (u32)y - ( out->y + scrolly );

	u32 screenEndX = screenX + (sx-1);
	u32 screenEndY = screenY + (sy-1);

	minRight = out->width - minRight;
	maxRight = out->width - maxRight;
	minDown = out->height - minDown;
	maxDown = out->height - maxDown;

	// scroll screen crop X
	if( screenX < minLeft ){
		scrollx -= minLeft - screenX;
	} else {
		if( screenEndX > minRight ){
			scrollx += screenEndX - minRight ;
		}
	}

	// scroll screen crop Y
	if( screenY < minUp ){
		scrolly -= minUp - screenY ;
	} else {
		if( screenEndY > minDown ){
			scrolly += screenEndY - minDown ;
		}
	}

	u32 px = rate->update();

	if( px ){
		if( screenX < maxLeft ){
			scrollx -= px ;
		} else {
			if( screenEndX > maxRight ){
				scrollx += px ;
			}
		}

		if( screenY < maxUp ){
			scrolly -= px ;
		} else {
			if( screenEndY > maxDown ){
				scrolly += px ;
			}
		}
	}

	// scroll crop
	if( scrollx < 0 ) scrollx = 0 ;
	if( scrolly < 0 ) scrolly = 0 ;

	setScroll( scrollx, scrolly );
}

// example for a jump (yet not really frame time independent) :

void lapinouApplyVelocity( void ){
  u32 r = lapinouVelocity.update();
  if( !r ) return;

  if( !lapinouWayY ){ // down
    while(r--){
      checkLeftRight();
      if( !canDown() ) return;
      lapinouPy += 1 ;
    };
    lapinouVelocity.pxBySecond( lapinouVelocity.px + 3 );
  } else { // up
    while(r--){
      checkLeftRight();
      lapinouPy -= 1 ;
    };

    checkLeftRight();

    u32 acc = 4;
    u32 newVelocity = lapinouVelocity.px > acc ? lapinouVelocity.px - acc : 0;

    if( newVelocity < 30 ){ // revert way
      lapinouWayY = 0; // down
      lapinouVelocity.pxBySecond( 40 );// lapinouInitialVelocity );
      return;
    }

    lapinouVelocity.pxBySecond( newVelocity );
  }
}

void lapinouStartJumping( u32 px ){
  lapinouWayY = 1;
  lapinouVelocity.pxBySecond( px );
  lapinouVelocity.reset();
}

I don’t see how lcd write speed is in any way related to the framerate limiter. Yes, there’s a bug in the code that makes it run at about half of the desired limit, but that’s just a matter of raising the limit.

Also, my game uses 565 directly, no palette lookup, and it really doesn’t make much of a speed difference. Store the palette in flash and it doesn’t make a RAM difference either.

Indeed, the last layer of SBDL (the HAL) provides a one-row RGB565 linebuffer that has to be sent to the display one row at a time. I have some code for that, depending on whether SBDL is running on SDL2 or directly on hardware. For the ARM Cortex M0+ and the ST7735R LCD controller, I resort to my own asm code. While I was working on my Direct Mode Tests in “Java”, @FManga provided efficient code using inline asm that you can still find in the related thread. Later, frustrated with low framerates, I found that FemtoIDE Java, in part due to its garbage collector, was a huge waste of time, and switched to C/C++ and MinLib. I optimized @FManga’s code a bit and this is what I got:

# LineBuffer.S by SkyBerron

// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

.code 16
.syntax unified

.equ LPC_GPIO_PORT_MPIN0, 0xA0002180
.equ LPC_GPIO_PORT_MPIN1, 0xA0002184
.equ LPC_GPIO_PORT_MPIN2, 0xA0002188

.equ LPC_GPIO_PORT_CLR0, 0xA0002280
.equ LPC_GPIO_PORT_CLR1, 0xA0002284
.equ LPC_GPIO_PORT_CLR2, 0xA0002288

.equ LPC_GPIO_PORT_SET0, 0xA0002200
.equ LPC_GPIO_PORT_SET1, 0xA0002204
.equ LPC_GPIO_PORT_SET2, 0xA0002208

.equ LPC_GPIO_PORT_NOT0, 0xA0002300
.equ LPC_GPIO_PORT_NOT1, 0xA0002304
.equ LPC_GPIO_PORT_NOT2, 0xA0002308

.equ LCD_CD_PORT, 0
.equ LCD_CD_PIN, 2
.equ LCD_WR_PORT, 1
.equ LCD_WR_PIN, 12
.equ LCD_RD_PORT, 1
.equ LCD_RD_PIN, 24
.equ LCD_RES_PORT, 1
.equ LCD_RES_PIN, 0

.equ LCD_CD_SET, LPC_GPIO_PORT_SET0
.equ LCD_CD_CLR, LPC_GPIO_PORT_CLR0
.equ LCD_WR_SET, LPC_GPIO_PORT_SET1
.equ LCD_WR_CLR, LPC_GPIO_PORT_CLR1
.equ LCD_RD_SET, LPC_GPIO_PORT_SET1
.equ LCD_RD_CLR, LPC_GPIO_PORT_CLR1
.equ LCD_RES_SET, LPC_GPIO_PORT_SET1
.equ LCD_RES_CLR, LPC_GPIO_PORT_CLR1
.equ LCD_MPIN, LPC_GPIO_PORT_MPIN2

.text
.align 2
.global sendLinebuffer
.type sendLinebuffer, %function

// extern "C" void sendLinebuffer( unsigned short *s, int n );
// r0 : unsigned short *s
// r1 : int n
// The first four arguments are passed in R0 to R3, the rest are passed on the stack.
// The return value is returned in R0.
// The registers other than R0 to R3 must be saved and restored.

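sendLinebuffer:

push {r4,r5}

ldr r3, =LCD_MPIN // r3 = port word the color is written to
ldr r4, =LCD_WR_CLR // r4 = register that pulls WR low
ldr r5, =(1<<LCD_WR_PIN) // r5 = WR pin mask
ldrh r2, [r0] // preload the first pixel
loop: // runs n times, so n must be at least 1
adds r0, 2 // advance to the next pixel
lsls r2, 3 // move the 16-bit color to bits 3..18 of the port word
str r2, [r3] // write color
str r5, [r4] // WR low
ldrh r2, [r0] // preload the next pixel (reads one halfword past the end on the last pass)
subs r1, 1 // one pixel less to send, sets the flags for bne
str r5, [r3, (LCD_WR_SET-LCD_MPIN)] // WR high; LCD_WR_SET-LCD_MPIN = 124
bne loop

pop {r4,r5}

bx lr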

It’s simple, it’s fast. But I don’t know if it suits your needs being in asm and requiring one row of RGB565 16 bit values.
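
For the C++ side, a minimal sketch of driving a routine like this one row at a time. Everything here is an assumption for illustration: the 220x176 size is taken from the numbers mentioned in this thread, RowFn/SendFn/drawFrame are hypothetical names, and setting up the display's write window before streaming is assumed to happen elsewhere.

```cpp
#include <cassert>
#include <cstdint>

// Assumed panel size, taken from the 176 * 220 figure mentioned in this thread.
const int SCREEN_W = 220;
const int SCREEN_H = 176;

// Hypothetical callback types: RowFn fills one row with RGB565 pixels,
// SendFn pushes that row out to the display.
typedef void (*RowFn)(int y, uint16_t *row, int width);
typedef void (*SendFn)(uint16_t *row, int width);

// Renders a frame row by row, so only one row of pixels lives in RAM.
void drawFrame(RowFn renderRow, SendFn sendRow) {
    assert(renderRow && sendRow);
    static uint16_t lineBuffer[SCREEN_W]; // 440 bytes instead of a full framebuffer
    for (int y = 0; y < SCREEN_H; ++y) {
        renderRow(y, lineBuffer, SCREEN_W);
        sendRow(lineBuffer, SCREEN_W);
    }
}
```

On hardware, sendRow could be the asm routine itself, declared as in its header comment, provided uint16_t is unsigned short on the target. On a PC build the same loop can feed whatever the desktop backend uses instead.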

I thought it was so slow that the desired FPS couldn’t be reached. When I was writing by single pixels it probably was the case because it was running extremely slowly. With 176 * 220 writes any extra instruction and data on the bus per-pixel count.

Yep I’m storing it in flash. I need a palette because my code works with 332 colors and I can’t write these directly to the display. For that reason I also need an extra buffer of a significant size (e.g. 220 * 2 bytes) so that I can write to DRAM by chunks. That costs me RAM for the buffer and extra instructions for palette lookup, again when it’s per-pixel it’s not completely negligible. But yes, generally it works. I’m just trying to save resources for programs that will use my code.
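
For reference, a minimal sketch of the 332 to 565 lookup described here. The function names are hypothetical and the bit-replication mapping is just one reasonable choice, not necessarily the palette actually used.

```cpp
#include <cassert>
#include <cstdint>

// Expands one RGB332 byte (RRRGGGBB) to RGB565 by replicating the top bits
// into the new low bits, so 0x00 stays black and 0xFF becomes full white.
uint16_t rgb332to565(uint8_t c) {
    uint16_t r = (c >> 5) & 7, g = (c >> 2) & 7, b = c & 3;
    uint16_t r5 = (r << 2) | (r >> 1);            // 3 -> 5 bits
    uint16_t g6 = (g << 3) | g;                   // 3 -> 6 bits
    uint16_t b5 = (b << 3) | (b << 1) | (b >> 1); // 2 -> 5 bits
    return static_cast<uint16_t>((r5 << 11) | (g6 << 5) | b5);
}

// Converts `count` RGB332 pixels through a 256-entry palette into an RGB565
// line buffer. On the device the palette would be a const table in flash.
void convertLine(const uint8_t *src, uint16_t *dst, int count,
                 const uint16_t *palette) {
    assert(src && dst && palette && count >= 0);
    for (int i = 0; i < count; ++i)
        dst[i] = palette[src[i]];
}
```

The per-pixel cost is one byte load, one table load and one halfword store, and the 256-entry table takes 512 bytes. rgb332to565 is only meant for generating the table once (for example offline), not for calling per pixel.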

Will there be an attempt at fixing it? For my code I need the framerate to be precise (as long as it can be achieved ofc). Also it would be good in general to have it behave correctly…

IIRC @jpfli had a fix proposal for the fps limiter bug but I do not know what happened to it

Thank you very much for the replies.

I’ve resorted to my own FPS management and it works very nicely (still the bug should be fixed as having people bypass library code so that they can correctly do what the library is supposed to be doing is not really good). I am updating the screen with pumpDRAMdata by lines and it’s indeed fast enough now.
