Improving FPS

Nothing happened. I just never updated it and it fell behind the library updates. I have the source, of course.

#define TGL_WR								\
  *reinterpret_cast< volatile uint32_t *>(0xA0002284) = 1 << LCD_WR_PIN; \
  *reinterpret_cast< volatile uint32_t *>(0xA0002204) = 1 << LCD_WR_PIN;

void Pokitto::lcdRefreshModeGBC(uint8_t * scrbuf, uint16_t* paletteptr) {
   volatile uint32_t *LCD = reinterpret_cast< volatile uint32_t * >(0xA0002188);
  uint32_t x,y;
  uint8_t *d;

  setWindow( 16, 30, 144+16, 159+30 );
  write_command_16(0x03); write_data_16(0x1038);
  write_command(0x22);
  CLR_CS_SET_CD_RD_WR;
  SET_MASK_P2;

  d = scrbuf;
  for(y=0;y<144;++y){

    for(x=0;x<160;x+=4){
      
      uint8_t tdata = *d++;
      uint8_t t4 = tdata & 0x03; tdata >>= 2;// lowest half-nibble
      uint8_t t3 = tdata & 0x03; tdata >>= 2;// second lowest half-nibble
      uint8_t t2 = tdata & 0x03; tdata >>= 2;// second highest half-nibble
      uint8_t t = tdata;// highest half-nibble

      /** send the four pixels to the LCD as palette entries, highest bit pair first **/

      *LCD = uint32_t(paletteptr[t])<<3; TGL_WR;
      *LCD = uint32_t(paletteptr[t2])<<3; TGL_WR;
      *LCD = uint32_t(paletteptr[t3])<<3; TGL_WR;
      *LCD = uint32_t(paletteptr[t4])<<3; TGL_WR;
      
    }
    
  }
  
}
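
For anyone reading along, the bit shuffling in the inner loop can be tried on a PC. This is only a minimal sketch: `unpack2bpp` is a made-up helper name, not something in the Pokitto library, and it assumes the same packing as the loop above (four 2-bit palette indices per byte, highest bit pair sent first).

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper (not part of PokittoLib): split one packed byte into
// four 2-bit palette indices, highest bit pair first, which is the order
// in which the loop above sends them to the LCD.
inline std::array<uint8_t, 4> unpack2bpp(uint8_t packed) {
    return {
        static_cast<uint8_t>((packed >> 6) & 0x03),  // t  in the loop above
        static_cast<uint8_t>((packed >> 4) & 0x03),  // t2
        static_cast<uint8_t>((packed >> 2) & 0x03),  // t3
        static_cast<uint8_t>(packed & 0x03),         // t4
    };
}
```

Each returned index would then be looked up in `paletteptr` before being written out, as in the function above.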

Milk and cookie kept you awake, huh? Let’s discuss this. You better come up, Sebastian.

Blade Runner? I haven’t watched it yet. :hushed:

Share the GB Emulator Code, the community will work on it :wink:

It could be fun and a great buzz to have a Nano-GB in your hand (thanks to Pokitto emulation).

Instead of writing *reinterpret_cast< volatile uint32_t *>(0xA0002284) all the time,
why not set a named variable as a volatile reference to whatever 0xA0002284 is the address of?

volatile uint32_t & someMemoryMappedIOThing = *reinterpret_cast< volatile uint32_t *>(0xA0002284);

It would make the code a bit more self-documenting.
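
A rough sketch of how that could look, written so it runs on a PC: the two `fake...` variables stand in for the real registers, the names `lcdWrClear`, `lcdWrSet` and `toggleWr` are invented for illustration, and the pin number is a placeholder rather than the real board value.

```cpp
#include <cstdint>

// Stand-ins for the memory-mapped registers so the sketch runs on a host PC.
static volatile uint32_t fakeClrRegister = 0;
static volatile uint32_t fakeSetRegister = 0;

// On the real hardware these would bind to the fixed addresses instead, e.g.
//   volatile uint32_t & lcdWrClear =
//       *reinterpret_cast< volatile uint32_t * >(0xA0002284);
volatile uint32_t & lcdWrClear = fakeClrRegister;
volatile uint32_t & lcdWrSet   = fakeSetRegister;

// Placeholder value for illustration; the real LCD_WR_PIN comes from the library headers.
constexpr uint32_t LCD_WR_PIN = 12;

// Same two writes as the TGL_WR macro, but through named references.
inline void toggleWr() {
    lcdWrClear = 1u << LCD_WR_PIN;
    lcdWrSet   = 1u << LCD_WR_PIN;
}
```

Because the references are `volatile`, the compiler still has to perform both writes in order, just as with the raw casts.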

Watch the director’s cut of the original.

Yeah, that would be best and I already do it in some places (LCD). It’s like this only because I’m more familiar with the processor docs than mbed’s and I’m still poking around trying to find out what’s best. Cleaning up later is simple enough (rename LCD, move it and TGL_WR to HWLCD.h?). @Hanski: Does the cleanup you were doing include this?

Edit: Anybody know if one of these can be used for code profiling?
https://produto.mercadolivre.com.br/MLB-704520340-j-link-jlink-j-tag-segger-arm7-9-11-clone-_JM

The “China J-Links” come in two different generations. This ad does not say whether the firmware can be upgraded to work with Keil MDK 5. That is the version you need, and it is rarer than the one that only works up to MDK 4.

Not 100% sure, but I think you need something that mentions the v9 firmware:

https://probots.co.in/index.php?main_page=product_info&products_id=789

I can also highly recommend the original J-Link EDU version, which I had for a long time. Zero problems, works really well.

OK, just write your suggestion here and I will put it in a PR.

I don’t know what I did, but I had the plasma demo running at 125fps at one point. I lost it though: I was compiling the wrong target, randomly messing about with the code, and confused about why nothing was changing. When I came back to it today, it was at 79fps.

Also I wouldn’t mind having the gb emulator, for novelty value mostly :sunglasses:

[edit] It was @FManga’s code further up that gave 125fps on the plasma demo. Unrolling the loops gets 140fps.
How do I get rid of the line of noise along the middle of the screen?

Maybe an addition to the next PR so we can have a define in My_settings.h?

@spinal:
I’m not sure the flag would be useful for others; simply disabling the code is easy enough.
Did you manage to get rid of the noise?
If not, try this:

#define TGL_WR(OP)							\
  *reinterpret_cast< volatile uint32_t *>(0xA0002284) = 1 << LCD_WR_PIN; \
  OP;									\
  *reinterpret_cast< volatile uint32_t *>(0xA0002204) = 1 << LCD_WR_PIN;

 void Pokitto::lcdRefreshMode13(uint8_t * scrbuf, uint16_t* paletteptr, uint8_t offset){
   uint16_t x,y;
   uint32_t scanline[110]; // one row of converted pixels, replayed below to draw the row a second time
   uint8_t *d;
   uint32_t *s;

   write_command_16(0x03); write_data_16(0x1038);
   write_command(0x20); write_data(0);
   write_command(0x21); write_data(0);
   write_command(0x22);
   CLR_CS_SET_CD_RD_WR;
   SET_MASK_P2;

   volatile uint32_t *LCD = reinterpret_cast< volatile uint32_t * >(0xA0002188);

   d = scrbuf;// point to beginning of line in data
   for(y=0;y<88;y++){

     s = scanline;

     for(x=0;x<110;x+=10){
       *LCD = *s = paletteptr[(*d + offset)&255]<<3; TGL_WR(s++);TGL_WR(d++);	
       *LCD = *s = paletteptr[(*d + offset)&255]<<3; TGL_WR(s++);TGL_WR(d++);	
       *LCD = *s = paletteptr[(*d + offset)&255]<<3; TGL_WR(s++);TGL_WR(d++);	
       *LCD = *s = paletteptr[(*d + offset)&255]<<3; TGL_WR(s++);TGL_WR(d++);	
       *LCD = *s = paletteptr[(*d + offset)&255]<<3; TGL_WR(s++);TGL_WR(d++);	
       *LCD = *s = paletteptr[(*d + offset)&255]<<3; TGL_WR(s++);TGL_WR(d++);	
       *LCD = *s = paletteptr[(*d + offset)&255]<<3; TGL_WR(s++);TGL_WR(d++);	
       *LCD = *s = paletteptr[(*d + offset)&255]<<3; TGL_WR(s++);TGL_WR(d++);	
       *LCD = *s = paletteptr[(*d + offset)&255]<<3; TGL_WR(s++);TGL_WR(d++);	
       *LCD = *s = paletteptr[(*d + offset)&255]<<3; TGL_WR(s++);TGL_WR(d++);	
     }

     s = scanline;
     
     for(x=0;x<110;x+=10){
       *LCD = *s; TGL_WR(s++);TGL_WR(s);
       *LCD = *s; TGL_WR(s++);TGL_WR(s);       
       *LCD = *s; TGL_WR(s++);TGL_WR(s);
       *LCD = *s; TGL_WR(s++);TGL_WR(s);       
       *LCD = *s; TGL_WR(s++);TGL_WR(s);
       *LCD = *s; TGL_WR(s++);TGL_WR(s);       
       *LCD = *s; TGL_WR(s++);TGL_WR(s);
       *LCD = *s; TGL_WR(s++);TGL_WR(s);       
       *LCD = *s; TGL_WR(s++);TGL_WR(s);
       *LCD = *s; TGL_WR(s++);TGL_WR(s);       
     }
     
   }
   
 }
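
The point of TGL_WR(OP) is that the operation passed in runs between the two register writes of the strobe, presumably so the pointer increment doubles as the delay. A host-side sketch of that ordering follows. `toggleWrAround`, `Step` and `trace` are invented names for illustration, and the trace vector stands in for the GPIO writes, so this shows the sequencing only, not the real timing.

```cpp
#include <cstdint>
#include <vector>

// Host-side stand-in: record the order of events instead of touching GPIO.
enum class Step { FirstStrobeWrite, Work, SecondStrobeWrite };
static std::vector<Step> trace;

// Hypothetical function-template counterpart of TGL_WR(OP): the operation
// runs between the two register writes that make up the strobe.
template <typename Op>
inline void toggleWrAround(Op op) {
    trace.push_back(Step::FirstStrobeWrite);   // macro: first register write
    trace.push_back(Step::Work);
    op();                                      // macro: OP;
    trace.push_back(Step::SecondStrobeWrite);  // macro: second register write
}
```

A call such as `toggleWrAround([&]{ ++d; });` would then mirror `TGL_WR(d++)`. A template like this also avoids the usual pitfall of a multi-statement macro inside an unbraced `if`.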

I was thinking of something like this (though LCD could probably use a better name. Any suggestions?):

// HWLCD.cpp
volatile uint32_t *LCD = reinterpret_cast< volatile uint32_t * >(0xA0002188);

// HWLCD.h

#define TGL_WR(OP)							\
  LPC_GPIO_PORT->SET[LCD_WR_PORT] = 1 << LCD_WR_PIN;			\
  OP;									\
  LPC_GPIO_PORT->CLR[LCD_WR_PORT] = 1 << LCD_WR_PIN;			

#define TGL_WR								\
  LPC_GPIO_PORT->SET[LCD_WR_PORT] = 1 << LCD_WR_PIN;			\
  __asm("nop"); \
  LPC_GPIO_PORT->CLR[LCD_WR_PORT] = 1 << LCD_WR_PIN;			

extern volatile uint32_t *LCD;

For some reason, using the LCD pointer is about 7 fps faster than using mbed’s MPIN[2]. Even stranger, this does not seem to affect SET/CLR used in TGL_WR.

Looks fine.

What’s MPIN defined as?

It’s possible that the indexing operation isn’t getting optimised away if it’s volatile,
because doing so might violate some of the guarantees of volatile.

The same way as CLR and SET:

#ifdef __cplusplus
  #define   __I     volatile             /*!< Defines 'read only' permissions                 */
#else
  #define   __I     volatile const       /*!< Defines 'read only' permissions                 */
#endif
#define     __O     volatile             /*!< Defines 'write only' permissions                */
#define     __IO    volatile             /*!< Defines 'read / write' permissions              */


typedef struct {                                    /*!< GPIO_PORT Structure                                                   */
  __IO uint8_t   B[88];                             /*!< Byte pin registers                                                    */
  __I  uint32_t  RESERVED0[42];
  __IO uint32_t  W[88];                             /*!< Word pin registers                                                    */
  __I  uint32_t  RESERVED1[1896];
  __IO uint32_t  DIR[3];                            /*!< Port Direction registers                                              */
  __I  uint32_t  RESERVED2[29];
  __IO uint32_t  MASK[3];                           /*!< Port Mask register                                                    */
  __I  uint32_t  RESERVED3[29];
  __IO uint32_t  PIN[3];                            /*!< Port pin register                                                     */
  __I  uint32_t  RESERVED4[29];
  __IO uint32_t  MPIN[3];                           /*!< Masked port register                                                  */
  __I  uint32_t  RESERVED5[29];
  __IO uint32_t  SET[3];                            /*!< Write: Set port register Read: port output bits                       */
  __I  uint32_t  RESERVED6[29];
  __O  uint32_t  CLR[3];                            /*!< Clear port                                                            */
  __I  uint32_t  RESERVED7[29];
  __O  uint32_t  NOT[3];                            /*!< Toggle port                                                           */
} LPC_GPIO_PORT_Type;

#define LPC_GPIO_PORT                   ((LPC_GPIO_PORT_Type      *) LPC_GPIO_PORT_BASE)
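
As a sanity check, the struct layout can be tied back to the raw addresses used earlier in the thread. The sketch below copies the layout without the `__I`/`__IO`/`__O` qualifiers (only the offsets matter) and assumes `LPC_GPIO_PORT_BASE` is `0xA0000000`, which is not shown in the thread. `GpioPortLayout`, `kAssumedBase` and the three helper functions are names made up for this example.

```cpp
#include <cstddef>
#include <cstdint>

// Local copy of the register layout quoted above, qualifiers dropped.
struct GpioPortLayout {
    uint8_t  B[88];
    uint32_t RESERVED0[42];
    uint32_t W[88];
    uint32_t RESERVED1[1896];
    uint32_t DIR[3];
    uint32_t RESERVED2[29];
    uint32_t MASK[3];
    uint32_t RESERVED3[29];
    uint32_t PIN[3];
    uint32_t RESERVED4[29];
    uint32_t MPIN[3];
    uint32_t RESERVED5[29];
    uint32_t SET[3];
    uint32_t RESERVED6[29];
    uint32_t CLR[3];
    uint32_t RESERVED7[29];
    uint32_t NOT[3];
};

// Assumption: the value of LPC_GPIO_PORT_BASE, which the thread does not show.
constexpr uint32_t kAssumedBase = 0xA0000000u;

constexpr uint32_t mpinAddr(unsigned port) {
    return static_cast<uint32_t>(kAssumedBase + offsetof(GpioPortLayout, MPIN) + 4u * port);
}
constexpr uint32_t setAddr(unsigned port) {
    return static_cast<uint32_t>(kAssumedBase + offsetof(GpioPortLayout, SET) + 4u * port);
}
constexpr uint32_t clrAddr(unsigned port) {
    return static_cast<uint32_t>(kAssumedBase + offsetof(GpioPortLayout, CLR) + 4u * port);
}

// Under that assumption the raw addresses from the earlier posts map to:
static_assert(mpinAddr(2) == 0xA0002188u, "LCD pointer is MPIN[2]");
static_assert(setAddr(1)  == 0xA0002204u, "0xA0002204 is SET[1]");
static_assert(clrAddr(1)  == 0xA0002284u, "0xA0002284 is CLR[1]");
```

If the base assumption holds, the `LCD` pointer and `LPC_GPIO_PORT->MPIN[2]` name the same register, and the first TGL_WR in the thread writes CLR[1] and then SET[1].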

@Hanski @FManga how about some PRs so I get to test? I don’t have enough time to start parsing from your tennis match where the ball is at the moment.

I am just about to make a PR…