TAS mode for Javitto (development thread)

I hope everyone has enough aspirin for all the silly questions I keep posting haha. But the experts keep pulling me out of the mud and it is awesome seeing things happen!

1 Like

It’s always nice to learn from experts!

1 Like

Looks like switching maps already does just work! :partying_face: So that’s cool.

So the ongoing thing left will be optimizations, which I may need a lot of help with :sweat: there are lots of areas for improvement, but now that things are starting to work mostly, I will be in a good spot to start moving things around and making it gooder :smiley:

3 Likes

Well done so far :slight_smile:

1 Like

Maybe we could organize a mini-jam when this is finished? :slight_smile:

2 Likes

I like that idea :D!

1 Like

Trying to optimize a bit to make it more “playable” quality. As always, FManga has been incredibly helpful in answering my silly questions :smiley:

I’ve run into trouble with the TileFiller though. I’m not entirely sure how to get a single tile from the tileset, given the way it is connected up in Javitto.

My tileset is defined as static const uint8_t tiles[][256] in C++, but I don’t know how to get the “array” (the color IDs that make up one tile) given the tile index from the map.

I figured it would be something nasty like this uint8_t tile = ((uint8_t*)tileSet)[tileIdx]; but it doesn’t seem to want to cooperate heh…
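
A minimal sketch of the lookup being asked about, assuming the tileset really is declared as `static const uint8_t tiles[][256]` and that each 16x16 tile is stored row by row. Indexing the 2D array with the tile index already yields that tile's 256 bytes, whereas casting to `uint8_t*` and indexing with `tileIdx` alone returns a single byte from the first tile. The tiny two-tile tileset and the helper names `tilePtr` and `tilePixel` are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical tileset: two 16x16 tiles, one byte (colour ID) per pixel.
// Only the first three pixels of each tile are set; the rest default to 0.
static const uint8_t tiles[][256] = {
    { 1, 2, 3 },
    { 4, 5, 6 },
};
static const std::size_t tileCount = sizeof(tiles) / sizeof(tiles[0]);

// Pointer to the first of the 256 colour IDs that make up one tile.
const uint8_t* tilePtr(std::size_t tileIdx) {
    assert(tileIdx < tileCount);
    return tiles[tileIdx];  // same address as (const uint8_t*)tiles + tileIdx * 256
}

// Colour ID of the pixel at column x, row y of the given tile.
uint8_t tilePixel(std::size_t tileIdx, int x, int y) {
    assert(x >= 0 && x < 16 && y >= 0 && y < 16);
    return tilePtr(tileIdx)[y * 16 + x];
}
```

Here `tilePtr(1)[0]` is 4, while `((const uint8_t*)tiles)[1]` is 2, the second pixel of tile 0. The flat cast only works once the index is multiplied by the tile size (256).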

Here is where the loop begins now on the x axis.

Here are the TileMap definitions:

Basically, in really bad pseudo code and given what FManga is proposing I do:

for(x in mapWidth) {
    tileIndex = tileMap[x + (y / tileHeight) * mapWidth]
    tile = tileSet[tileIndex]
    for(x2 in tileWidth) {
        line[x * tileWidth + x2] = palette[tile[x2 + (y % tileHeight) * tileWidth]]
    }
}

So I want, for each horizontal line, to get the tile from the map and tileset, then loop that tile to get the colors from just that tile, then move to the next tile.
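
The idea above can be sketched in plain C++ roughly as follows. This is only an illustration under stated assumptions: 16x16 tiles, a row-major map of one-byte tile indices, a flat tileset of 256 bytes per tile, a 16-bit palette, no scrolling, and a line buffer at least `mapWidth * 16` pixels wide. The function name `fillLineSketch` and its parameter list are hypothetical and are not Javitto's API.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kTileSize = 16;  // tiles are assumed to be 16x16 pixels

// Fill one horizontal line `y` from a tile map.
// tileMap : mapWidth * mapHeight tile indices, stored row by row
// tileSet : 256 colour IDs per tile, each tile stored row by row
// palette : colour ID -> 16-bit screen colour
// line    : output, must hold at least mapWidth * 16 pixels
void fillLineSketch(uint16_t* line, int y,
                    const uint8_t* tileMap, int mapWidth, int mapHeight,
                    const uint8_t* tileSet, const uint16_t* palette) {
    assert(y >= 0 && y < mapHeight * kTileSize);
    // Row of tile indices that this line passes through.
    const uint8_t* mapRow = tileMap + (y / kTileSize) * mapWidth;
    // Offset of this pixel row inside any one tile.
    const int rowInTile = (y % kTileSize) * kTileSize;

    for (int x = 0; x < mapWidth; ++x) {
        const uint8_t* tileRow = tileSet + mapRow[x] * 256 + rowInTile;
        for (int x2 = 0; x2 < kTileSize; ++x2) {
            line[x * kTileSize + x2] = palette[tileRow[x2]];
        }
    }
}
```

The tile lookup happens once per tile and the inner loop only walks the 16 colour IDs of the current tile row, which is the structure the later versions in this thread converge on.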

3 Likes

Getting somewhat closer! Thanks to the feedback from @carbonacat I am trudging closer and closer to “usable” (almost reaching 30fps! Up in the mid 20s now depending on the map)

Here is my Tile Line Filler now:

void fillLine(ushort[] line, int y) {
        if (offsetY + y < 0 || offsetY + y >= 176) return;
        
        int tileIdx;
        var calcX = 0;
        var tileIndexY = ((y+offsetY) / 16) * width;
        var modY = ((y+offsetY) % 16) * 16;
        
        int startX = offsetX;
        int endX = 220;
        
        if(offsetX < 0){
            startX = 0;
            endX += offsetX;
        }
        if(startX + endX > 220) {
            endX = 220;
        }
        
        for (int x = startX; x < endX; x+=16) {
            calcX = x-offsetX;
            
            __inline_cpp__("
                // Get tile index
                tileIdx = ((uint8_t*)tileMap)[(calcX / 16) + tileIndexY];
            ");
            
            for(int t = 0; t < 16; t++){
                __inline_cpp__("
                color = ((uint8_t*) tileSet)
                    [
                        tileIdx // map tile index
                        * 256 + // size of the tile (16x16)
                        (((t+calcX) % 16) + modY) // tile color index
                    ]; 
                ");
                line[x+t] = palette[color]; 
            }
            
        }
    }

It clearly needs more love heh… Right now I am trying to figure out how to accurately calculate the x axis, in particular with the inner loop over the tile itself.

Pretty colors!

4 Likes

Have you tried:

  • Implementing the whole function with inline C++
  • Replacing the / 16 with >> 4, the * 16 with << 4, the % 16 with & 0xF and the * 256 with << 8
    • The compiler should be doing these already, but just in case it’s not for some reason
  • Moving the tile index inside the for loop that needs it
    • Again, the compiler ought to be aware of this, but just in case
  • Unrolling the t loop

If I had a better idea of what some of the values were (e.g. calcX and modY) I might be able to think of something.

Failing that, you could always try dropping down into assembly to see if you can beat what the compiler generates.

I haven’t tried this just yet. I’m very close so I didn’t really want to do that just yet.

I don’t usually think about shift operators :sweat_smile: … not comfortable enough knowing how they work I suppose.

I just today started grabbing the whole tile in the outer loop :partying_face: so that helps things significantly.

Not quite sure what unrolling the inner tile loop would do, but I might give it a go (since it only needs to grab 16 colors and then move to the next tile.)

Here is what I am working with at the moment:

    void fillLine(ushort[] line, int y) {
        if (offsetY + y < 0 || offsetY + y >= 176) return;
        
        var tileIndexY = ((y+offsetY) / 16) * width;
        var modY = ((y+offsetY) % 16) * 16;
        
        int startX = offsetX;
        int endX = 220;
        
        if(offsetX < 0){
            startX = 0;
            endX += offsetX;
        }
        if(startX + endX > 220) {
            endX = 220;
        }
        
        var calcX = 0;
        int tileIdx;
        for (int x = startX; x < endX; x+=16) {
            calcX = x-offsetX;
            
            __inline_cpp__("
                // Get tile index
                tileIdx = ((uint8_t*)tileMap)[(calcX / 16) + tileIndexY];
                auto tile = ((uint8_t*)tileSet) + tileIdx * 256 + modY;
            ");
            
            for(int t = 0; t < 16; t++){
                __inline_cpp__("
                color = tile[(t+calcX) % 16];
                ");
                line[x+t] = palette[color];
            }
            
        }
    }

calcX is simply the current x value of the outer loop minus offsetX.
modY (yeah, a terrible name) is the offset of the current pixel row inside the tile: the row within the 16x16 tile, times the tile width.
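
To make those two row values concrete, here is a small worked example of the same arithmetic, assuming 16x16 tiles and a hypothetical map that is 14 tiles wide. The struct and function names are invented for the illustration.

```cpp
struct RowLookup {
    int tileIndexY;  // index of the first tile of this map row inside tileMap
    int modY;        // offset of this pixel row inside a 256-byte tile
};

// Same arithmetic as in fillLine; `width` is the map width in tiles.
constexpr RowLookup rowLookup(int y, int offsetY, int width) {
    return RowLookup{ ((y + offsetY) / 16) * width, ((y + offsetY) % 16) * 16 };
}

// Line 37 with no offset lies in map row 2 (37 / 16), pixel row 5 (37 % 16).
static_assert(rowLookup(37, 0, 14).tileIndexY == 28, "2 * 14 tiles into the map");
static_assert(rowLookup(37, 0, 14).modY == 80, "5 * 16 bytes into the tile");
```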

Edit:

For the time being I just yanked out all of the offsets (so no moving the map for the time being)

    void fillLine(ushort[] line, int y) {
        if (offsetY + y < 0 || offsetY + y >= 176) return;
        
        var tileIndexY = ((y+offsetY) / 16) * mapWidth;
        var modY = ((y+offsetY) % 16) * 16;
        int modX;
        
        int tileIdx;
        
        for (int x = 0; x < mapWidth; x++) {
            __inline_cpp__("
                // Get tile index
                tileIdx = ((uint8_t*)tileMap)[(x) + tileIndexY];
                auto tile = ((uint8_t*)tileSet) + tileIdx * 256 + modY;
            ");
            
            modX = x * 16;
            for(int t = 0; t < 16; t++){
                __inline_cpp__("
                color = tile[t];
                ");
                line[modX+t] = palette[color];
            }
            
        }
    }

23~24 fps is still a bit shy of my goal of 30 :frowning: but it is significantly closer than I was!

So… I’d like:

  • just a few more frames per second! :laughing:
  • to be able to move the map without it blowing up heh…

1 Like

I won’t go into detail about them now, but some useful (and relevant) rules to remember, for non-negative integers, are:

  • x >> n is equivalent to x / 2^n (and usually cheaper)
  • x << n is equivalent to x * 2^n (and usually cheaper)
  • x & ((1 << n) - 1) is equivalent to x % 2^n (and usually cheaper)

Normally you shouldn’t have to worry about using shifts in place of division and multiplication by a constant because the compiler ought to work it out for you and replace the divisions and multiplications with shifts when appropriate (a type of optimisation called ‘strength reduction’), but it’s worth a try.
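
As a quick check of those power-of-two shortcuts, the sketch below spells out the replacements for 16 (2^4) and 256 (2^8) on unsigned values and verifies them at compile time. The helper names are made up for the example. Note that the remainder uses the mask `(1 << n) - 1` (0xF for n = 4), not the single bit `1 << n`.

```cpp
#include <cstdint>

// Power-of-two shortcuts, valid for unsigned (non-negative) values.
constexpr uint32_t div16(uint32_t x)  { return x >> 4; }   // x / 16
constexpr uint32_t mul16(uint32_t x)  { return x << 4; }   // x * 16
constexpr uint32_t mod16(uint32_t x)  { return x & 0xF; }  // x % 16, mask is (1 << 4) - 1
constexpr uint32_t mul256(uint32_t x) { return x << 8; }   // x * 256

static_assert(div16(37) == 37 / 16, "37 / 16 == 2");
static_assert(mul16(5) == 5 * 16, "5 * 16 == 80");
static_assert(mod16(37) == 37 % 16, "37 % 16 == 5");
static_assert(mul256(3) == 3 * 256, "3 * 256 == 768");
// The single-bit mask 1 << 4 only tests bit 4; it does not give a remainder.
static_assert((37u & (1u << 4)) == 0, "37 is 0b100101, so bit 4 is clear");
```

For negative values the two forms can disagree, since integer division truncates toward zero, so the shortcuts are best kept to coordinates that are known to be non-negative.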

(If you ever want an in-depth breakdown of how the bitwise operations work, let me know. If you understand binary and boolean logic then they’re not as complicated as they might seem.)

As for what unrolling a loop gets you:

  • Continually performing the same test can waste CPU cycles if there’s a cheaper or less frequent test that can be performed.
    • (A complete loop unroll basically eliminates the test altogether.)
  • If the CPU has instruction pipelining (and either no branch predictor or one that is failing the prediction) then unrolling the loop avoids the pipeline stall associated with branching/failed branch prediction, thus ensuring a steady stream of instructions.
    • (Basically it’s trading memory for speed.)
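
A small sketch of the trade-off, assuming a 16-pixel tile row, one-byte colour IDs and a 16-bit palette. Both functions produce the same output; the second does a quarter as many loop tests and branches in exchange for more code. The function names are hypothetical, and whether this is faster on a given compiler and CPU needs to be measured.

```cpp
#include <cassert>
#include <cstdint>

// Rolled version: one loop test and branch per pixel.
void copyTileRowRolled(uint16_t* line, const uint8_t* tile, const uint16_t* palette) {
    assert(line != nullptr && tile != nullptr && palette != nullptr);
    for (int t = 0; t < 16; ++t) {
        line[t] = palette[tile[t]];
    }
}

// Unrolled by four: one loop test and branch per four pixels.
void copyTileRowUnrolled4(uint16_t* line, const uint8_t* tile, const uint16_t* palette) {
    assert(line != nullptr && tile != nullptr && palette != nullptr);
    for (int t = 0; t < 16; t += 4) {
        line[t]     = palette[tile[t]];
        line[t + 1] = palette[tile[t + 1]];
        line[t + 2] = palette[tile[t + 2]];
        line[t + 3] = palette[tile[t + 3]];
    }
}
```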

This one is easier to follow.

I don’t think it would make a difference, but I think it’s still worth keeping your variables as close as possible to the point at which they’re first used, to make sure they get the bare minimum scope that they need (i.e. modX and tileIdx should be inside the for loop).

This arrangement makes the loop unrolling a lot easier than it would be with the earlier version:

void fillLine(ushort[] line, int y)
{
	var startY = (y + offsetY);
	if ((startY < 0) || (startY >= 176)) return;
	
	var tileIndexY = ((startY / 16) * mapWidth);
	var modY = ((startY % 16) * 16);
	
	for (int x = 0; x < mapWidth; x++)
	{
		__inline_cpp__("
			// Get tile index
			auto tileIdx = ((uint8_t*)tileMap)[(x) + tileIndexY];
			auto tile = ((uint8_t*)tileSet) + tileIdx * 256 + modY;
		");
		
		var modX = (x * 16);
		
		// Sixteen unrolled iterations
		__inline_cpp__("
		color = *tile;
		++tile;
		");
		line[modX] = palette[color];
		++modX;
		
		__inline_cpp__("
		color = *tile;
		++tile;
		");
		line[modX] = palette[color];
		++modX;
		
		__inline_cpp__("
		color = *tile;
		++tile;
		");
		line[modX] = palette[color];
		++modX;
		
		__inline_cpp__("
		color = *tile;
		++tile;
		");
		line[modX] = palette[color];
		++modX;
		
		__inline_cpp__("
		color = *tile;
		++tile;
		");
		line[modX] = palette[color];
		++modX;
		
		__inline_cpp__("
		color = *tile;
		++tile;
		");
		line[modX] = palette[color];
		++modX;
		
		__inline_cpp__("
		color = *tile;
		++tile;
		");
		line[modX] = palette[color];
		++modX;
		
		__inline_cpp__("
		color = *tile;
		++tile;
		");
		line[modX] = palette[color];
		++modX;
		
		__inline_cpp__("
		color = *tile;
		++tile;
		");
		line[modX] = palette[color];
		++modX;
		
		__inline_cpp__("
		color = *tile;
		++tile;
		");
		line[modX] = palette[color];
		++modX;
		
		__inline_cpp__("
		color = *tile;
		++tile;
		");
		line[modX] = palette[color];
		++modX;
		
		__inline_cpp__("
		color = *tile;
		++tile;
		");
		line[modX] = palette[color];
		++modX;
		
		__inline_cpp__("
		color = *tile;
		++tile;
		");
		line[modX] = palette[color];
		++modX;
		
		__inline_cpp__("
		color = *tile;
		++tile;
		");
		line[modX] = palette[color];
		++modX;
		
		__inline_cpp__("
		color = *tile;
		++tile;
		");
		line[modX] = palette[color];
		++modX;
		
		// No need to increment on the 16th step
		__inline_cpp__("
		color = *tile;
		");
		line[modX] = palette[color];
	}
}

Give that a try to see if it makes a difference.

2 Likes

So trying a few of these things I didn’t really notice any difference just yet? But maybe just staring at the fps in the console isn’t enough heh…

With more help from the community we pulled in an ASM fillLine16 method, which more than doubled the fps; it is now pushing about 34 with the 15 dogs and a full map.

    void fillLine(ushort[] line, int y) {
        if (cameraY + y < 0 || cameraY + y >= 176) return;
        
        var tileMapIndexY = ((y+cameraY) / 16) * mapWidth;
        var tileSetIndexY = ((y+cameraY) % 16) * 16;
        
        for (int x = 0; x < mapWidth; x++) {
            __inline_cpp__("
            // Get tile index
            auto tileId = ((uint8_t*)tileMap)[(x) + tileMapIndexY];
            auto tile = ((uint8_t*)tileSet) + tileId * 256 + tileSetIndexY;
            ");
            
            var lineX = (x * 16);

            for(int t = 0; t < 16; t++){
                __inline_cpp__("
                color = tile[t];
                ");
                line[lineX+t] = palette[color];
            }
        }
    }

I tried renaming the variables to be more helpful here.

Things still missing:

  • camera adjustments, in particular on the x axis, since y “just works” as it is. I don’t know how to refactor this to help accommodate the clipping and such.

2 Likes

Would it be easier to manage clipping and everything if I did the outer loop based on the pixel width of the screen instead of the tile width of the map?

I’m not entirely sure how to refactor this for that :thinking:
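
One possible shape for a loop driven by the screen width, written as plain C++ for illustration. It is deliberately the simplest version (one tile lookup per pixel), not the fastest. Assumptions: 16x16 tiles, a flat tileset of 256 bytes per tile, and `cameraX`/`cameraY` meaning the map pixel shown at the top-left corner of the screen, which may not match the sign convention used elsewhere in this thread. The name `fillLineClipped` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Draw screen line `y` of a scrolled tile map, one screen pixel at a time.
// Pixels that fall outside the map are left untouched.
void fillLineClipped(uint16_t* line, int screenWidth, int y,
                     int cameraX, int cameraY,
                     const uint8_t* tileMap, int mapWidth, int mapHeight,
                     const uint8_t* tileSet, const uint16_t* palette) {
    assert(screenWidth >= 0 && mapWidth >= 0 && mapHeight >= 0);
    const int worldY = y + cameraY;
    if (worldY < 0 || worldY >= mapHeight * 16) return;  // clip top and bottom

    const uint8_t* mapRow = tileMap + (worldY / 16) * mapWidth;
    const int rowInTile = (worldY % 16) * 16;

    for (int x = 0; x < screenWidth; ++x) {
        const int worldX = x + cameraX;
        if (worldX < 0) continue;              // left of the map: skip this pixel
        if (worldX >= mapWidth * 16) break;    // right of the map: nothing more to draw
        // worldX is non-negative here, so / and % behave as expected.
        const uint8_t* tileRow = tileSet + mapRow[worldX / 16] * 256 + rowInTile;
        line[x] = palette[tileRow[worldX % 16]];
    }
}
```

Because every pixel is tested against the map bounds before any lookup, clipping on all four sides falls out of the loop structure. Fetching the tile once per 16 pixels can then be reintroduced as an optimization.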

Used some comments to point out what is happening here:

    void fillLine(ushort[] line, int y) {
        // Skip the line if above/under the screen view
        if (cameraY + y < 0 || cameraY + y >= 176) return;
        
        // Set the Y for the map and tileset lookup
        var tileMapIndexY = ((y+cameraY) / 16) * mapWidth;
        var tileSetIndexY = ((y+cameraY) % 16) * 16;
        
        // Loop the map width to collect the tiles
        for (int x = 0; x < mapWidth; x++) {
            __inline_cpp__("
            // Get tile ID from the map. Then use that to find the tile itself from the tileset
            auto tileId = ((uint8_t*)tileMap)[(x) + tileMapIndexY];
            auto tile = ((uint8_t*)tileSet) + tileId * 256 + tileSetIndexY;
            ");
            
            // Set where to begin the X position in the line array
            var lineX = (x * 16);

            // Loop over the Tile color IDs and put them in the line array.
            for(int t = 0; t < 16; t++){
                __inline_cpp__("
                color = tile[t];
                ");
                line[lineX+t] = palette[color];
            }
        }
    }

1 Like

Sorry for making remarkably little progress on this last piece of the Tile Filler. I seem to have hit a wall and am struggling to just power through it :laughing:

2 Likes

Sometimes you need to move on to something else … and then, when you least expect it, the answer will come!

5 Likes

Well that definitely helped. After taking a step back and making sure I thoroughly understood what was really moving around and how, I now have it working nearly as expected!
It has a pretty significant performance penalty, but I believe that can be ironed out eventually. As it stands I can still comfortably squeeze 15 sprites onto a decent-sized map at just above 25fps. So definitely playable speeds there.

I’ll need to still introduce clipping but that’s not too bad now that I understand how the data is moving.

5 Likes

I am still struggling like mad to figure out how to clip. I know it is just a matter of getting the right math, but for some reason my brain goes silly putty the moment I get to the code :joy:
I’m trying to comment to make it more obvious what is going where, but I’m still kind of lost…

Here is the Filler in case anyone is interested in seeing my current spaghetti :laughing:

Kind of weird too. I’m not quite certain how to make the map begin at the cameraX on the x coord.

If anyone is willing to help me untangle this mess I’d be ridiculously grateful.

    void fillLine(ushort[] line, int y) {
        // Clip top and bottom of map.
        if(y-cameraY < 0 || y-cameraY >= mapHeight*tileH)return;

        // Set the Y for the map and tileset lookup
        var tileMapIndexY = ((y-cameraY) / tileH) * mapWidth;
        var tileSetIndexY = ((y-cameraY) % tileH) * tileW;
        
        // Divide the current X position by the width of the Tiles.
        var mapX = (-cameraX / tileW) + tileMapIndexY;
        
        // Get the position on the first X Tile. 
        var tileX = -cameraX % tileW;
        
        // Loop the screen width to collect the tiles
        for (int i = 0; i < 220;) {
            
            int iter = Math.min(tileW - tileX, 220 - i);
            __inline_cpp__("
            // Get tile ID from the map. Then use that to find the tile itself from the tileset
            auto tileId = ((uint8_t*)tileMap)[mapX];
            auto tile = ((uint8_t*)tileSet) + tileId * 256 + tileSetIndexY;
            ");
            
            // Loop over the Tile color IDs and put them in the line array.
            for(int t = 0; t < iter; t++){
                __inline_cpp__("
                color = tile[tileX + t];
                ");
                line[i+t] = palette[color];
            }
            
            i+=iter;
            tileX = 0;
            mapX++;
        }
    }

1 Like


Getting close I think! Aside from that annoying column on the left side, I think this will be good enough to work with for now.

5 Likes

    void fillLine(ushort[] line, int y) {
        // Clip top and bottom of map.
        if(y-cameraY < 0 || y-cameraY >= mapHeight*tileH)return;
        
        // Set the Y for the map and tileset lookup
        var mapY = ((y-cameraY) / 16) * mapWidth;
        var tileY = ((y-cameraY) % 16) * tileW;
        
        // Divide the current X position by the width of the Tiles.
        var mapX = cameraX / tileW;
        
        // Get the position on the first X Tile.
        var tileX = cameraX % tileW;
        
        // Loop the screen width to collect the tiles
        for (int i = 0; i < 220;) {
            // Clip the right hand side of the map. Whee~
            if(mapX >= mapWidth)return;
            
            int iter = Math.min(tileW - tileX, 220 - i);
            // Trying to clip the left side of the map... Integer division truncates toward zero, so for cameraX between -(tileW - 1) and -1
            // mapX comes out as 0 (not -1) and tileX is negative, which means this check never sees that first column.
            if(mapX < 0){
                mapX++;
                tileX = 0;
                i+=iter;
                continue;
            }
            
            __inline_cpp__("
            // Get tile ID from the map. Then use that to find the tile itself from the tileset
            auto tileId = ((uint8_t*)tileMap)[mapX + mapY];
            auto tile = ((uint8_t*)tileSet) + tileId * 256 + tileY;
            ");            

            // Loop over the Tile color IDs and put them in the line array.
            for(int t = 0; t < iter; t++){
                __inline_cpp__("
                color = tile[tileX + t];
                ");
                line[i+t] = palette[color];
            }
            i+=iter;
            tileX = 0;
            mapX++;
        }
    }
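
A possible explanation for the stray column on the left, offered as a sketch and not as a confirmed diagnosis: with the built-in operators, `-5 / 16` is 0 and `-5 % 16` is -5, so a slightly negative `cameraX` gives `mapX == 0` and a negative `tileX`. Division and remainder that round toward negative infinity avoid that. The helpers below are illustrative C++; Java has `Math.floorDiv` and `Math.floorMod` for the same purpose, though the thread does not say whether Javitto's library provides them.

```cpp
// Division that rounds toward negative infinity (b must be positive).
constexpr int floorDiv(int a, int b) {
    return (a >= 0) ? (a / b) : -((-a + b - 1) / b);
}

// Remainder that always lies in the range [0, b) (b must be positive).
constexpr int floorMod(int a, int b) {
    return a - floorDiv(a, b) * b;
}

static_assert(-5 / 16 == 0 && -5 % 16 == -5, "built-in operators truncate toward zero");
static_assert(floorDiv(-5, 16) == -1, "map column -1, which is left of the map");
static_assert(floorMod(-5, 16) == 11, "11 pixels into that column");
```

With `mapX = floorDiv(cameraX, tileW)` and `tileX = floorMod(cameraX, tileW)`, a `cameraX` of -5 yields `mapX == -1` and `tileX == 11`, so `iter` becomes 5 and the existing `mapX < 0` branch skips exactly the 5 pixels that lie left of the map.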

1 Like

I’m moving on from this section so I can keep making some sort of progress. I’ll of course come back around to try and improve stuff where possible, unless someone more skilled than I decides to take pity on me :joy:

2 Likes