Problems with sprites

Maybe in the cases when the whole frame is copied to a memory location. Most of times it is copied to the screen which is done scanline by scanline anyway.

1 Like

I agree with @Pharap that frames should definitely NOT be interlaced with each other in the memory. You’re always drawing a single frame and that should be a continuous memory because of caching (even if there is no cache on Pokitto, the format should try to respect other platforms if possible, and respect the rule of least surprise, which means do what is common), and also for example copying a single frame out somewhere should be doable with just memcpy.

I think keeping the number of frames in the format in the 3 byte header is the way to go, because you mustn’t allow the function to segfault even with wrong parameters. And generally, the format should be such that it contains all the information about the data and supposes even generalized use – for example it can be stored in a file and loaded without any other information, in which case you need to have the number of frames stored.

100% agree.

Yes please, I find the lack of documentation is kind of an issue.

EDIT:

Or actually couldn’t there be a separate format for bitmaps with frames, like Sprite on arduboy? DrawBitmap would draw a bitmap (2 byte header) and DrawSprite would draw a frame of given sprite (3 byte header).

1 Like

I vote for this also.

I’d like a way to copy a region from an image to the screen. Right now this seems not possible to me. Also there’s no direct access to the memory screen buffer.

You mean Pokitto::Display::screenbuffer? It’s public, you can access that.

2 Likes

ops :flushed:
I was looking to

uint8_t* Display::m_scrbuf;

and got distracted. So I can copy region to /from the screen. Great!

1 Like

You don’t normally pass the number of frames though.
Normally you pass the frame you want to draw, but not the maximum number of frames.

What would the drawBitmap function do with it?

It can’t thrown an exception because those aren’t enabled.

Returning an error code means you have to store and check the error,
which isn’t something you want to be doing in drawing code.

You could modulo by it, but that’s wasteful when it isn’t needed and ends up just hiding bugs.

Exactly, that’s why it’s inefficient. You have to copy width bytes and then skip width * frameIndex bytes:

It’s not just a straightforward block copy anymore, it’s interlaced.

The reason it should be contiguous isn’t caching, it’s for simplicity and speed of iteration.
If the frame data is interlaced then the logic for accessing the frames becomes slower and more complex.

memcpy often performs worse than a for loop when using modern compilers.

This is also a good point. One of the reasons I’ve had trouble with framew is because it wasn’t obvious what its intended purpose was because it’s very unusual.

It’s better to stick to what’s tried and tested - width and height.

Segfaults are a good thing, they tell you that you’re doing something wrong.
If you just ignore the frameIndex being out of bounds thenyou violate the fail fast principle.
Going out of bounds is a bug in the calling code, and bugs should be big and obvious, not silenced.

1 Like

Can you elaborate more?
(yes I know I’m telling this to @Pharap, please forgive me all :grin:)

Not really, you just increment rows by frameWidth * frames instead of one, that’s it. Yes, it’s a tiny bit simpler, but it’s mostly for the cache.

I’m not saying don’t fail quickly! Indicate error without crashing the program.

Not sure this can be applied to Pokitto processor, is there cache on LPC11U68?

No, but as I said, the format shouldn’t suppose this, the games can be ported to platforms with other processors, emulated and so on. And vice versa converting to Pokitto won’t require reordering pixels.

I think this is a good reason to save frames in vertical layout.

memcpy takes void * arguments and gets passed a block size.

The most optimum way to copy memory on most systems is to copy a full register width of memory, so for sizes that aren’t a multiple of that full regsiter width, there’s some extra copying that needs doing (i.e. blockSize % registerWidth).

The compiler can calculate that at compile time and then hardcode the copying, but memcpy has to include those checks and do them at runtime because it doesn’t know what size it’s going to be passed.

(And for the record I tested this once on the Arduboy.)

Hrm, I suppose so. But it’s still not ideal.
Don’t forge that incrementing can take advantage of post-increment addressing modes.
Both Thumb and AVR feature post-increment addressing.
(ARM features pre-increment addressing, but Thumb doesn’t.)
(See IA here.)

But how though?
If you return an error code, you’re then asking drawing code to read that error and handle it.
The error code and checking for the error code will add overhead, and the correct way to handle that error is to rewrite the program to prevent that error from occuring in the first place - it’s not something you should be handling at runtime.

If a CPU has a cache, it’s probably going to be more powerful than the Pokitto’s MCU anyway, so I’m not sure this will make much difference.

1 Like

But the code calling the drawBitmap function must be explicitely told the max number of frames. It is needed for animating. If the animation is refactored you have to change some constant in the code elsewhere instead of just changing the graphics file header. I also find the header a little incomplete if there is the frame size information but not the total size of the memory or the frame count.

Pokitto Crash Screen (PCS) is your friend :slight_smile:

1 Like

What I’m saying is let’s be consistent – computers with cache do it one way, so if we have a choice, let’s do it the same so that we keep the compatibility. I am defending your argument of storing the frames in the memory the way you suggested, we shouldn’t be disagreeing on this :smiley:

Definitely this. BUT it comes down to this:

Do we understand it as an animated bitmap, or an array of static bitmaps?

If the former, there should be a 3 byte header with the number of frames and the function should take a frame as a parameter. And it should be called something like drawAnimatedBitmap. (as @Hanski says, the format should contain all the info about itself to avoid constants, spreading the data all over the code, and also being generally able to retrieve the number of frames by e.g. a completely separate class/module, storing the format in a file and so forth and so on).

If the latter, there should be a 2 byte header and the function shouldn’t take any frame as a parameter. And it should be called drawBitmap.

And then, why not both? Let’s have two function, no?

Yes, but if you store that size in the image array then to retrieve it the programmer has to do an offset into the array, which is going to look very odd.

If it’s stored separately as a constexpr variable then it can be used as an immediate operand to an opcode instead of being fixed in memory and requiring a pointer dereference to fetch.

That still seems like a bit of a heavy-handed thing to do in a drawing function.

We’re arguing for the same thing but for slightly different reasons.
I’m not worried about porting Pokitto games to other systems because I think platform exclusives are part of the selling point of any console.
If all consoles played the same games then it would be much of a muchness.

Plus I’m saying that if a CPU has a cache, it’s probably powerful enough to emulate the Pokitto without worry anyway, so caches shouldn’t really factor into this even with the arguments about porting.
In contrast, a lot of low-power MCUs have post-increment addressing (even as far back as the MOS6502), so that’s a much stronger argument.
(Not to mention the historical precendent set by scanlines.)

Animation is nothing more than displaying of several different static images in sequence, so these two things are pretty much equivalent.

I think drawBitmapFrame would be more descriptive because the function would only be drawing a single frame.
drawAnimatedX implies there’s enough information to actually handle the animation, but there wouldn’t be because that would be a stateful operation.

Also this kind of overload is often used for implementing tile engines, with the tile id being the frameIndex argument.

But like I said above, this could be the difference between the constant existing in memory and having to be pulled out of RAM when it’s needed or being compiled as an immediate to an opcode.

It would just be a function.

// Safe at compile time
template<std::size_t size>
std::size_t getFrameCount(const std::uint8_t (&bitmap)[size]);

// Dangerous, could segfault
std::size_t getFrameCount(const std::uint8_t bitmap *);

That said, I have toyed with ideas like:

template<std::size_t Width, std::size_t Height, std::size_t FrameCount>
struct Image
{
    constexpr std::size_t size = Width * Height;
    // I'd like to point out that these three are redundant 
    // because the values are part of the type
    std::uint8_t frameCount = FrameCount;
    std::uint8_t width = Width;
    std::uint8_t height = Height;
    std::uint8_t data[size];
    
    // For easy access of bytes
    std::uint8_t operator[](std::size_t index) const
    {
        return this->data[index];
    }
};

But the problems would be ensuring compile-time construction and of course I know everyone still bemoans templates.

That’s a bit more of a compelling argument, but even so the frameCount could be stripped out upon load without causing issues.

This was going to be my next suggestion. I could live with this.

We coudl have one function that just expected a 2-byte width and height header, and a different function that expected a 3-byte frameCount, width and height header.

The former could be drawBitmapFrame, drawBitmapFrameRaw or drawBitmapFrameUnsafe to signify that it won’t do bounds checking,

The latter could be drawBitmapFrame or drawBitmapFrameSafe to indicate that it does bounds checking.

The ‘safe’ version could do its checking and then call the ‘unsafe’ version, which would reduce the memory footprint and avoid repetition.

(The C++ stdlib does something similar for collections: the index operator doesn’t do bounds checking, the at function does.)

Bitmap has no frames. Think of the formats as classes: you have a Bitmap – static image, width and height, with method draw() that takes no frame number – and you have Animation – dynamic image, width and height, number of frames (which the class should know) and a draw method that takes a frame argument. We can’t mix these together. So there should be two formats with two names and two drawing functions.

Could you elaborate this? Does the same oddity concern the width and the height parameters?

Why is that? I mean that when a frame number is too big it would show PCS. Is there a neater way to tell information about the error to the user?

You gain nothing by saving a dereference – you have a number of pixels to fetch from the memory in order of hundreds, one saved fetch won’t achieve anything. This is an example of optimization in a non-critical part of the program. The other advantages outweight this by a great amount.

If you put the frame count into the array then it absolutely has to live in flash.

If you make it a constexpr variable (or possibly something like const static if C++03 is still a must, though I’m not sure how the mechanics work for that) then it’s free to become an immediate operand in a CPU instruction.

For example if you had the hypothetical function:

template<std::size_t size>
ErrorCode drawBitmapFrameSafe(int x, int y, const char (&bitmap)[size], std::size_t frameIndex, std::size_t frameCount)
{
	if(frame >= frameCount)
		return ErrorCode::InvalidFrameIndex;
	
	// Forward to function that checks the size,
	// which forwards to the 'unsafe' version
	return drawBitmapFrameSafe(x, y, bitmap, frameIndex);
}

And then it’s called like:

drawBitmapFrameSafe(x, y, someBitmap, 1, someBitmapFrameCount);

And then it gets inlined, producing code equivalent to this:

ErrorCode result;
if(1 >= someBitmapFrameCount)
	result = ErrorCode::InvalidFrameIndex;
else
	result = drawBitmapFrameSafe(x, y, someBitmap, 1);

And then the compiler realises that someBitmapFrameCount is a constexpr so (1 >= someBitmapFrameCount) is also a constant expression, allowing the compiler to reduce the code to either:

ErrorCode result = drawBitmapFrameSafe(x, y, someBitmap, 1);

Or

ErrorCode result = ErrorCode::InvalidFrameIndex;

However, if you instead called:

drawBitmapFrameSafe(x, y, someBitmap, animationIndex, someBitmapFrameCount);

You’d get:

ErrorCode result;
if(animationIndex >= someBitmapFrameCount)
	result = ErrorCode::InvalidFrameIndex;
else
	result = drawBitmapFrameSafe(x, y, someBitmap, 1);

And providing someBitmapFrameCount could be represented as an 8-bit value, that comparison (>=) would compile to the CMP Rd, #Offset8 Thumb instruction (combined with a conditional jump).

Hrm, maybe heave handed isn’t quite the right word.
I mean it’s the sort of thing you’d want to do during debugging, but not the sort of thing you’d want going on in production code.

Function calls are cheap, but it’s the extra comparison and branch instructions that soon mount up.
For example, if you drew an 8x6 grid of tiles using that function, that’s 48 extra comparisons and branches, so (I think) roughly 72 extra cycles.
.
If you had the option to ignore the safety check because you could guarantee that the index being passed was valid then you wouldn’t have to do any of those comparisons.

A well-written program would never pass an invalid index, and if the error message appeared then there’s no way the program can respond to the error - the programmer has to rewrite their code to ensure that they aren’t passing an invalid index.

It would be alright as a means of debugging, but that’s the sort of thing you might want to leave out of the release version because leaving it out would save a bit of speed.

It’s the same reason you wouldn’t want to implement drawRect by calling setPixel - because setPixel has the overhead of checking if the pixel is on screen, and drawRect can already do all the neccessary checking in a much cheaper way.

It’s not one saved fetch though, it’s one per image.
If you’re drawing lots of sprites it can make a difference.

What are the advantages other than an extra logic heck?

Like I say, if the error happens then your program is written badly and you have to rewrite it to remove the bug.
If you write the logic properly then you never pass an invalid index, at which point the check is just a waste of cycles.

That’s why I think having a ‘safe’ version and an ‘unsafe’ version is a good solution, but the unsafe version wouldn’t need the frame count because it wouldn’t use it.

Having an unsafe version would also allow the programmer to decide how they go about avoiding an invalid index.
They can choose to use clamping or wrapping etc.
(Yes that could still be done in the safe version, but you’d end up with the unnecessary check.)

Even per image the gain is so miniscule it will literally be unmeasurable unless you have some expensive extreme high FPS camera. Take all the things that happen during a frame and in the loops that are drawing all the pixels – even if you draw a 100 images, which is extremely many, and let’s say you save one cycle on each – that’s 100 out of 48 million per second, or 0.000002 of a second. This is totally negligible. (The same goes for saving 1 byte of memory. Maybe on Arduboy we could talk about single bytes, but here I don’t think so.)

If we had a choice of two versions of which one is negligibly faster, then we’d go for it just because, but here the other option is much better:

  • safe: if you manage the number of frames yourself, you can screw it up, whereas if we get it right once in the framework, everyone will have it correct
  • supports saving to files
  • backward compatible
  • self-contained, data of the animation are kept in one place
  • supports access by other entities that do not know the number of frames in advance
  • still allows to ignore the value in your custom functions

Also you could argue the same about the width and height being stored in the data – it doesn’t need to be there, you could remember it in your program.