Maybe in the cases when the whole frame is copied to a memory location. Most of times it is copied to the screen which is done scanline by scanline anyway.
I agree with @Pharap that frames should definitely NOT be interlaced with each other in the memory. Youâre always drawing a single frame and that should be a continuous memory because of caching (even if there is no cache on Pokitto, the format should try to respect other platforms if possible, and respect the rule of least surprise, which means do what is common), and also for example copying a single frame out somewhere should be doable with just memcpy.
I think keeping the number of frames in the format in the 3 byte header is the way to go, because you mustnât allow the function to segfault even with wrong parameters. And generally, the format should be such that it contains all the information about the data and supposes even generalized use â for example it can be stored in a file and loaded without any other information, in which case you need to have the number of frames stored.
100% agree.
Yes please, I find the lack of documentation is kind of an issue.
EDIT:
Or actually couldnât there be a separate format for bitmaps with frames, like Sprite on arduboy? DrawBitmap would draw a bitmap (2 byte header) and DrawSprite would draw a frame of given sprite (3 byte header).
I vote for this also.
Iâd like a way to copy a region from an image to the screen. Right now this seems not possible to me. Also thereâs no direct access to the memory screen buffer.
You mean Pokitto::Display::screenbuffer
? Itâs public, you can access that.
ops
I was looking to
uint8_t* Display::m_scrbuf;
and got distracted. So I can copy region to /from the screen. Great!
You donât normally pass the number of frames though.
Normally you pass the frame you want to draw, but not the maximum number of frames.
What would the drawBitmap
function do with it?
It canât thrown an exception because those arenât enabled.
Returning an error code means you have to store and check the error,
which isnât something you want to be doing in drawing code.
You could modulo by it, but thatâs wasteful when it isnât needed and ends up just hiding bugs.
Exactly, thatâs why itâs inefficient. You have to copy width
bytes and then skip width * frameIndex
bytes:
Itâs not just a straightforward block copy anymore, itâs interlaced.
The reason it should be contiguous isnât caching, itâs for simplicity and speed of iteration.
If the frame data is interlaced then the logic for accessing the frames becomes slower and more complex.
memcpy
often performs worse than a for
loop when using modern compilers.
This is also a good point. One of the reasons Iâve had trouble with framew
is because it wasnât obvious what its intended purpose was because itâs very unusual.
Itâs better to stick to whatâs tried and tested - width and height.
Segfaults are a good thing, they tell you that youâre doing something wrong.
If you just ignore the frameIndex
being out of bounds thenyou violate the fail fast principle.
Going out of bounds is a bug in the calling code, and bugs should be big and obvious, not silenced.
Not really, you just increment rows by frameWidth * frames
instead of one, thatâs it. Yes, itâs a tiny bit simpler, but itâs mostly for the cache.
Iâm not saying donât fail quickly! Indicate error without crashing the program.
Not sure this can be applied to Pokitto processor, is there cache on LPC11U68?
No, but as I said, the format shouldnât suppose this, the games can be ported to platforms with other processors, emulated and so on. And vice versa converting to Pokitto wonât require reordering pixels.
I think this is a good reason to save frames in vertical layout.
memcpy
takes void *
arguments and gets passed a block size.
The most optimum way to copy memory on most systems is to copy a full register width of memory, so for sizes that arenât a multiple of that full regsiter width, thereâs some extra copying that needs doing (i.e. blockSize % registerWidth
).
The compiler can calculate that at compile time and then hardcode the copying, but memcpy
has to include those checks and do them at runtime because it doesnât know what size itâs going to be passed.
(And for the record I tested this once on the Arduboy.)
Hrm, I suppose so. But itâs still not ideal.
Donât forge that incrementing can take advantage of post-increment addressing modes.
Both Thumb and AVR feature post-increment addressing.
(ARM features pre-increment addressing, but Thumb doesnât.)
(See IA here.)
But how though?
If you return an error code, youâre then asking drawing code to read that error and handle it.
The error code and checking for the error code will add overhead, and the correct way to handle that error is to rewrite the program to prevent that error from occuring in the first place - itâs not something you should be handling at runtime.
If a CPU has a cache, itâs probably going to be more powerful than the Pokittoâs MCU anyway, so Iâm not sure this will make much difference.
But the code calling the drawBitmap function must be explicitely told the max number of frames. It is needed for animating. If the animation is refactored you have to change some constant in the code elsewhere instead of just changing the graphics file header. I also find the header a little incomplete if there is the frame size information but not the total size of the memory or the frame count.
Pokitto Crash Screen (PCS) is your friend
What Iâm saying is letâs be consistent â computers with cache do it one way, so if we have a choice, letâs do it the same so that we keep the compatibility. I am defending your argument of storing the frames in the memory the way you suggested, we shouldnât be disagreeing on this
Definitely this. BUT it comes down to this:
Do we understand it as an animated bitmap, or an array of static bitmaps?
If the former, there should be a 3 byte header with the number of frames and the function should take a frame as a parameter. And it should be called something like drawAnimatedBitmap
. (as @Hanski says, the format should contain all the info about itself to avoid constants, spreading the data all over the code, and also being generally able to retrieve the number of frames by e.g. a completely separate class/module, storing the format in a file and so forth and so on).
If the latter, there should be a 2 byte header and the function shouldnât take any frame as a parameter. And it should be called drawBitmap
.
And then, why not both? Letâs have two function, no?
Yes, but if you store that size in the image array then to retrieve it the programmer has to do an offset into the array, which is going to look very odd.
If itâs stored separately as a constexpr
variable then it can be used as an immediate operand to an opcode instead of being fixed in memory and requiring a pointer dereference to fetch.
That still seems like a bit of a heavy-handed thing to do in a drawing function.
Weâre arguing for the same thing but for slightly different reasons.
Iâm not worried about porting Pokitto games to other systems because I think platform exclusives are part of the selling point of any console.
If all consoles played the same games then it would be much of a muchness.
Plus Iâm saying that if a CPU has a cache, itâs probably powerful enough to emulate the Pokitto without worry anyway, so caches shouldnât really factor into this even with the arguments about porting.
In contrast, a lot of low-power MCUs have post-increment addressing (even as far back as the MOS6502), so thatâs a much stronger argument.
(Not to mention the historical precendent set by scanlines.)
Animation is nothing more than displaying of several different static images in sequence, so these two things are pretty much equivalent.
I think drawBitmapFrame
would be more descriptive because the function would only be drawing a single frame.
drawAnimatedX
implies thereâs enough information to actually handle the animation, but there wouldnât be because that would be a stateful operation.
Also this kind of overload is often used for implementing tile engines, with the tile id being the frameIndex
argument.
But like I said above, this could be the difference between the constant existing in memory and having to be pulled out of RAM when itâs needed or being compiled as an immediate to an opcode.
It would just be a function.
// Safe at compile time
template<std::size_t size>
std::size_t getFrameCount(const std::uint8_t (&bitmap)[size]);
// Dangerous, could segfault
std::size_t getFrameCount(const std::uint8_t bitmap *);
That said, I have toyed with ideas like:
template<std::size_t Width, std::size_t Height, std::size_t FrameCount>
struct Image
{
constexpr std::size_t size = Width * Height;
// I'd like to point out that these three are redundant
// because the values are part of the type
std::uint8_t frameCount = FrameCount;
std::uint8_t width = Width;
std::uint8_t height = Height;
std::uint8_t data[size];
// For easy access of bytes
std::uint8_t operator[](std::size_t index) const
{
return this->data[index];
}
};
But the problems would be ensuring compile-time construction and of course I know everyone still bemoans templates.
Thatâs a bit more of a compelling argument, but even so the frameCount
could be stripped out upon load without causing issues.
This was going to be my next suggestion. I could live with this.
We coudl have one function that just expected a 2-byte width and height header, and a different function that expected a 3-byte frameCount, width and height header.
The former could be drawBitmapFrame,
drawBitmapFrameRaw
or drawBitmapFrameUnsafe
to signify that it wonât do bounds checking,
The latter could be drawBitmapFrame
or drawBitmapFrameSafe
to indicate that it does bounds checking.
The âsafeâ version could do its checking and then call the âunsafeâ version, which would reduce the memory footprint and avoid repetition.
(The C++ stdlib does something similar for collections: the index operator doesnât do bounds checking, the at
function does.)
Bitmap has no frames. Think of the formats as classes: you have a Bitmap â static image, width and height, with method draw() that takes no frame number â and you have Animation â dynamic image, width and height, number of frames (which the class should know) and a draw method that takes a frame argument. We canât mix these together. So there should be two formats with two names and two drawing functions.
Could you elaborate this? Does the same oddity concern the width and the height parameters?
Why is that? I mean that when a frame number is too big it would show PCS. Is there a neater way to tell information about the error to the user?
You gain nothing by saving a dereference â you have a number of pixels to fetch from the memory in order of hundreds, one saved fetch wonât achieve anything. This is an example of optimization in a non-critical part of the program. The other advantages outweight this by a great amount.
If you put the frame count into the array then it absolutely has to live in flash.
If you make it a constexpr
variable (or possibly something like const static
if C++03 is still a must, though Iâm not sure how the mechanics work for that) then itâs free to become an immediate operand in a CPU instruction.
For example if you had the hypothetical function:
template<std::size_t size>
ErrorCode drawBitmapFrameSafe(int x, int y, const char (&bitmap)[size], std::size_t frameIndex, std::size_t frameCount)
{
if(frame >= frameCount)
return ErrorCode::InvalidFrameIndex;
// Forward to function that checks the size,
// which forwards to the 'unsafe' version
return drawBitmapFrameSafe(x, y, bitmap, frameIndex);
}
And then itâs called like:
drawBitmapFrameSafe(x, y, someBitmap, 1, someBitmapFrameCount);
And then it gets inlined, producing code equivalent to this:
ErrorCode result;
if(1 >= someBitmapFrameCount)
result = ErrorCode::InvalidFrameIndex;
else
result = drawBitmapFrameSafe(x, y, someBitmap, 1);
And then the compiler realises that someBitmapFrameCount
is a constexpr
so (1 >= someBitmapFrameCount)
is also a constant expression, allowing the compiler to reduce the code to either:
ErrorCode result = drawBitmapFrameSafe(x, y, someBitmap, 1);
Or
ErrorCode result = ErrorCode::InvalidFrameIndex;
However, if you instead called:
drawBitmapFrameSafe(x, y, someBitmap, animationIndex, someBitmapFrameCount);
Youâd get:
ErrorCode result;
if(animationIndex >= someBitmapFrameCount)
result = ErrorCode::InvalidFrameIndex;
else
result = drawBitmapFrameSafe(x, y, someBitmap, 1);
And providing someBitmapFrameCount
could be represented as an 8-bit value, that comparison (>=
) would compile to the CMP Rd, #Offset8
Thumb instruction (combined with a conditional jump).
Hrm, maybe heave handed isnât quite the right word.
I mean itâs the sort of thing youâd want to do during debugging, but not the sort of thing youâd want going on in production code.
Function calls are cheap, but itâs the extra comparison and branch instructions that soon mount up.
For example, if you drew an 8x6 grid of tiles using that function, thatâs 48 extra comparisons and branches, so (I think) roughly 72 extra cycles.
.
If you had the option to ignore the safety check because you could guarantee that the index being passed was valid then you wouldnât have to do any of those comparisons.
A well-written program would never pass an invalid index, and if the error message appeared then thereâs no way the program can respond to the error - the programmer has to rewrite their code to ensure that they arenât passing an invalid index.
It would be alright as a means of debugging, but thatâs the sort of thing you might want to leave out of the release version because leaving it out would save a bit of speed.
Itâs the same reason you wouldnât want to implement drawRect
by calling setPixel
- because setPixel
has the overhead of checking if the pixel is on screen, and drawRect
can already do all the neccessary checking in a much cheaper way.
Itâs not one saved fetch though, itâs one per image.
If youâre drawing lots of sprites it can make a difference.
What are the advantages other than an extra logic heck?
Like I say, if the error happens then your program is written badly and you have to rewrite it to remove the bug.
If you write the logic properly then you never pass an invalid index, at which point the check is just a waste of cycles.
Thatâs why I think having a âsafeâ version and an âunsafeâ version is a good solution, but the unsafe version wouldnât need the frame count because it wouldnât use it.
Having an unsafe version would also allow the programmer to decide how they go about avoiding an invalid index.
They can choose to use clamping or wrapping etc.
(Yes that could still be done in the safe version, but youâd end up with the unnecessary check.)
Even per image the gain is so miniscule it will literally be unmeasurable unless you have some expensive extreme high FPS camera. Take all the things that happen during a frame and in the loops that are drawing all the pixels â even if you draw a 100 images, which is extremely many, and letâs say you save one cycle on each â thatâs 100 out of 48 million per second, or 0.000002 of a second. This is totally negligible. (The same goes for saving 1 byte of memory. Maybe on Arduboy we could talk about single bytes, but here I donât think so.)
If we had a choice of two versions of which one is negligibly faster, then weâd go for it just because, but here the other option is much better:
- safe: if you manage the number of frames yourself, you can screw it up, whereas if we get it right once in the framework, everyone will have it correct
- supports saving to files
- backward compatible
- self-contained, data of the animation are kept in one place
- supports access by other entities that do not know the number of frames in advance
- still allows to ignore the value in your custom functions
Also you could argue the same about the width and height being stored in the data â it doesnât need to be there, you could remember it in your program.