Playing music

open-source

#62

I don’t have any of the examples, I’m using PlatformIO.
Part of the reason I use it is so I don’t have to download the ever-growing examples folder. :P

I’ve found the problem though.
It seems the Pokitto_settings.h in PokittoIO isn’t up to date with the Pokitto_settings.h in PokittoLib.

If I do that then I won’t be comparing like for like so I’m more likely to break something.
I’ll do it if I must though.


#63

Quick question:
Out of all these things that are int,
how many (and which ones) actually need to be able to hold negative numbers?


#64

I think only the stuff in the audio interrupt. (not 100% sure though).


#65

You mean update_tune? If so, which variables?


I’m still part way though, but I’d like to point something out while I remember:

for(uint8_t t = 0; t < strlen(text); t++){
	uint8_t character = text[t];
	drawBitmap(x,y,largefont[character-32], 2);
	x += letter_widths[character-32];
}

This will call strlen once for each loop iteration, and strlen is an O(n) function, which basically means it’s doing a loop itself.
strlen is actually implemented by a loop that iterates through every character in the string until it finds '\0'.

So basically you need to be caching the length:

const std::size_t textLength = strlen(text);
for(std::size_t index = 0; index < textLength; ++index)
{
	unsigned char character = static_cast<unsigned char>(text[index]);
	drawBitmap(x, y, largefont[character - 32], 2);
	x += letter_widths[character - 32];
}

#66

nope, in HWSound.cpp. Mixing the pcm samples is the only part that needs negative numbers (i think).


#67

I’m only working on the files from the musicplayer repo,
so at the moment I only care about the values in those.


Two more quick questions:

It seems that you’re only using 11 or 16 of the note_speed table entries?

Is this:

for(int s = 1; s < octave; ++s) {
	speed *= 2;
}

Supposed to be:

int speed = static_cast<int>(std::pow(note_speed[note - 1], octave));

(Although I think maybe speed = note_speed[note - 1] << octave; might have the same effect?
I didn’t bother to check the maths.)


#68

yes, the whole table is a hangover from an earlier version. So only the first 12? values are needed, just the 1 octave.

Looks right.


#69

Ok, that makes sense.

I’ll check later if it isn’t.


This probably looks quite drastic because I changed everything at once instead of breaking it up a bit, but see what you think:

I avoided templates as much as possible despite the fact they’d add flexibility (e.g. allowing people to choose 1-3 channel sound instead of 4 channel sound if they don’t need all 4 channels).
Only the Sound constructors use them.

A brief overview of the changes:

  • Added Note enum for easy note identification
  • Added EffectType enum for easy effect identification
    • I wasn’t sure what 0x0F and 0x0D were supposed to represent, so I had to guess names for them
    • In the sound data there were some mystery 4 and 6 type effects, so I just replaced with a cast for now
  • An effect-value pair is represented with a SoundEffect struct
  • The last part of a tune is represented by TunePart (which also correctly stores and retrieves a note and an octave in a single uint8_t)
    • TunePart has various constructors for flexible data representation
  • A Sound class which represents all the metadata of a sound stream
    • The actual data has to be in a separate array pointed to by the data member
      • This allows the same array to be reused for two different sounds
    • The templated constructors will auto-detect the size of an array passed into the constructors, so no need to keep track of array sizes anymore
  • A Channel class to represent a sound channel
    • Each channel is slightly larger because it copies a sound instead of pointing to a sound
    • I suspect volume, speed and repeat should be either uint32_t or uint16_t, but I don’t entirely know how they’re used so I couldn’t infer their restrictions
  • A SoundPlayer class that contains 4 channels and hold the playSound function (but no ticker etc)
  • A TunePlayer class that contains a SoundPlayer, a Ticker and the other state variables that were previously loose in main.cpp
    • Has play(frequency), stop(), enable() and disable()

The actual sound metadata is now reduced to:

constexpr std::uint16_t SampleRate8272Hz = 802;

const Sound sounds[]
{
	{ sound01Data, SampleRate8272Hz, 4956, 9977 },
	{ sound02Data, SampleRate8272Hz },
	{ sound03Data, SampleRate8272Hz },
	{ sound04Data, SampleRate8272Hz },
	{ sound05Data, SampleRate8272Hz },
	{ sound06Data, SampleRate8272Hz },
	{ sound07Data, SampleRate8272Hz },
	{ sound08Data, SampleRate8272Hz },
	{ sound09Data, SampleRate8272Hz },
	{ sound10Data, SampleRate8272Hz, 1140, 1203 },
	{ sound11Data, SampleRate8272Hz, 1616, 3441 },
	{ sound12Data, SampleRate8272Hz },
	{ sound13Data, SampleRate8272Hz },
	{ sound14Data, SampleRate8272Hz, 346, 909 },
	{ sound15Data, SampleRate8272Hz },
	{ sound16Data, SampleRate8272Hz, 614, 793 },
	{ sound17Data, SampleRate8272Hz },
	{ sound18Data, SampleRate8272Hz },
	{ sound19Data, SampleRate8272Hz },
	{ sound20Data, SampleRate8272Hz },
	{ sound21Data, SampleRate8272Hz, 1072, 6319 },
	{ sound22Data, SampleRate8272Hz },
	{ sound23Data, SampleRate8272Hz, 614, 793 },
	{ sound24Data, SampleRate8272Hz, 672, 3939 },
	{ sound25Data, SampleRate8272Hz, 0, 7171 },
	{ sound26Data, SampleRate8272Hz, 0, 6195 },
	{ sound27Data, SampleRate8272Hz, 0, 6121 },
};

Which makes it much clearer which sounds loop and implicitly figures out the size of the data array, which means there’s less chance of error.

Ideally there should be some error detection for the looping, but the only way to do that at compile time would mean more templates.
(E.g. something like { sound10Data, SampleRate8272Hz, SoundLoop<1140, 1203>{} },.)

There’s a way to do it at runtime, but that’s less useful and the Pokitto would just crash the moment it booted up, before the menu even came up.

And just to highlight the benefit of using array lookup instead of really long switch statements:

const Note note = part.getNote();
const std::uint8_t octave = part.getOctave();

if(note != Note::None)
{
	std::uint32_t speed = (getNoteSpeed(note) << octave);

	if((instrument >= 1) && (instrument <= 27))
	{
		const Sound & sound = sounds[instrument];
		soundPlayer.playSound(channelIndex, sound, volume, speed, offset);
	}
}

Remember anything you don’t understand, just ask about it.
I guarantee it’s not as scary as it looks.


#70

I’ll have to look at it on my computer so I can read it clearer, on my tiny phone screen it seems a bit complicated. :thinking:


#71

Wow, you put a lot of work into that.
But I fail to understand why this -

const TunePart tune[50][64][4]
{
	{
		{
			{ Note::C, 5, 192, 2, { EffectType::SetVolume, 1 } },
			{ EffectType::Mystery, 3 },
			{ EffectType::SetVolume, 0 },
			{ EffectType::SetVolume, 0 },
		},
		{
			{ EffectType::SetVolume, 2 },
			{ EffectType::SetVolume, 0 },
			{},
			{},
		},
		{
			{ EffectType::SetVolume, 3 },
			{},
			{},
			{},
		},

Is better than this -

const uint8_t tune[50][64][4][5]={
{
// Pattern:0
//      .----------------------[Note] Upper nibble is the note value.
//      |.---------------------[Octave] Lower nibble is the octave.
//      ||   .-----------------[Volume + 128] If first bit is set then volume is used.
//      ||   |    .------------[Instrument] Can use up to 255 instruments I suppose.
//      ||   |    |    .-------[Effect] Effects are based on standard MOD effects.
//      ||   |    |    |    .--[Effect Value] Values for above effects.
//      ||   |    |    |    |
    {{0x15,0xC0,0x02,0x0C,0x01},{0x00,0x00,0x00,0x0F,0x03},{0x00,0x00,0x00,0x0C,0x00},{0x00,0x00,0x00,0x0C,0x00}},
    {{0x00,0x00,0x00,0x0C,0x02},{0x00,0x00,0x00,0x0C,0x00},{0x00,0x00,0x00,0x00,0x00},{0x00,0x00,0x00,0x00,0x00}},
    {{0x00,0x00,0x00,0x0C,0x03},{0x00,0x00,0x00,0x00,0x00},{0x00,0x00,0x00,0x00,0x00},{0x00,0x00,0x00,0x00,0x00}},
    {{0x00,0x00,0x00,0x0C,0x04},{0x00,0x00,0x00,0x00,0x00},{0x00,0x00,0x00,0x00,0x00},{0x00,0x00,0x00,0x00,0x00}},
    {{0x15,0xC0,0x03,0x0C,0x05},{0x00,0x00,0x00,0x00,0x00},{0x00,0x00,0x00,0x00,0x00},{0x00,0x00,0x00,0x00,0x00}},

The raw data, doesn’t really need to be dumbed down so much. my format is both readable, and reflects the format well enough.

or why removing the sound header data from the sample array is better? The main idea with the sample data was to act as much like a genuine file as possible, much like gfx should be.


#72

Because:

  • It’s easier to write a tune by hand if need be
    • (E.g. if you just need some quick sound data to test with)
  • It’s easier to adjust the data by hand
  • It’s easier to spot any bugs/mistakes in the data
  • It’s not that much harder to generate
    • It took me all of 10 minutes to write a C# program that would correctly convert the data from its old integer-based format into the new format
  • It’s easier to modify if the format needs to change for whatever reason
  • It’s easier to understand what the music is going to sound like just by looking at the data

Because the only way to incorporate the array into the struct would mean using a template, and the brief was to avoid templates.

Hopefully it’s clear that having a struct for a header allows you to simply do header.length, header.repeatStart etc instead of having to manually read the data out a byte at a time.

In fact, if it weren’t for those extra adjustments done afterwards before playing the music, you would be able to just use a const Sound * or const Sound & to refer directly to the sound object.

And also, the benefit I mentioned before of being able to have two sound objects that point to the same data but loop at different parts.

That’s a false premise.
Nowhere does it say that sound effects or graphics should look like a file format.

In general, the contents of a file format look almost nothing like how the sound/image looks when it’s been read out of the file and converted to an object usable by code.

File formats almost always have extra constraints, like storing sound/images in a compressed format, having lots of extra metadata, having a rigid structure etc.
Normally if a sound or image is being read from a file you have to navigate through the file’s metadata to identify the location of the sound/image and then decompress that data, and then translate it into the correct format for the frame buffer currently in use by your game.

With a ROM system you don’t have to do that because you can simply use a pointer to refer to a ROM address, which you can know at compile time.

With a file system, even if you stored your sound/image data completely RAW in an individually named file, you’d still have to load it into a buffer in RAM before you could use it, you couldn’t just point a pointer at it.

Besides which, aside from moving the metadata, the underlying representation hasn’t changed much.
If you were to reinterpret_cast<const unsigned char *>(tune) or reinterpret_cast<const unsigned char *>(sound01Data) then you should end up with exactly the same bytes as before (minus the header).

(I’m not 100% sure if that’s guaranteed by the standard, but I know the vast majority of compilers store object fields in the order they’re declared, not accounting for alignment and padding.)


#73

I’m not sure that situation would ever arise. The idea would be to use a purpose made sequencer.

Again, why would someone adjust the data by hand? People in general don’t adjust data by hand if it was created by a tool. I for example have never opened up a .bmp in a hex editor and changed pixels by hand.

There wouldn’t be any once the tools are finished. The end user wouldn’t need to worry about that.

It might not be harder to generate, but i’t harder (in my opinion) to read. Much the same way as when people use .xml format to store settings when plain .ini format is capable enough of storing the info.

I think again, that’s a matter of opinion. I’m far more comfortable working with byte arrays directly.

I’'m not sure why that would be important. After all, music is for listening to rather than looking at. :expressionless:


#74

Because if the tweak is small enough it’s often quicker to do it by hand than waiting for an editor to load up.

I’ve done it regularly with levels in various games when I can’t be bothered to dig out the level editor.

I haven’t edited many .bmp files (mainly because I hardly ever use .bmp), but I have edited (and even created) plenty of Arduboy images (in array form) by hand,
which is pretty much the same as looking at an image with a hex editor.

Anyone writing a new tool would need to worry about it.

Once one person writes a tool, it’s inevitable that someone else will eventually write a new tool that does the same job in a slightly different way.

(Hence the Arduboy has at least half a dozen image converters, and the Pokitto has at least two or three so far.)

How is actually seeing the name of the effect harder to read than a hex value?

That’s effectively the same as saying magic numbers are easier to read than named constants.

XML is capable of much more than INI files because it’s fully tree based, supports both sub-nodes and attributes, and most importantly it has a universal standard governing it,
whereas INI just has a handful of informal rules that differ drastically between implementation.

Why exactly?

Like I said, if you really want to treat it as just raw bytes then you can always reinterpret_cast<const unsigned char *>(tone) to get the same effect, but in doing so you make extra work for yourself because you then have to care about endianness and the resulting code for reading is most likely larger because you’re dealing with byte-by-byte copies instead of word-by-word copies.

Because sometimes you want to understand someone else’s game and they haven’t documented it very well and they haven’t picked very good variable names so reading the code is the best chance you have at identifying which song/image is which.

I’m not just plucking these things from thin air, this is stuff I’ve genuinely either done or heard of people doing.


#75

I think I’ll be sticking with my format. Although I appreciate how much work you put in, I feel that for it’s purpose, my version is better. :slightly_smiling_face:


#76

That means you’ll lose the simplicity of this:

void SoundPlayer::playSound(std::uint8_t channel, const Sound & sound, int volume, int speed)
{
	std::uint32_t soundLength = sound.length;
	std::uint32_t repeatStart = sound.repeatStart;
	std::uint32_t repeatEnd = sound.repeatEnd;

	// cheat, if there is a looping sample, make the size the length of the loop
	if(repeatEnd != 0)
	{
		soundLength = repeatEnd;
		// also cheat for start loop
		if(repeatStart == 0)
		{
			repeatStart = 1;
		}
	}

	float frequency = (POK_AUD_FREQ / static_cast<float>(sound.sampleRate));

	this->channels[channel].sound = sound;
	this->channels[channel].volume = volume;
	this->channels[channel].speed = (speed / frequency);
	this->channels[channel].soundPoint = 0;
	this->channels[channel].repeat = ((repeatStart << 8) / this->channels[channel].speed);
	this->channels[channel].enabled = true;
}

At the very least you could switch to using little endian numbers so you can just use default type copying instead of looping through sevral bytes and shifting.

Then you can at least do:

void SoundPlayer::playSound(int channel, const unsigned char * sound, int volume = 255, int speed = 255, int offset = 0)
{
	std::size_t dataOffset = 14 + offset;

	const unsigned char * data = sound;
	// get sound length
	uint32_t soundSize = *(reinterpret_cast<const uint32_t *>(data));
	data += sizeof(uint32_t);
	
	uint16_t sampleRate = *(reinterpret_cast<const uint16_t *>(data));
	data += sizeof(uint16_t);
	
	uint32_t repeatStart = *(reinterpret_cast<const uint32_t *>(data));
	data += sizeof(uint32_t);
	
	uint32_t repeatEnd = *(reinterpret_cast<const uint32_t *>(data));
	data += sizeof(uint32_t);

	// cheat, if there is a looping sample, make the size the length of the loop
	if(repeatEnd != 0)
	{
		soundSize = repeatEnd;
		// also cheat for start loop
		if(repeatStart == 0)
		{
		   repeatStart = 1;
		}
	}

	float frequency = (POK_AUD_FREQ / static_cast<float>(sound.sampleRate));

	this->channels[channel].sound = sound;
	this->channels[channel].volume = volume;
	this->channels[channel].speed = (speed / frequency);
	this->channels[channel].soundPoint = 0;
	this->channels[channel].repeat = ((repeatStart << 8) / this->channels[channel].speed);
	this->channels[channel].enabled = true;
}

Instead of

void SoundPlayer::playSound(int channel, const unsigned char * sound, int volume = 255, int speed = 255, int offset = 0)
{
	std::size_t dataOffset = 14 + offset;

	// get sound length
	uint32_t soundSize = 0;
	for(std::size_t t = 0; t < sizeof(uint32_t); ++t)
	{
		soundSize <<= 8;
		soundSize |= sound[t];
	}

	uint16_t sampleRate = 0;
	for(std::size_t t = 0; t < sizeof(uint16_t); ++t)
	{
		sampleRate <<= 8;
		sampleRate |= sound[t + sizeof(uint32_t)];
	}

	// get repeat start
	uint32_t repeatStart = 0;
	for(std::size_t t = 0; t < sizeof(uint32_t); ++t)
	{
		repeatStart <<= 8;
		repeatStart |= sound[t + sizeof(uint32_t) + sizeof(uint36_t)];
	}

	// get repeat end
	uint32_t repeatEnd = 0;
	for(std::size_t t = 0; t < sizeof(uint32_t); ++t)
	{
		repeatEnd <<= 8;
		repeatEnd |= sound[t + sizeof(uint32_t) + sizeof(uint36_t) + sizeof(uint32_t)];
	}

	// cheat, if there is a looping sample, make the size the length of the loop
	if(repeatEnd != 0)
	{
		soundSize = repeatEnd;
		// also cheat for start loop
		if(repeatStart == 0)
		{
		   repeatStart = 1;
		}
	}

	float frequency = (POK_AUD_FREQ / static_cast<float>(sound.sampleRate));

	this->channels[channel].sound = sound;
	this->channels[channel].volume = volume;
	this->channels[channel].speed = (speed / frequency);
	this->channels[channel].soundPoint = 0;
	this->channels[channel].repeat = ((repeatStart << 8) / this->channels[channel].speed);
	this->channels[channel].enabled = true;
}